BUILDING MODELS
FOR MARKETING
DECISIONS
by
Peter S. H. Leeflang
University of Groningen, The Netherlands
Dick R. Wittink
Yale School of Management, U.S.A. and
University of Groningen, The Netherlands
Michel Wedel
University of Groningen, The Netherlands
and
Philippe A. Naert
Tilburg University, The Netherlands
To Hanneke, Marian, Rennie and Magda
Contents
Preface xiii

1 Introduction 3
1.1 Purpose 4
1.2 Outline 7
1.3 The model concept 10

6 Marketing dynamics 85
6.1 Modeling lagged effects: one explanatory variable 85
6.2 Modeling lagged effects: several explanatory variables 96
6.3 Selection of (dynamic) models 97
6.4 Lead effects 98
Preface
This book is about marketing models and the process of model building. Our primary
focus is on models that can be used by managers to support marketing decisions. It
has long been known that simple models usually outperform judgments in predicting
outcomes in a wide variety of contexts. For example, models of judgments tend to
provide better forecasts of the outcomes than the judgments themselves (because the
model eliminates the noise in judgments). And since judgments never fully reflect
the complexities of the many forces that influence outcomes, it is easy to see why
models of actual outcomes should be very attractive to (marketing) decision makers.
Thus, appropriately constructed models can provide insights about structural relations
between marketing variables. Since models explicate the relations, both the process
of model building and the model that ultimately results can improve the quality of
marketing decisions.
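The claim that a model of a judge's own forecasts can outperform the judge is easy to illustrate with simulated data. The sketch below is illustrative only: the cues, coefficients, and noise levels are hypothetical assumptions. A linear "bootstrap" of the judgments strips out their unsystematic noise, so the model of the judge forecasts the true outcomes more accurately than the judge does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true sales depend linearly on two cues (say, a price
# index and an advertising index); the judge uses the same cues but adds
# unsystematic noise to every forecast.
n = 200
cues = rng.uniform(0.0, 1.0, size=(n, 2))
true_sales = 100 + 40 * cues[:, 0] - 25 * cues[:, 1] + rng.normal(0, 5, n)
judgments = 100 + 40 * cues[:, 0] - 25 * cues[:, 1] + rng.normal(0, 15, n)

# "Bootstrap" the judge: regress the judgments on the cues, then use the
# fitted linear model in place of the judge.
X = np.column_stack([np.ones(n), cues])
coef, *_ = np.linalg.lstsq(X, judgments, rcond=None)
model_of_judge = X @ coef

rmse_judge = np.sqrt(np.mean((judgments - true_sales) ** 2))
rmse_model = np.sqrt(np.mean((model_of_judge - true_sales) ** 2))
print(f"judge RMSE: {rmse_judge:.1f}, model-of-judge RMSE: {rmse_model:.1f}")
```

The model's error contains only the irreducible noise in the outcomes plus a small estimation error, while the judge's error also contains the judgment noise.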
Managers often use rules of thumb for decisions. For example, a brand manager will
have defined a specific set of alternative brands as the competitive set within a product
category. Usually this set is based on perceived similarities in brand characteristics,
advertising messages, etc. If a new marketing initiative occurs for one of the other
brands, the brand manager will have a strong inclination to react. The reaction is
partly based on the manager's desire to maintain some competitive parity in the mar-
keting variables. An economic perspective, however, would suggest that the need for
a reaction depends on the impact of the marketing activity for the other brand on the
demand for the manager's brand. The models we present and discuss in this book are
designed to provide managers with such information.
Compared with only a few decades ago, marketing models have become important
tools for managers in many industries. Prior to the introduction of scanner equipment
in retail outlets, brand managers depended upon ACNielsen for bimonthly audit data
about sales, market shares, prices, etc. for the brands in a given product category.
Those data were rarely good enough for the estimation of demand functions. Indeed,
Art Nielsen used to say that Nielsen was in the business of reporting the score and
was not in the business of explaining or predicting the score. With technological
advances (e.g. the availability of scanner data, improved hardware and software), the
opportunity to obtain meaningful estimates of demand functions vastly improved.
Managers will particularly benefit from models of marketing phenomena if they un-
derstand what the models do and do not capture. With this understanding they can,
for example, augment model-based conclusions with their own expertise about com-
plexities that fall outside the modelers' purview. Importantly, the systematic analysis
of purchase (and other) data can provide competitive advantages to managers. Model
benefits include cost savings resulting from improvements in resource allocations as
we discuss in various applications. And the leaders or first movers in the modeling of
marketing phenomena can pursue strategies neither available nor transparent to managers
lagging in the use of data.
The book is suitable for student use in courses such as "Models in Marketing",
"Marketing Science" and "Quantitative Analysis in Marketing" at the graduate and
advanced undergraduate level. The material can be supplemented by articles from
journals such as the Journal of Marketing Research, Marketing Science, Management
Science and the International Journal of Research in Marketing.
It consists of four parts. Part I provides an introduction to marketing models. It
covers the first four chapters and deals with the definition of a model, the benefits to
be derived from model building, and a typology of marketing models.
In part II, which covers 10 chapters, we give guidelines for model specification
and we discuss examples of models that were developed in the past thirty years.
We start with some elementary notions of model specification. Then we discuss the
modeling of marketing dynamics and implementation criteria with respect to model
structure. This is followed by a presentation of descriptive, predictive and normative
models, models to diagnose competition, etc. We discuss many specification issues
such as aggregation, pooling, asymmetry in competition and the modeling of inter-
dependencies between products. The primary focus is on models parameterized with
aggregate data. In Chapter 12 we introduce models that describe individual (choice)
behavior.
Part III (4 chapters) covers topics such as data collection, parameterization and
validation. We present estimation methods for both objective and subjective data.
Aspects of implementation are the topics of Part IV, which covers three chapters
on determinants of model implementation, cost-benefit considerations and the future
of marketing models.
Chapters 2-10 in this book correspond with chapters 2-9 of Naert and Leeflang (1978).
These chapters have been updated and rewritten. Chapters 11-18 are new or
completely rewritten. Chapters 19 and 20 are partly based on Chapters 13 and 14
of Naert and Leeflang. The discussion about the future of marketing models (Chapter
21) is new.
Several colleagues have contributed with their comments on various drafts. We thank
Albert van Dijk, Harald van Heerde, Martijn Juurlink, Marcel Komelis and Tom
Wansbeek (all from the Department of Economics, University of Groningen) for
their comments. Harald's input deserves special attention because he is the author of
Section 16.7. We are also indebted to our students whose suggestions improved the
readability of the text. Four of them should be mentioned explicitly: Hans Mommer
and Martijn van Geelen, who gave important inputs to Section 16.9, Martijn Juurlink
and Brian Lokhorst. We are much indebted to Albert van Dijk, Martijn Juurlink and
Suwami Bambang Oetomo for deciphering many barely readable scribbles and for
creating a carefully typed form. We would like to thank Siep Kroonenberg for her
technical assistance with LaTeX. Albert also showed that he is very adept at managing
the manuscript through the various stages until its completion.
October, 1999
PART ONE
Introduction to marketing models
CHAPTER 1
Introduction
Model building in marketing started in the fifties. It is now a well-developed area with
important contributions by academics and practitioners. Models have been developed
to advance marketing knowledge and to aid management decision making. Several
state-of-the-art textbooks review and discuss many of the models. 1 This book is a
revised version of Building Implementable Marketing Models (BIMM). 2 BIMM was
not positioned as a "state-of-the-art book". Instead, it elaborated on the following
steps in the model-building process: specification, estimation, validation, and use of
the model. In the current book we maintain the focus on these steps, with the aim to
contribute to the increased implementation of marketing models.
In the past two decades, the use of information in marketing decision making
has undergone revolutionary change. 3 There are important developments in the avail-
ability of new data sources, new tools, new methods and new models. These devel-
opments provide rich opportunities for researchers to provide new insights into the
effects of marketing variables, and at the same time force managers to reconsider
their modus operandi.
Closely related to the process and the activity of model building in marketing is the
field of marketing science. One interpretation of marketing science is that it represents
the scientific approach to the study of marketing phenomena. In its broadest sense this
perspective should include the many disciplines relevant to marketing. However, the
term marketing science was appropriated in the early 1980's by researchers who
favor quantitative and analytical approaches.
We describe the purpose of the current book in more detail in Section 1.1. In Section
1.2 we present the outline, and in Section 1.3 we define and discuss the concept of a
model.
1. Some examples are King (1967), Montgomery, Urban (1969), Simon, Freimer (1970), Kotler (1971),
Leeflang (1974), Fitzroy (1976), Parsons, Schultz (1976), Lilien, Kotler (1983), Schultz, Zoltners (1981), Hanssens,
Parsons, Schultz (1990), Lilien, Kotler, Moorthy (1992) and Eliashberg, Lilien (1993).
2. Naert, Leeflang (1978).
3. Bucklin, Lehmann, Little (1998).
1.1 Purpose
Operations research (OR) and management science (MS) largely emerged during and
after World War II with algorithms and processes applicable to production and logis-
tics. Successes in those areas encouraged researchers to solve problems in other areas
such as finance and marketing in the early 1960's. Initially, the emphasis was on the
application of existing OR/MS methods for the solution of marketing problems. And
econometric methods were introduced into marketing to estimate demand relation-
ships. These econometric applications were hampered by the limited availability of
relevant data. Thus, the early publications of applied econometrics in marketing have
to be seen more as treatises on the possible use of methods than as papers providing
useful substantive results. Now, at the end of this millennium it is predicted that in
the coming decades, a growing proportion of marketing decisions will be automated
by ever-more-powerful combinations of data, models, and computers and
"... the age of marketing decision support ... will usher in the era of marketing
decision automation" (Bucklin et al., 1998, p. 235).
We distinguish five eras of model building in marketing. 4
1. The first era is defined primarily by the transposition of OR/MS methods into a
marketing framework. These OR/MS tools include mathematical programming, com-
puter simulations, stochastic models (of consumer choice behavior), game theory, and
dynamic modeling. Difference and differential equations, usually of the first order,
were used to model dynamics. 5 The methods were typically not realistic, and the use
of these methods in marketing applications was therefore very limited.
2. The second era can be characterized by models adapted to fit the marketing prob-
lems, because it was felt that lack of realism was the principal reason for the lack
of implementation of the early marketing models. These more complex models were
more representative of reality, but they lacked simplicity and usability. In this era,
which ended in the late sixties/early seventies, descriptive models of marketing deci-
sion making and early econometric models attracted research attention. In a critical
evaluation of the literature, Leeflang (1974) argued that many of those models failed
to represent reality. For example, the models typically did not consider the effects
of competition, were static and only considered one marketing instrument. Specific
models were created to deal with specific marketing problems such as models for
product decisions, for pricing decisions, for sales force management decisions, etc.
See Montgomery and Urban (1969) and Kotler (1971) for illustrative state-of-the-art
books in this respect.
3. The third era starts in the early 1970's. There is increased emphasis on models that
are good representations of reality and, at the same time, are easy to use.
4. The definitions of the first three eras are due to Montgomery (1973).
5. See also Eliashberg, Lilien (1993).
In this era we also see the introduction of models parameterized with subjective (and
objective) data. The argument for the use of subjective data is the following. If there
is a lack of data or a lack of data with sufficient quality, a model that captures a
decision maker's judgment about outcomes under a variety of conditions can provide
the basis for superior decisions in future time periods relative to the decision maker's
judgments made in those future time periods. The arguments include consistency
in model-based judgments and the systematic consideration of variation in multiple
variables. 6 Other developments include strategic marketing (planning) models and
Marketing Decision Support Systems (MDSS). In addition, some research focuses on
the relations between marketing models and organizational design, and between mar-
keting decisions and issues in production and logistics. Importantly, due to Little, this
is an era in which researchers attempt to specify implementable marketing models.
4. The fourth era, starting in the mid 1980's, is an era in which many models are
actually implemented. An important force that drives implementation is the increased
availability of marketing data. The data include standard survey and panel data, of-
ten collected by computer-aided personal and telephone interviewing, and customer
transaction databases. 7 Of particular relevance to the focus of this book is the in-
creasing use by retailers of scanning equipment to capture store- and household-level
purchases. 8 This "scanning revolution", combined with developments in the avail-
ability of other data services, also stimulates the application of new methods. Thus
although marketing problems were initially forced into existing OR/MS frameworks
and methods, a subfield in marketing now exists in which quantitative approaches
are developed and adapted. 9 The modeling developments focus on questions in prod-
uct design, on models of consumers' choice processes, on the optimal selection of
addresses for direct mail, on models that account for consumer behavior hetero-
geneity, on marketing channel operations, on optimal competitive strategies and on
the estimation of competitive reactions. These developments result in models that
increasingly:
• satisfy Little's implementation criteria;
• are parameterized based on a large number of observations;
• account for errors in the data, etc.
The models also tend to be complete on important issues and more frequently con-
sider, for example, all relevant marketing instruments and the effects of competi-
tion.10 Of course, the increased availability of data also offers opportunities for re-
searchers to build models that can advance our marketing knowledge and that can
produce generalizable phenomena.
The "fourth era" 11 is also characterized by a latent demand for models by firms. In
earlier eras the model builder - model user interface was probably dominated by the
supply side. Thus, analysts still had to sell their models and convince the user of
potential benefits.
This era can also be characterized as an age of marketing decision support. 12 In the
1980's (and 1990's) there is a virtual explosion of computerized marketing manage-
ment support systems ranging from "information systems" to "marketing creativity-
enhancement programs" . 13
5. The fifth era, starting in the 1990's, may be characterized by an increase in rou-
tinized model applications. Bucklin et al. (1998) predict that in the coming decades,
the age of marketing decision support will usher in an era of marketing decision
automation, partly due to the "wealth" of data that can inform decision making in
marketing. We also expect that firms increasingly customize their marketing activities
to individual stores, customers and transactions. 14 If models and decision support sys-
tems take care of routine marketing decisions, then the manager has more opportunity
to focus on the non-routine decisions which may require creative thinking. Examples
of marketing decisions appropriate for automation are: 15
• assortment decisions and shelf space allocation decisions for individual stores;
• customized product offerings including individualized pricing and promotion;
• targeting of direct mail solicitations;
• coupon targeting;
• loyalty reward and frequent shopper club programs;
• the creation of promotion calendars, etc.
In Chapter 21 we consider likely future developments in more detail.
In this book we concentrate on the steps in the model-building process that are
especially representative of the fourth era. Although we pay attention to models that
advance marketing knowledge, our emphasis is on marketing decision models. In the
next section we discuss the outline of this book.
1.2 Outline
Part I deals with the definition of a model, the degree of explicitness in modeling
a decision problem, the benefits to be derived from model building, and a typology
of marketing models. In Chapter 1, we define the model concept. In Chapter 2, we
classify models according to the degree of explicitness, and we distinguish implicit,
verbal, formalized, and numerically specified models. One question a potential model
user will ask is how a model can help the user. We discuss model benefits in Chapter
3. In this chapter we also discuss models that advance marketing knowledge. Two
important determinants of model specification are the intended use of the model and
the desired level of behavioral detail. We provide a typology of marketing models
based on these two dimensions in Chapter 4. Chapter 4 is the basis for Chapters 8-10
of Part II.
In Chapter 5 we review the main steps of the model-building process, the components
of a mathematical model, and some elementary notions of model specification. The
sequence of the "traditional" steps, Specification, Parameterization, Validation and
Use/Implementation constitutes the structure of what follows.
Marketing is in essence dynamic. In Chapter 6 we discuss the modeling of mar-
keting dynamics. In Chapter 7 we discuss implementation criteria with respect to
model structure.
In Chapter 8 we give examples of models specified according to intended use,
i.e. according to whether the primary intention is to have a descriptive, predictive, or
normative model. One class of predictive models consists of the demand models. We
study these models in some detail in Chapter 9. In Chapter 10 we distinguish models
according to the amount of behavioral detail (little, some, much).
In Chapters 11-14 we discuss model specifications. In Chapter 11 we introduce
models to diagnose competition and game-theoretic approaches, and focus our atten-
tion on models parameterized with aggregate (macro) data. In Chapter 12 we intro-
duce models that describe individual choice behavior. We also discuss the modeling
of competitive structures based on individual/household (micro) data.
Most firms offer multiple products in the marketplace. In Chapter 13 we dis-
cuss the different forms of interdependence between products and illustrate modeling
approaches with explicit treatment of some forms of interdependence.
We end Part II with a discussion of specification issues such as aggregation, pool-
ing, the definition of the relevant market, asymmetry in competition, and we introduce
hierarchical models (Chapter 14).
Collecting data is the first step of parameterization. The organization of useful data is
the central theme of Chapter 15. We also discuss decision-support systems and data
sources in this chapter.
In Chapter 16 we consider methods and procedures for parameter estimation,
including pooling methods, generalized least squares, simultaneous equation systems,
nonparametric and semiparametric estimation, and maximum likelihood estimation.
Both estimation from historical data and subjective estimation are discussed. Given
that statistical inferences are inherent in the use of results, we also discuss statistical
testing.
Aspects of implementation are given particular attention in Part IV. In Chapter 19, we
discuss organizational aspects of model building, implementation strategy, the rela-
tion between model builder and user, and ease of use. Chapter 20 contains a number
of cost-benefit considerations relevant to model use. We end with a discussion of the
future of models for marketing decisions in Chapter 21. In this final chapter, we also
consider the nature of modeling in the fifth era of marketing models.
Much of the modeling of marketing phenomena does not follow this sequence, how-
ever, and is ad hoc. From a theoretical perspective it is desirable to start with prim-
itives and to derive a model with testable implications. In practice, the demand for
unconditional and conditional forecasting tools often negates the development of
structurally complete model specifications.
The material we present and discuss reflects, of course, our own expertise, experience
and interests. Thus, many of the applications come from consumer goods manu-
facturers, which is also consistent with existing papers in the marketing journals.
Increasingly, however, we also see useful applications in industrial marketing, 16 inter-
national marketing, 17 retail marketing, 18 and the marketing of services. 19
Increasingly, household purchase and preference data are used to learn about real-
world market behavior. There are important benefits that result from a focus on disag-
gregate data. One is that at the disaggregate level model specifications tend to be more
consistent with plausible conditions of utility maximizing consumers. Another is that
heterogeneity in consumer behavior can be observed and fully exploited. A logical
consequence of this orientation is that models of aggregate data will be respecified
to be consistent with disaggregate models. 20 At the same time, there is an emerging
interest among researchers and practitioners to determine the relative advantages and
disadvantages of, for example, household- and store-level purchase data. 21
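A standard disaggregate specification consistent with utility-maximizing consumers is the multinomial logit choice model. The sketch below is illustrative only: the brand attributes and parameter values are assumptions, not estimates from data.

```python
import numpy as np

def choice_probabilities(utilities):
    """Logit probabilities: P(j) = exp(V_j) / sum_k exp(V_k)."""
    v = np.asarray(utilities, dtype=float)
    e = np.exp(v - v.max())          # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical deterministic utility of each brand: V_j = -2*price_j + 1.5*promo_j
prices = np.array([1.99, 2.49, 2.29])
promo = np.array([0.0, 1.0, 0.0])    # brand 2 is on promotion
probs = choice_probabilities(-2.0 * prices + 1.5 * promo)
print(probs.round(3))                # promoted brand gets the largest share
```

With household-level data, the price and promotion coefficients would be estimated by maximum likelihood rather than assumed, and heterogeneity could be introduced by letting them vary across households.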
We now define what we mean by a model and propose the following definition:
A model is a representation of the most important elements of a perceived real-
world system.
This definition indicates that models are condensed representations or simplified pic-
tures of reality. We can think of models as structures that are necessary, or at least
useful, for decision makers to understand the systematic parts of an underlying reality.
To provide a more detailed perspective, we next discuss the components of this defi-
nition.
1. Real-world system. We can take the perspective of a manager for whom the
system captures the relevant parts of the environment in which the manager makes
decisions. The system consists of all elements that have or can have a bearing on
the problem under study. The real-world system relevant to this book is the part that
deals with the marketing environment (including "non-marketing elements" that have
a bearing on marketing decisions). In a narrow sense, we can distinguish marketing
models from production models, finance models and corporate models. However,
models as representations of the important elements of a system are not restricted
to the management sciences but can be found in all sciences. Thus, we can also
distinguish physical, psychological, sociological and economic models. 22
2. Perceived. A model reflects the model builder's subjective perception of the real-world system, which depends
on the user, the interactions between model builder and user, and the intended use. 23
Importantly, by describing the subjective interpretation of the real-world system, the
model builder explicates the perceptions. Thus, others can debate whether important
elements are omitted, whether the relationships expressed are inadequate, etc.
3. Most important elements. The purpose of a model is often for the user to obtain a
better understanding of the real world it represents. Perhaps especially in marketing,
because it deals with human behavior, the relevant part of the real world is potentially
very complex. For example, consumer purchase decisions are influenced by economic
as well as psychological and sociological motivations. Marketing managers use a
diversity of appeals directed at many different segments of consumers. Thus, the
demand functions which represent the central part of model building in marketing
can potentially contain an almost infinite number of factors.
Models are intended to capture the most important elements out of this complexity.
One reason is that it is truly impossible to provide a complete representation of the
part of the real world relevant to a marketing manager. Another reason is that man-
agers can obtain a better understanding of the real world by focusing only on the most
critical elements. This idea is also reflected in other definitions of models such as the
descriptions of models as simplified pictures24 or stylized representations. 25
4. Representation. We can represent the system being studied not only by choos-
ing the most important elements, but also through the form in which the system is
expressed. Perhaps the simplest form of representation is the verbal one. 27 Another
simple form is an analog one which is used to show physical measures such as speed
23. The idea that no unique representation of a system exists has been referred to by Lilien (1975, p. 12) as
"model relativism".
24. Leeflang (1974, p. 8).
25. Lilien, Rangaswamy (1998, p. 6).
26. We return to this issue in Section 7.2.1.
27. We can also use implicit models but these tend to suffer from a lack of transparency.
The first step in making a model explicit is for a manager to state in words what he
perceives as the important elements surrounding a problem. This is more easily said
than done. There are many examples in the (marketing) literature which illustrate the
CHAPTER 2
Classifying marketing models
1. A descriptive model, as the name indicates, describes a situation or a decision-making process. The notion
of a descriptive model will be discussed in Sections 4.1 and 8.1.
2. The ultimate model is a logical flow model, but this was constructed on the basis of verbal descriptions of
the elements of the model and their interactions. The notion of a logical flow model is defined in Section 2.3.
A recent example can be found in Brand (1993). Brand describes the complex organizational buying process of
heat exchangers with a so-called buying process-simulation model; see Section 8.1.
3. A limit price has the property that prices above its value will stimulate entry by competitors, whereas lower
prices will discourage entry.
"I will change my price in steps, with each change equal to plus or minus Δp,
until the increase in profit is less than a predetermined amount δ, with the
restriction that price stays below the value p_c."
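The verbal rule above can be sketched as a small program. The demand curve and every parameter value below are hypothetical assumptions made only so the loop can run; in the book's setting the manager does not know q(p) and observes profit changes by experimenting in the market.

```python
# A sketch of the verbal decision rule. Demand, cost, step size, the profit
# threshold delta, and the limit price p_limit are all assumed values.
def demand(p):
    return max(0.0, 1000.0 - 8.0 * p)      # assumed linear demand q(p)

def profit(p, unit_cost=20.0):
    return (p - unit_cost) * demand(p)

def adjust_price(p, step=1.0, delta=5.0, p_limit=80.0):
    """Change price in steps of +/- step until the profit gain falls below
    delta, keeping price under the limit price p_limit."""
    direction = 1.0                         # try a price increase first
    while True:
        candidate = p + direction * step
        if candidate >= p_limit:            # never reach the limit price
            break
        gain = profit(candidate) - profit(p)
        if gain < delta:
            if direction > 0:
                direction = -1.0            # increases stop paying: try cuts
                continue
            break                           # neither direction pays enough
        p = candidate
    return p

print(adjust_price(40.0))                   # → 72.0 under these assumptions
```

The loop climbs toward the profit-maximizing price by trial and error, which is exactly why such a rule needs no explicit demand function, and also why it risks many costly market experiments.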
In most marketing problems, there is a variety of variables which can play an impor-
tant role. These variables may have complex effects on the demand. For a description
of the relationships in words it may be difficult or even impossible to keep all relevant
characteristics and conditions in mind. To illustrate the idea, we complicate the ex-
ample given in Section 2.2 by adding advertising as a second marketing instrument.
Increases in advertising expenditures may lead to increased sales, but total costs will
increase as well. The complexity of sales effects is indicated by the idea that adverti-
sing expenditures in period t (say April, 1999) may not only lead to increases in sales
in period t but also to increases in period t + 1 (say May, 1999), and possibly may
contribute to increases in sales in the periods after t + 1. Increases in sales in April,
1999 will result in changes in the total contribution (= per unit contribution times
number of units sold) in April, 1999 and this may lead to changes in advertising
expenditures in future periods. 5 Thus relations may exist between:
• advertising expenditures in t and unit sales in t;
• advertising expenditures in t and sales in t + 1;
• advertising expenditures in t and sales in t + 2 (and perhaps later periods);
• sales in t and advertising expenditures in t + 1; etc.
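The relations in this list can be mimicked in a few lines of code. The carryover rate, the budget rule, and the other parameters below are hypothetical; the sketch only shows how advertising in t can lift sales in t, t + 1, t + 2, ..., while sales in t feed back into the advertising budget for t + 1.

```python
import numpy as np

# Hypothetical two-way dynamics: advertising builds a "goodwill" stock that
# decays geometrically, and the firm sets next period's advertising budget
# as a share of current sales (the common percentage-of-sales rule).
T = 12
carryover = 0.5          # fraction of the advertising effect retained per period
effect_per_dollar = 0.8  # immediate sales lift per advertising dollar
base_sales = 100.0
budget_share = 0.10      # advertising in t+1 = 10% of sales in t

adv = np.zeros(T)
sales = np.zeros(T)
goodwill = 0.0
adv[0] = 10.0
for t in range(T):
    goodwill = carryover * goodwill + adv[t]   # lagged effects of past advertising
    sales[t] = base_sales + effect_per_dollar * goodwill
    if t + 1 < T:
        adv[t + 1] = budget_share * sales[t]   # sales in t feed advertising in t+1

print(np.round(sales[:3], 2))
```

The geometric decay of goodwill is a simple stand-in for the distributed-lag (Koyck-type) specifications discussed in Chapter 6; the feedback from sales to the budget is what makes the direction of causality between advertising and sales hard to untangle.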
In order to make relationships more precise it is necessary to fonnalize them. This
means that we specify which variables influence which other variables and what the
directions of causality between these variables are. The representation of a system
through fonnalized relationships between the most important variables of a system is
called a fonnalized model. Within the class of fonnalized models we make a further
distinction between logical flow models and fonnalized mathematical models.
4. See for other examples: Montgomery and Urban (1969, pp. 9-12), Boyd and Massy (1972, pp. 17-21), Lilien,
Kotler, Moorthy ( !992, pp. 1-7).
5. This is based on the fact that many firms determine advertising expenditures on the basis of past revenue
performance. While this runs contrary to the general belief that advertising is a determinant of sales and not the
other way around, fixing the level of the advertising budget as a percentage of past sales nevertheless remains
a common corporate practice. See Schmalensee (I 972, Chapter 2), Cooil, Devinney (I 992) and Batra, Myers,
Aaker (!996, p. 551).
[Flow diagram of the monopolist's price-adjustment decision (figure not reproduced)]
A logical flow model represents an extension of the verbal model by the use of a
diagram. This diagram shows the sequence of questions and of actions leading to a
solution of the problem. This kind of model is also known as a graphical or a con-
ceptual model. The flow diagram makes clear, or more explicit, what the manager has
put in words. Such a diagram can serve as a basis for discussions. The diagram may
show discrepancies between the formalized model and the decision maker's thinking.
It can also be used to identify possible inconsistencies in the model.
A formalized mathematical model represents a part of the real-world system by
specifying relations between some explanatory (predictor) variables and some effect
(criterion) variable(s).
We now return to the example of the monopolist described in Section 2.2. Our
decision maker wants to try a price increase first, and he wants to change price in
increments equal to Δp. The logical flow model representing the monopolist's
problem is, however, incomplete
from a managerial decision-making point of view because nothing is said about how
demand (q) depends on price. In the logical flow model this relation is approached by
trial-and-error. If the manager wants to have a clear description of how decisions are
currently made, a logical flow model may be sufficient. In the following section and
in Chapter 5, we shall devote extensive attention to the specification of the demand
relations.
In numerically specified models, the various components and their interrelations are
quantified. The decision maker's objective is now to disentangle the kind of relation-
ships described in Section 2.3. A set of mathematical relations written "on the back of
an envelope" may do the job, which suggests that the model purpose has implications
as to the desired "degree of explicitness". For this reason we classify models along
that dimension.
Even if optimization is the ultimate reason for modeling a particular problem,
having a numerically specified model is not always a prerequisite. For example, the
logical flow model in Section 2.3 may, under certain conditions, allow the decision
maker to find a near-optimal price, without having to specify the demand function
numerically. Nevertheless, numerically specified models are, in many situations, the
most appropriate representations of real-world systems. Below we examine why a
numerically specified model might be more appropriate than a flow model.
First of all, a numerically specified model will allow the decision maker to quan-
tify the effects of multiple, and potentially conflicting forces. Consider, for example,
the monopolist decision maker who realizes that "there is a trade-off between changes
in sales and changes in (unit) contribution". Specifying a model numerically will
provide precision to the statements that a price increase results in a sales decrease
and an advertising increase results in a sales increase.
Secondly, we may say that if a numerically specified model constitutes a reason-
able representation of a real-world system, it can be used to examine the consequences
of alternative courses of action and market events. We can imagine that a properly
numerically specified model provides the decision maker with an opportunity to con-
duct experiments. Once various relationships are quantified, the decision maker can
contemplate how the demand varies with price and other changes. It should be clear
that such experiments are inexpensive, and less risky than the market experiments
conducted by the monopolist (in Sections 2.2 and 2.3). If a decision maker changes a
product's actual price frequently in order to determine the near-optimal price, many
consumers may be turned away, with potentially long-term consequences.
Thus, a numerically specified model gives management the opportunity to explore
the consequences of a myriad of actions, a capability which cannot normally be du-
plicated in the real world. These considerations lead to the use of simulation models,
both in the sense of providing answers to "what if" types of questions and in the
sense of dealing with stochastic (uncertain) elements. Of course, the representation of
real-world systems by numerically specified models provides advantages, conditional
upon the model being a reasonable representation of reality. How to construct and
evaluate such representations is an important objective of this book.
Having presented some advantages, we should also consider disadvantages. Build-
ing and using models costs money, and the more complicated and the more explicit
models become, the more expensive they will be. Thus weighing the costs against the
benefits will always be a necessary step in the modeling process.
We conclude this chapter by showing in Figure 2.4 a numerical specification for the
example of the preceding sections. The symbols are the same as in Figure 2.3. In
Figure 2.4 we assume that the fixed costs (FC) are equal to $100. The difference
between Figure 2.3 and Figure 2.4 is the numerically specified relation6 between q
and p (relation 2.3a). This relation is of the form:

    q = αp^β,  α > 0, β < 0.    (2.4)

In order to illustrate how the optimal value for the price variable can be obtained,
we assume that the optimal price is smaller than p_c. Once the parameters have been
determined (as in (2.3a), where q = 10p⁻²), the optimal price can be obtained by
differentiating the profit function with respect to price, setting the derivative equal
to zero, and solving for price. This is shown below. We start by substituting (2.3a)
in (2.1a):

    π = (p - c)10p⁻² - 100.    (2.5)

Differentiating (2.5) with respect to p and setting the result equal to zero gives:

    dπ/dp = -10p⁻² + 20cp⁻³ = 0    (2.6)

which yields:

    p = 2c    (2.7)

which implies that the monopolist should use a markup of 100 percent. To make sure
that p = 2c corresponds to a maximum, second-order conditions should be examined.
For reasonable specifications of demand and profit functions, these will generally be
satisfied. The expression for the second-order condition in our example is:

    d²π/dp² = 20p⁻³ - 60cp⁻⁴    (2.8)

which, evaluated at p = 2c, gives:

    d²π/dp² = 20/(8c³) - 60c/(16c⁴) = -20/(16c³) < 0    (2.9)

which means that p = 2c leads to a maximum value of π.
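The optimization above can also be checked numerically. The sketch below performs a simple grid search over prices; the demand relation q = 10p⁻² is an assumption consistent with the second-order condition (2.8), and FC = $100 as in Figure 2.4.

```python
# Numerical check of the monopolist example: profit = (p - c) * q - FC,
# with the assumed demand specification q = 10 * p**-2 and FC = 100.
# The grid search should locate the optimum near p = 2c (a 100% markup).

def profit(p, c, fc=100.0):
    q = 10.0 * p ** -2          # assumed demand relation (2.3a)
    return (p - c) * q - fc

def optimal_price(c, lo=0.1, hi=10.0, steps=100000):
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: profit(p, c))

c = 1.5
p_star = optimal_price(c)
print(round(p_star, 2))  # close to 2c = 3.0
```

The grid search replaces the calculus: for any unit cost c, the maximizing price it finds agrees with the analytical solution p = 2c.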
We want to emphasize that the procedure described above has limited real-world
applicability. The demand function was assumed to be deterministic whereas in re-
ality all coefficients are estimates, and we would want to know the quality of these
estimates, and how sensitive optimal price and optimal profits are to changes in their
values. Furthermore, in Section 2.4, the limit price was not considered. Adding this
constraint makes the optimization procedure somewhat more difficult. We should
also bear in mind that the example discussed in this chapter was kept unrealisti-
cally simple. Marketing problems generally involve more than one decision variable,
multiple objectives (for example, market expansion in addition to a profit goal and
some constraints), multiple products 7 and multiple territories. In addition, unit vari-
able production cost may vary with quantity produced and thus with price. Finally,
monopolies being the exception rather than the rule, models will have to account for
competitive activity. 8 Nevertheless, the example illustrates that part of a real-world
system can be represented in different ways, according to the degree of explicitness
chosen.
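The sensitivity of the optimal price to estimated coefficients, raised above, can be illustrated with the standard monopoly pricing result: for constant-elasticity demand q = αp^β with β < -1, the optimal price is p* = cβ/(1 + β). The elasticity values below are purely illustrative.

```python
# Sensitivity sketch: for a constant-elasticity demand q = alpha * p**beta
# (beta < -1), the optimal monopoly price is p* = c * beta / (1 + beta).
# Small errors in the estimated elasticity translate into sizable price errors.

def optimal_price(c, beta):
    assert beta < -1, "demand must be elastic for an interior optimum"
    return c * beta / (1 + beta)

c = 1.0
for beta in (-1.5, -2.0, -2.5, -3.0):
    print(beta, round(optimal_price(c, beta), 2))
```

With β = -2 this reproduces the 100 percent markup of the example; an estimate of -1.5 instead of -2 would imply a 50 percent higher price, which shows why the quality of the coefficient estimates matters.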
Before delving into details of marketing model building, we discuss the potential
benefits that can be derived from models.
In Section 3.1, we examine whether marketing problems are in fact quantifiable,
and to what extent.
In Section 3.2, we examine direct and indirect benefits of models, the primary
purpose of a model being to provide decision support for marketing management.
We recognize that models may be developed for different reasons. For example,
models can be helpful for the discovery of lawlike generalizations, which may im-
prove our knowledge and understanding of marketing phenomena. We elaborate on
this type of model building in Section 3.3. We note that the approaches presented in
Sections 3.2 and 3.3 are complementary. Models for decision making are often short-
term oriented. In the process of building such models, we will learn about marketing
relationships, and this learning may lead to the discovery of lawlike generalizations.
At the same time, decision models should incorporate any generalizable knowledge
that already exists. In this sense the approaches inform each other.
We conclude Chapter 3 with a short case study in Section 3.4. This case serves
to clarify the basic model concepts defined in Chapter 1, and at the same time it
illustrates some of the benefits listed in Section 3.2.
Intuition is the basis for much marketing problem solving. This is generally justified
by the argument that, because of their complexity, marketing problems are of a non-
quantifiable nature. This complexity is sometimes used as an excuse for a reluctance
to build a model. We do not pretend that a mathematical model can be used to ap-
proach every marketing problem, nor that each problem is completely quantifiable.
In general, marketing problems are neither strictly quantitative nor strictly qualita-
tive, and we believe that both quantitative and qualitative approaches are necessary
to solve marketing problems. Just as a vector has not only a direction but also a
length, marketing phenomena may be considered as consisting of two components,
a qualitative and a quantitative (or quantifiable) one. Thus, we do not support the
argument that a mathematical model can constitute a complete solution for every
marketing problem. But we also do not subscribe to the notion that sophisticated
22 CHAPTER3
approaches are useless for the ill-structured problems that are common in marketing.
To illustrate the tensions consider the following example. 1
The marketing director of a company wants to determine the size of his sales force. 2
He knows that a strong sales force is important, but also that at some point increases
in the size are subject to the law of diminishing returns. His intuition cannot tell him
whether the size of the sales force should be, say, 5, 10 or 15 persons. Estimates of the
expected returns from sales forces of various sizes may assist him in the choice of a
number that strikes a balance between the demand for immediate return on investment
in the sales force and a desire for investment in sales growth. To this end he needs a
formalized model, data and a numerical specification obtained from data. However,
the marketing director is faced with the problem that the fluctuations in his sales force
have been modest over time: the size has varied from seven to ten representatives.
He believes that past data cannot be used to infer what the relation will be between
returns and size outside the range of values that have occurred in the past. This lack
of variability in the data can be overcome by collecting subjective judgments (see
below).
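The kind of calculation that could assist the marketing director can be sketched with a concave sales response function exhibiting diminishing returns to sales force size. The functional form and all parameter values below are invented for illustration, not taken from any actual study.

```python
import math

# Hypothetical concave sales response: sales(n) = s_max * (1 - exp(-k * n)).
# Profit trades off the gross margin on those sales against the cost per
# representative; the optimum balances immediate return against growth.

def profit(n, s_max=5_000_000, k=0.15, margin=0.20, cost_per_rep=60_000):
    sales = s_max * (1 - math.exp(-k * n))
    return margin * sales - cost_per_rep * n

best = max(range(1, 31), key=profit)
print(best)
```

Under these assumed parameters the search gives a concrete answer of the kind intuition cannot: a best size of a few representatives rather than "5, 10 or 15". The caveat in the text applies with full force, since a response curve estimated only from sizes between seven and ten cannot be trusted outside that range.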
Once an optimal budget for the sales force is determined, the marketing manager
still has to formulate the manner in which the sales people should divide their time
between large and small customers and/or between acquiring new customers and
keeping existing ones. An analytical treatment of an experiment can indicate how
salespeople should allocate their time. A distribution of customer sizes plus a measure
of the effectiveness of the sales force may indicate how many customers the sales
force should concentrate on and how much time should be spent on holding current
and on converting new customers. The marketing director and the sales people already
know that it is harder to acquire a customer than to keep one. Their intuition fails to
tell them how much harder this is. They also know that it is important to concentrate
on large customers. However, their intuition cannot tell them whether this should be,
say, the 500 largest, the 1,000 largest or the 5,000 largest.
From this example it is clear that the direction of a solution for a marketing
problem can often be found by a qualitative analysis. A more quantitative treatment
of the problem can be instrumental in finding the "approximate length" of the solu-
tion. Models and data are necessary, although not always sufficient to determine this
"length".
If market data are not available for model estimation, marketing scientists can
rely on managerial judgments. These subjective judgments can be used to calibrate
a model. This approach is used, for example, in the CALLPLAN model for Syntex
(Lodish, 1971). This approach represents the best one can do with the limited in-
formation at hand but it can leave everyone somewhat uneasy. 3 On using subjective
estimates, the Syntex senior vice-president of sales and marketing concluded:
"Of course, we knew that the responses we estimated were unlikely to be the 'true
responses' in some absolute knowledge sense, but we got the most knowledgeable
people in the company together in what seemed to be a very thorough discussion,
and the estimates represented the best we could do at the time. We respect the
model results, but we will use them with cautious skepticism"
(Clarke, 1983, p. 10).
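Judgmental calibration of this kind can be sketched as follows. This is a simplified illustration in the spirit of decision calculus, not the actual CALLPLAN model; the response curve and all judgmental inputs are hypothetical.

```python
import math

# Decision-calculus sketch: calibrate a response curve from managerial
# judgments instead of market data. All numbers are hypothetical inputs
# of the kind managers might supply in a calibration session.

# Judgments: sales at zero effort, at saturation, and at current effort.
s_min, s_max = 2.0, 10.0          # $ million at zero / unlimited calls
x_current, s_current = 100, 6.0   # calls per period and judged sales

# Fit s(x) = s_min + (s_max - s_min) * (1 - exp(-k x)) to the current point.
k = -math.log(1 - (s_current - s_min) / (s_max - s_min)) / x_current

def response(x):
    return s_min + (s_max - s_min) * (1 - math.exp(-k * x))

print(round(response(150), 2))   # judged effect of 50% more calls
```

The calibrated curve interpolates the judgments and lets management explore effort levels that were never tried, exactly the situation described in the quotation: the estimates are not the "true responses", but they are usable with cautious skepticism.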
The next example illustrates how a qualitative and a quantitative approach are com-
plementary.4 In the Netherlands, 1989 was a crucial year in the development of
the television advertising market. In October 1989, private broadcasting was intro-
duced (RTL4) and there was a substantial expansion of air-time available for TV
commercials on the public broadcasting channels.
Under pressure from various parties, the Dutch government faced, in 1988, the
decision whether or not to permit the introduction of private broadcasting in the
Netherlands. This issue is closely related to other problems connected with
the laws and rules surrounding the public broadcasting system. In this system, the
total time available for commercials was restricted, and for years there had been an
excess demand for broadcast commercials. In some periods the demand for television
advertising was three times the supply. The government restricted the time available
for commercials to minimize the transfer of advertising expenditures from daily news-
papers and magazines to television. This transfer was a threat to the survival of many
of the smaller newspapers and magazines, and their survival was considered essential
to the preservation of a pluriform press.
International developments (such as adaptations of laws following the creation of
the internal European market after 1992, and the entry of European satellite
broadcasting stations), together with lobbying and pressure from advertisers and
advertising agencies, led the government to reconsider the amount of air time
available for advertising on public television. In order to obtain a basis for sound
decision making, the Dutch government invited several marketing scientists to study
the issue.
Econometric models were used to estimate the effect of an expansion of public broad-
casting advertising time on the allocation by firms of advertising expenditures to daily
newspapers and magazines. The results of this analysis are described elsewhere. 5
However, these models could not predict the effects of new entrants. Hence, the
challenge was to develop and use a research method to obtain quantitative predictions
of the effects of the introduction of private broadcasting on advertising expenditures
in other media. For this, the researchers employed an intention survey.
In this section, we consider benefits which may result from building and using mar-
keting models. A more complete discussion of benefits and possible disadvantages
("costs") is postponed until Chapter 20.
We first consider decision models. These models focus, or at least they should
focus, on the manager's perception of the environment in which he operates. A mar-
keting scientist attempts to describe this world with models in such a way that he
can give the decision maker assistance. 6 Typically one starts with a specific prob-
lem in one particular firm, with a manager looking for help in making decisions.
The resulting models do not provide final answers to the general problems of new-
product selection, marketing mix decisions, sales force allocation, media selection,
etc. Instead, the models should be helpful in specific situations, which of course does
not preclude that learning will take place, eventually resulting in knowledge of a more
general nature. Models that focus exclusively on the generation of knowledge do not
satisfy the criteria of decision models.
Since decision models are built originally for solving specific problems, some
solution is required in the immediate future. As a result, decision models are based
on what we know, or on what we think we know. The decision maker does not have
the luxury to wait for additional or new knowledge to be developed elsewhere. Thus,
decision models must be directly applicable. In that case the models are often referred
to as being relevant. We call this a managerial or operational definition of relevance.
Not all the work in marketing science fits this narrow definition. For example, aca-
demic research which is currently considered non-implementable in practice, might
become operational a few years hence. Therefore it would be a mistake to require
all model building to be directly applicable. Such a requirement would reflect a very
short-term view, which affects long-run progress in the field negatively.
We now turn to the expected benefits. Here we make a distinction between direct
and indirect benefits. Although the line between these two types of benefits is not
always easy to draw, we define indirect benefits to be those that are not related directly
to the reasons for which the model was built in the first place. In this sense, most
indirect benefits will only be realized slowly over time.
6. This is reminiscent of Bowman's (1967) early work on consistency and optimality in management decision
making.
BENEFITS 25
Direct benefits7
Companies invest in model building presumably because it leads to better decisions.
"Better" is understood here as contributing to the fulfillment of the company's goals.
For example, if the firm's single objective is to maximize profit, the benefits of a
model can be defined as the incremental discounted profit generated by having the
model as opposed to not having it. 8 This requires knowledge of the amount of incre-
mental profit over time, or of some proxy measure. Furthermore, the relevant time
horizon has to be determined, and a discount rate defined.
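The benefit definition above, the incremental discounted profit from having the model versus not having it, amounts to a small net-present-value calculation. The profit stream and discount rate below are hypothetical.

```python
# Sketch of the benefit definition in the text: the incremental discounted
# profit generated by having the model as opposed to not having it.
# The profit figures and the 10% discount rate are hypothetical.

def npv(incremental_profits, discount_rate):
    return sum(p / (1 + discount_rate) ** t
               for t, p in enumerate(incremental_profits, start=1))

# Estimated extra profit ($1000s) in each of the next three years.
benefit = npv([100, 150, 150], 0.10)
print(round(benefit, 1))
```

The same calculation makes the two open choices in the text explicit: lengthening the list changes the time horizon, and the second argument is the discount rate that has to be defined.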
We provide a few examples to suggest how direct benefits may materialize. In some
cases it is difficult to measure the benefits directly, while in other cases it is
straightforward. Examples of both are given in Section 20.3. The measurement is
complicated by the fact that a cost-benefit evaluation should be carried out (1) before
the model is built and (2) before it is implemented.
Indirect benefits
2. Models may work as problem-finding instruments. That is, problems may emerge
after a model has been developed. Managers may identify problems by discovering
differences between their perception of the environment and a model of that environ-
ment. To illustrate, Leeflang (1977a) observed a negative partial relation between the
market share of a detergent brand and the penetration of the number of automatic
washing machines. One interpretation of this relation is that the brand was perceived
by households as being less appropriate to use in these machines (relative to other
brands).
3. Information is often available but not used. There are many examples of decisions
which would have been reversed if available information had been used. Management
may not know that data exist, or may lack methods for handling the information.
Models can be instrumental in improving the process by which decision makers deal
with existing information.
4. Models can help managers decide what information should be collected. Thus
models may lead to improved data collection, and their use may avoid the collec-
tion and storage of large amounts of data without apparent purpose. Clearly, model
development should usually occur before data collection. 11
5. Models can also guide research by identifying areas in which information is lack-
ing, and by pointing out the kinds of experiments that can provide useful information.
By using models, managers have a better understanding of what they need to know
and how experiments should be designed to obtain that information. To illustrate,
suppose that by parameterization of a model we learn that the average effect of
advertising on sales in some periods differs from the average effect in other time
periods. To explain such differences we need additional information about changes in
advertising messages, the use of media, etc.
Suppose that the difference between forecasted and observed sales values persists in
subsequent periods. This points to a very useful aspect of models, namely their
diagnostic capacity. Since the deviation in period 21
is larger than in previous periods, the managers may conclude that something has
changed in the environment. It remains to be determined exactly what has changed,
but the model warns the manager faster than is usually possible without it.
The difference between actual and forecasted sales may be due to competitive activity,
such as a new product being marketed with substantial marketing effort, or a price
decrease by a competitor or another factor that influences sales but is omitted from
the model. The model helps the manager to detect a possible problem more quickly,
by giving him an early signal that something outside the model has happened. 12
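This diagnostic use of a model can be sketched as a simple monitoring rule: flag any period in which the forecast error departs from its historical distribution by more than, say, two standard deviations. The sales figures and the two-sigma threshold below are illustrative choices, not taken from the text.

```python
# Diagnostic sketch: flag periods in which the gap between actual and
# forecasted sales exceeds two standard deviations of the errors observed
# over an initial base window. All data are hypothetical.

def flag_deviations(actual, forecast, window=20, k=2.0):
    errors = [a - f for a, f in zip(actual, forecast)]
    base = errors[:window]
    mean = sum(base) / len(base)
    sd = (sum((e - mean) ** 2 for e in base) / (len(base) - 1)) ** 0.5
    return [t + 1 for t, e in enumerate(errors)
            if t >= window and abs(e - mean) > k * sd]

actual = [100 + (t % 3) for t in range(20)] + [92, 90]   # drop after period 20
forecast = [100] * 22
print(flag_deviations(actual, forecast))
```

With these invented data the rule signals in period 21, mirroring the example in the text: the model cannot say what changed, but it warns the manager faster than is usually possible without it.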
on. However, much of the theoretical analysis occurs after patterns in the data have
been noted (induction).
Is "sonking" (TE) really bad? There are certainly many marketing scientists who do
not think so, and want to "sonk" because they want to learn. A typical example is the
observation by Forrester (1961), who argues that the first step in model building does
not consist of an extensive data collection exercise. Instead, a model should come
first, and it should be used to determine what data need to be collected. Forrester's
view is in sharp contrast with Ehrenberg's "models of fact" approach, and Ehrenberg
might put Forrester's way of building models in the category of sonking. The rationale
underlying Forrester's work is that model building is one approach for finding out
what we do not know, or for discovering what kinds of things would be of interest to
know. Unlike Ehrenberg, we believe this kind of modeling can be very valuable, as
long as we do not call the outcome "the theory". Instead, the TE approach is useful for
the development of new hypotheses or conceptualizations. In addition, once empirical
generalizations are formed they cannot be taken for granted and they are subject to
falsification in a Popperian sense.
We believe that both ET and TE types of research can be productively employed
in marketing. For Ehrenberg's approach to be successful, the data patterns must be
relatively easily observable. Powerful, but simple, ideas are used to develop explana-
tions for these patterns. In applied econometrics the ET approach may focus on main
effects for marketing variables (e.g. what is the price elasticity) and on the use of data
that do not require complex estimation techniques (e.g. experimental data).
More extensive theory development, prior to data analysis, may be required to
propose more complex relationships. We imagine that many interaction effects will
not be discovered from Ehrenberg's approach. Both theoretical and analytical work
may be prerequisites for the specification of higher-level models. In addition, sophisticated
estimation procedures may be needed to obtain empirical evidence in support of
the theory. For example, data on the sales of products receiving promotional support
(such as a temporary price cut) often show no evidence of a dip in sales after the
promotion is terminated. Such a dip is expected if consumers engage in stockpiling.
Thus, the ET approach is likely to ignore the possibility of a dip, whereas the TE
approach would, after a failure to observe dips in simple models, consider more
complex models that can capture the phenomenon.
In more mature scientific work any descriptive result would be followed by further
empirical testing (E), more well-based theorizing (T) and speculation, more empirical
testing (E), and so on. In this way science traditionally moves into "more characteris-
tically looped ETET ... models". Thus "in the long run" the difference between ET
and TE would largely disappear if the initial theory in TE is tested not on a Single Set
of Data (SSoD) but on Many Sets of Data (MSoD). 14 In this regard Ehrenberg (1995,
p. G27) speaks about:
"a design switch away from collecting a Single Set of Data (SSoD) toward ex-
plicitly collecting Many (differentiated) Sets of Data (MSoD)."
In the more recent ET-based marketing science literature many empirical general-
izations or "laws" based on "meta-analysis" are described. Meta-analysis refers to the
statistical analysis of results from several individual studies for the purpose of gener-
alizing the individual findings (Wolf, 1986). The primary benefit of meta-analysis in
marketing is that it delivers generalized estimates of various elasticities, quantitative
characteristics of buyer behavior and diffusion models, and the accuracy associated
with estimated parameters and model fits. The assumption is that different brands and
different markets are comparable at a general level, but that at the same time model
parameters to some extent vary systematically over different brand/model settings
in an identifiable manner (Farley, Lehmann, Sawyer, 1995, p. G37). The empirical
generalization
... "is a pattern or regularity that repeats over different circumstances and that
can be described simply by mathematical, graphic or symbolic methods. A pattern
that repeats but need not be universal over all circumstances"
(Bass, 1995, p. G7).
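A minimal sketch of the kind of meta-analysis referred to here is a fixed-effect, inverse-variance pooling of elasticity estimates across studies; the study estimates and standard errors below are invented for illustration.

```python
# Meta-analysis sketch: a fixed-effect, inverse-variance weighted average of
# price-elasticity estimates from several (hypothetical) studies. More precise
# studies receive proportionally more weight in the generalized estimate.

def fixed_effect(estimates, std_errors):
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

elasticities = [-1.8, -2.5, -2.2, -1.6]
ses = [0.4, 0.5, 0.3, 0.6]
pooled, se = fixed_effect(elasticities, ses)
print(round(pooled, 2), round(se, 2))
```

The pooled estimate is more precise than any single study, which is exactly the benefit claimed in the text; allowing the true parameter to vary systematically over brand/market settings, as Farley, Lehmann and Sawyer assume, would call for a random-effects or regression extension of this sketch.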
15. See, e.g. Leone (1995), Lodish et al. (1995b) and Section 18.3.
16. Many other examples of price elasticities are found in Lambin (1976), Verhulp (1982), Tellis (1988b),
Hanssens, Parsons, Schultz (1990, Chapter 6) and Ainslie, Rossi (1998).
The increasing amount of research on the effects of promotions has also led to gener-
alizations (Blattberg, Briesch, Fox, 1995).
We conclude this chapter by presenting a case study to clarify some of the concepts
introduced in Chapters 1 to 3. In this hypothetical, yet quite realistic, example, the
reader should recognize:
1. the use of the concepts, system (everything that relates to a manager's problem),
model (representation of the most important elements of a perceived real-world
system) and what is meant by these "most important elements";
2. the process that ultimately leads to a model;
3. the relation between intended use and degree of model explicitness;
4. the benefits and costs inherent in marketing modeling.
17. See for a survey of generalizations in marketing the special issue of Marketing Science, vol. 14, nr. 3, part
2, 1995 and more specifically the survey in Bass, Wind (1995). See also Hanssens et al. (1990, Chapter 6).
18. The problem description has to a certain extent been inspired by the Heinz Co. case, available from Harvard
Clearing House.
19. The values in Table 3.1 and in the rest of the discussion are given in real dollars, i.e. they have been deflated.
Table 3.1 Distribution costs of the Good Food Co. (in deflated dollars).

Year   Distribution cost   Increase over
       ($ million)         previous year (%)
1986    7.5                  -
1987    7.6                 1.3
1988    7.9                 4.0
1989    8.1                 2.6
1990    8.6                 6.2
1991    9.0                 4.7
1992    9.7                 7.8
1993   10.5                 8.2
1994   11.6                10.5
1995   12.9                11.2
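The yearly percentage increases accompanying the distribution cost figures can be reproduced from the cost column itself, up to rounding of the underlying (unrounded) cost data.

```python
# Reproduce the year-over-year percentage increases in the Good Food Co.
# distribution costs (in $ million, deflated) from the cost column.

costs = {1986: 7.5, 1987: 7.6, 1988: 7.9, 1989: 8.1, 1990: 8.6,
         1991: 9.0, 1992: 9.7, 1993: 10.5, 1994: 11.6, 1995: 12.9}

years = sorted(costs)
for prev, year in zip(years, years[1:]):
    growth = (costs[year] - costs[prev]) / costs[prev] * 100
    print(year, round(growth, 1))
```

The computed series agrees with the reported increases within rounding, confirming that the second column is simply the growth rate of the first.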
the Board that action is called for. All members agree that the distribution costs are
excessive, and that the distribution network should be examined to discover poten-
tial savings. They realize that the problem is complicated, and that outside help is
required. In April 1996, Wareloc Consultants was invited to work on the project. The
consultants formulated the problem as one of minimizing the total cost of distributing
500 products to 2,000 retailers.
In the Good Food Co. case, the problem, system and model can be defined as
follows.
The problem
As formulated by management and as perceived by Wareloc Consultants, the problem
was to determine the number of plants and warehouses, where to locate them, what to
produce where, and to obtain a transportation schedule to minimize the total cost of
distribution. After further consultation with Good Food Co. management, the solution
space became somewhat constrained. No new plants were to be added and there could
be no technology transfer from one plant to another, which meant that the current
product mix at each plant could not be changed. Within prevailing capacity limits,
the production of specific products could be increased as well as decreased or even
eliminated.
The system
With this information, the elements of the system would include:
Table 3.2 Distribution cost and service rates under alternative warehouse location plans.

Alternative   Distribution cost   Orders served within   Orders served within
              ($ million)         24 hours (%)           48 hours (%)
1             12.3                60                     70
2             12.7                67                     80
3             13.1                70                     85
4             13.1                75                     82
5             13.1                68                     88
6             13.5                72                     87
7             13.5                70                     89
Mr. Sell had initially assumed that the project was intended to reduce distribution
costs, and that the marketing people would not be of much help. After he had
studied the report in depth, especially the location
of the warehouses in the proposed plan, Mr. Sell concluded that sales would be badly
affected if the plan were implemented. In the past, the company had been able to fill
about 70 percent of the orders within 24 hours, and about 85 percent within 48 hours.
In Mr. Sell's opinion, this rate of service would not be possible under the proposed
plan. Good Food's high service rate had been one of the company's strongest com-
petitive advantages. Mr. Sell felt that without this advantage sales would decline. He
discussed the matter at the next Board meeting on October 31, 1996 and after much
debate a decision was made to ask Wareloc Consultants to examine some alternative
warehouse location plans and the corresponding service rates.
Wareloc Consultants spent another two months relating cost of distribution to rate
of service. Some of the options are shown in Table 3.2. From Table 3.2 we can draw
the following conclusions. With the plan originally proposed by Wareloc Consultants
(alternative 1), a substantial decline in the service rate is expected. With the current
rate of 70 percent served within 24 hours, and 85 percent within 48 hours, distribution
cost is estimated to be $13.1 million (alternative 3). This represents an increase
in the distribution cost of almost 1.6 percent [{(13.1 - 12.9)/12.9} x 100%] over
last year for an estimated 4 percent increase in sales. This indicates the existence of
potential savings, based on the projected increase in distribution costs, even if the
service rate is not affected. The table further indicates that, for a given distribution
cost, a trade-off exists between 24 and 48 hour service rates. Management now can
contemplate the value of (high) service rates versus (low) distribution costs.
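The conclusions drawn from the table of alternatives can be reproduced programmatically, for instance by searching for the cheapest plan that at least maintains the current service rates.

```python
# Evaluate the warehouse location alternatives: find the cheapest plan that
# at least matches the current service rates (70% within 24 hours, 85%
# within 48 hours), and compute the implied cost increase over 1995.

# (alternative, distribution cost in $ million, % within 24h, % within 48h)
options = [(1, 12.3, 60, 70), (2, 12.7, 67, 80), (3, 13.1, 70, 85),
           (4, 13.1, 75, 82), (5, 13.1, 68, 88), (6, 13.5, 72, 87),
           (7, 13.5, 70, 89)]

feasible = [o for o in options if o[2] >= 70 and o[3] >= 85]
best = min(feasible, key=lambda o: o[1])
print(best[0], best[1])
print(round((13.1 - 12.9) / 12.9 * 100, 1))   # cost increase over 1995, in %
```

The search selects alternative 3, the $13.1 million plan that holds service constant, and reproduces the almost 1.6 percent cost increase computed in the text; changing the service constraints makes the 24- versus 48-hour trade-off explicit.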
This discussion provides insight into the following aspects:
1. The concepts problem, system and model were illustrated as was the selection
of the most important elements. In the Good Food Co. example the problem itself
was perhaps perceived incorrectly. It was seen as one of excessive distribution cost,
when in fact one should look at the global performance of the company. 24 We can
still examine the distribution network, but we should do so keeping in mind the
overall performance of the company.
24. The study of global systems presents problems of its own as indicated in Chapter 19.
2. The process of problem finding, model building, redefined problem finding, and
redefined model building has been elucidated.
3. The intended use of the model could ultimately be defined as searching for a
balance between distribution cost, on the one hand, and service rate on the other
hand, leading to a satisfactory profit performance. This means that the initial model
does not provide a correct answer to the question which distribution network will lead
to maximum profit. Getting a correct answer requires the determination or estimation
of a relationship between demand and service rate, and the incorporation of that
relationship into the model. At this point, the
company does not know how demand and service rate are related. Subjective esti-
mates could be obtained. Alternatively the company might decide that this problem
deserves a separate study. In other words, the original modeling effort has resulted
in the identification of a new problem. Also, even if the dependence of demand on
the service rate were known, it may or may not be useful to do extensive modeling.
For example, the model may become too complex, and the search for a solution too
difficult, relative to the expected benefits. This points to the need for management to
consider cost-benefit aspects in model building.
4. The direct benefit of a model is observable in the profit increases expected from
model use. As indirect benefits we have:
• improved understanding of the problem resulting in a redefinition;
• insights that suggest the need for additional data collection;
• identification of areas in which data are needed.
25. Because of the complexity of the problem, the company will normally be satisficing rather than optimizing.
CHAPTER 4
We discuss the meaning and significance of these dimensions in this chapter. This
should facilitate the reading of Chapters 8, 9 and 10 which deal with intended use,
level of demand, and amount of behavioral detail. The typologies are also relevant for
the discussion of problems and issues in parameterization (Chapters 16 and 17) and
validation (Chapter 18) of models.
Models can be classified according to purpose or intended use, i.e. why might a firm
engage in a model-building project. Different purposes often lead to different models.
To illustrate, we refer to the example in Section 2.2 about a firm that wanted to
improve its pricing-decision process. For that purpose the firm examined the various
steps, and their sequencing, it used to arrive at pricing decisions. In that case, the
desired outcome of the model-building effort is a descriptive model, and not a
normative model of how the firm should determine its prices. This example
shows how a model's intended use is an important determinant of its specification.
We distinguish between descriptive, predictive, and normative models. Descrip-
tive models are intended to describe decision- or other processes. A descriptive model
of a decision process may be an end in itself, in the sense that decision-making
procedures are often quite complicated and not well understood. A decision maker
may wonder how particular decisions are arrived at in her organization or by her
customers. The decision maker may want to trace the various steps that lead to the
decisions, and identify the forces that influence the outcome(s) of the decision pro-
cesses.
The purpose of a descriptive model may also be to find out whether there is a
structure that would allow for automation of part or all of the decision process.
The objectives may be restricted to a desire to describe the existing procedure, and
the modeling process itself will not necessarily improve the firm's performance. It is
possible, however, that a descriptive model of an existing decision process will show
opportunities for improvement. Thus, a modeling effort may start in the descriptive
phase and ultimately reach the normative phase.
The degree of explicitness will vary from one situation to another. In many cases,
however, descriptive models will be of the logical flow type, with parts of the model
expressed in mathematical terms. We show examples in Section 8.1.
Process models constitute a specific subset of descriptive models. Such models
describe processes of decision makers. A process model consists of a sequential
specification of relations. The model is specified in such a way that the effect of a
change in one variable on other variables via intermediate relations can be examined.
An example is the process model developed by Hoekstra (1987) to describe the be-
havior of heroin users. Model components include the process of obtaining money
and heroin, getting imprisoned, abstaining from heroin, and enrolling in methadone
programs. It is based on a logical flow diagram in which the steps in the process
are made explicit. The behavior of heroin users is described based on information
obtained from interviews. The model has been used to carry out scenario analyses to
examine the effects of policy-based activities such as methadone treatment and law
enforcement. While process models can describe complex purchase situations, it is
not possible to aggregate the described behavior over heterogeneous members of the
population (Lilien, Kotler, Moorthy, 1992, p. 56).1
Descriptive models are not restricted to decision problems. For example, one
may describe a market by the structure of brand loyalty (the percentage of people
buying product i in period t + 1, who bought the same brand on the previous purchase
occasion), and brand switching (percentage of people changing from brand i to brand
j). That is, the market could be described by a stochastic brand choice model. We
discuss such descriptive models in Chapter 12.
By predictive models we mean models to forecast or predict future events. For exam-
ple, a firm may want to predict sales for a brand under alternative prices, advertising
spending levels, and package sizes. While the purpose of a predictive model is, as the
word indicates, prediction, it is possible that, in order to arrive at a predictive model,
we start with a descriptive model, perhaps in terms of current procedures, but also in
terms of the elements relating to the problem. For illustrations, see the new-product
evaluation model in Section 10.3 and stochastic brand choice models in Chapter 12.
Forecasting or prediction does not always mean answering "what if" types of
1. Other examples can be found in e.g. Zwart (1983), Brand (1993) and Pham, Johar (1996).
TYPOLOGY OF MARKETING MODELS 39
questions, such as, how does the demand change if price is increased by ten percent.
In some brand choice models, the structure of brand loyalty and switching is sum-
marized in a transition probability matrix. A general element of that matrix, p_{ij,t+1},
represents the probability that a person who buys brand i in period t will buy brand j
in period t + 1. 2 Based on the transition probabilities and the current distribution of
market shares, we can predict the evolution of market shares in future periods. In this
case, the forecasts are not conditional on marketing activities.
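The evolution of market shares from such a transition matrix can be sketched as follows. The matrix and the initial shares below are hypothetical numbers chosen for illustration, not data from the text; the prediction step is simply m_{t+1} = m_t P applied repeatedly.

```python
import numpy as np

# Hypothetical 3-brand transition matrix: row i gives the probabilities
# that a buyer of brand i in period t buys each brand in period t+1.
P = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.75, 0.15],
    [0.05, 0.20, 0.75],
])

m = np.array([0.50, 0.30, 0.20])  # current market shares

# Predict shares for the next few periods: m_{t+1} = m_t P
for t in range(1, 4):
    m = m @ P
    print(f"period t+{t}: {np.round(m, 3)}")
```

Because each row of P sums to one, the predicted shares continue to sum to one in every period; the forecasts are unconditional in the sense discussed above, since no marketing variables enter the calculation.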
Demand models make up a special class of predictive models. We refer to a
demand model when we have a performance variable related to a level of demand.
This performance variable may depend on a number of other variables, such as mar-
keting decision variables employed by the firm and its competitors. We discuss this
important class of predictive models further in Section 4.2 and in Chapter 9.
We conclude this section with a few observations. Researchers often believe that the
ultimate output should be a normative model and that descriptive and predictive mod-
els are logical preconditions. That is, one first describes the problem or one systematizes
the various elements of a problem in a descriptive sense. Next one tries to answer
"what if' types of questions. For example, what will sales be ifthe firm spends twenty
percent more on advertising. Finally, the "what should" stage is reached. For example,
how much should be spent on advertising in order to maximize profit.
In some cases the sequence is indeed from descriptive to predictive to normative
models. In other situations, as noted earlier, a descriptive model may be sufficient. In
still other cases, one may stop at the predictive stage, even if optimization is analyti-
cally or numerically possible. One reason for this is that the objective function is often
hard to quantify. A firm usually has multiple objectives, and has to satisfy various
constraints. For example, an objective of profit maximization may be constrained by
requirements such as maintaining market share at least at its current level, and limiting
the advertising budget to a certain percentage of last year's sales revenue. The latter
constraint may seem artificial but could simply reflect the limited cash resources a
firm has. Also, other entities in the company may compete for their share of those
scarce resources. In that case, one might prefer to evaluate various alternatives one at
a time, given that these constraints are satisfied, over the use of an algorithm to find
the optimal solution.
2. At the aggregate level, "probability" becomes "percentage of people". The notion of transition probabilities
is dealt with in more detail in Chapter 12.
Early model building in marketing was often of the normative kind with a con-
centration on the development of solution techniques. One reason for this is that
operations research was still a young discipline with little or no contact with man-
agers. Now that the discipline has matured, there are many more models of the
predictive and descriptive, rather than of the normative, variety. Of course, proper dis-
cussions among marketing scientists and managers will allow the normative models
to be useful.
4.2 Demand models: product class sales, brand sales, and market
share models
We can define the same measures at the segment level and at the level of the individ-
ual consumer leading to models with different levels of aggregation: market, store,
segment, household and so on. Thus we define, for example:

Q_t = product class sales in period t,
q_jt = sales of brand j in period t, and
m_jt = q_jt / Q_t = the market share of brand j in period t.
From these definitions it follows that market share of brand j is equal to the ratio of
brand sales of j and product class sales (i.e. brand sales summed over all brands). For
example, consider total sales of cigarettes in period t, sales of, say, Lucky Strike in
the same period, and the ratio of the latter over the former which is Lucky Strike's
market share in period t. Note that all these variables can be defined in terms of
units or dollars. In model specifications it is common for demand variables to
represent unit sales. One reason is that dollar sales is the product of two variables, unit
sales and price per unit, and the use of such combinations inhibits the interpretation
3. The terminology adopted here is not unique. Product class sales, brand sales, and market share models are
also referred to as primary demand, secondary demand, and selective (or relative) demand models. See, for
example, Leeflang (1976, 1977a), Schultz, Wittink (1976).
1. no behavioral detail;
2. some behavioral detail;
3. a substantial amount of behavioral detail.
Since the amount of behavioral detail is a continuous variable and therefore not
easily discretized, only the first class can be unambiguously defined.
The second category consists of models where some behavioral detail is explicitly
shown. This can be done in different ways. One such type of model, in which the
detail is provided at an aggregate level is the aggregate flow model. An example is
the new-product evaluation model SPRINTER (Mod. I-Behavioral Option) developed
4. In Section 5.2 we define environmental variables as variables which are outside the marketing system or
beyond the control of decision makers.
5. There are other possible distinctions with respect to model detail. For example, one could build company
or corporate models, or models of functional areas; or models dealing with a product category, or a brand, or
models related to specific marketing instruments. The focus would then be on the degree of model integration.
We touch upon these ideas in Sections 5.1 and 19.2.3.
[Figure 4.1: behavioral process (intervening variables)]
6. Some aggregate flow models contain just a few behavioral elements. Others contain many more. Urban's
Mod. I version (1969a) of the SPRINTER model is an example of the former, his Mod. III version (1970),
an example of the latter. See Nijkamp (1993) for an extensive survey and evaluation of these models. The
SPRINTER model is discussed in Section 10.2.
7. McGuire (1969, p. 142) writes in this respect: "Allport reviewed 16 earlier definitions of attitude before he
ventured his own as a seventeenth. Nelson listed 30 such definitions, and Campbell and DeFleur and Westie,
among others, many more."
[Diagram: marketing instruments and environmental variables → behavioral process (intervening variables) → transition probabilities → market share]
Figure 4.2 Model with some behavioral detail: a model of intermediate market response.
8. For a more elaborate discussion, see Leeflang (1974, Chapter 7), Naert, Bultez (1975), Leeflang, Boonstra
(1982).
9. See, e.g. Boulding, Kalra, Staelin, Zeithaml (1993).
10. See, e.g. Bagozzi (1994a).
11. LISREL is a powerful multi-purpose computer program that facilitates the estimation of a variety of models.
See, e.g. Sharma (1996, p. 426).
to models which contain a substantial amount of detail, there is, on the one hand, an
increasing richness of representation of system phenomena and, on the other hand,
greater complexity of measurement, estimation and validation. We return to these
questions in Chapter 20.
Time series (models) capture response behavior over time from the systematic pat-
terns in variables and disturbances. The main interest is to explain the variability in
one or more variables over time without explicitly formulated causal variables. The
response variable's fluctuations may be explained, for example, by lagged indepen-
dent variables and/or time. Consider relation (4.1) in which the sales of a brand in
period t is explained by two lagged sales variables.12

q_t = α + β1 q_{t-1} + β2 q_{t-2},   t = 1, ..., T   (4.1)

where α is a constant term, β1 and β2 are response parameters, and T is the number
of observations.
An alternative time-series model is relation (4.2):

q_t = γ + δ1 t + δ2 t²,   t = 1, ..., T   (4.2)

where γ, δ1 and δ2 are parameters.
Relation (4.2) can be estimated, for example, with monthly data covering a two-year
period (T = 24). If the estimated values of δ1 and δ2 are respectively positive and
negative, we obtain a bell-shaped curve which would represent part of the well-known
curve of the product life-cycle.
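A minimal sketch of estimating a quadratic time trend of this kind by least squares on T = 24 monthly observations. The data are simulated here; the "true" coefficients and the noise level are assumptions made purely for illustration.

```python
import numpy as np

# Simulated monthly sales (T = 24) following a life-cycle-like pattern
rng = np.random.default_rng(0)
t = np.arange(1, 25)
q = 100 + 30 * t - 1.2 * t**2 + rng.normal(0, 5, size=24)

# OLS estimation of q_t = gamma + delta1*t + delta2*t^2 + noise
X = np.column_stack([np.ones_like(t), t, t**2])
gamma, d1, d2 = np.linalg.lstsq(X, q, rcond=None)[0]

# With these data, delta1 > 0 and delta2 < 0: a bell-shaped trend
print(f"delta1 = {d1:.2f}, delta2 = {d2:.2f}")
```

The signs of the estimated δ1 and δ2 determine whether the fitted curve is bell-shaped, as described above.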
Time-series models can be used to yield accurate forecasts for management but
they usually do not produce useful knowledge about response behavior. 13 The ap-
proach does not reflect the idea that a performance measure such as sales can be
controlled by the manipulation of decision variables such as price and advertising.
By contrast causal models assume that variables under management control and envi-
ronmental and other variables affect response in a direct and explicit way. The models
focus on the relations between variables. Consider for example:

q_t = β0 + β1 p_t + β2 a_t + β3 temp_t + β4 p_t^c + β5 a_t^c + u_t   (4.3)

where
q_t = unit sales of the brand in period t,
p_t = price per unit of the brand in period t,
a_t = the advertising expenditures (in dollars) of the brand in period t,
temp_t = the average temperature in period t (degrees Fahrenheit),
p_t^c = price per unit of a competing brand in period t,
a_t^c = the advertising expenditures (in dollars) of a competing
brand in period t.

In relation (4.3) fluctuations in the sales of a brand are explained by the brand's
own marketing instruments (price and advertising), an environmental variable (temperature),
and marketing variables referring to the brand's competitor (p_t^c, a_t^c). The
environmental variable, temperature, is an explanatory variable that is outside the
control of the brand's management. This variable is a candidate for the explanation of
fluctuations in the sales of products that depend on seasonal influences, such as beer,
clothing and canned soup.
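A sketch of how a causal model of this form might be parameterized by least squares. Everything below (the variable ranges, the "true" response coefficients, and the noise level) is a hypothetical simulation set up only to illustrate the estimation step.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 52  # hypothetical weekly observations
price = rng.uniform(2.0, 3.0, T)       # own price per unit
adv = rng.uniform(50, 150, T)          # own advertising, $000
temp = rng.uniform(30, 90, T)          # average temperature, Fahrenheit
price_c = rng.uniform(2.0, 3.0, T)     # competitor price
adv_c = rng.uniform(50, 150, T)        # competitor advertising

# Assumed response: own price hurts sales, own advertising, temperature
# and competitor price help, competitor advertising hurts.
q = (500 - 120 * price + 0.8 * adv + 1.5 * temp
     + 80 * price_c - 0.5 * adv_c + rng.normal(0, 10, T))

# OLS estimation of the causal sales model
X = np.column_stack([np.ones(T), price, adv, temp, price_c, adv_c])
beta = np.linalg.lstsq(X, q, rcond=None)[0]
print(np.round(beta, 2))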
The term "causal" is also used in other contexts. In LISREL models "causal
paths" between variables are identified (Section 17 .I), and "causality" has also a
specific meaning in validation (Section 18.4).
Much of market modeling has dealt with descriptions, predictions and normative
decisions concerning single products, such as brands. Yet, most firms manage multiple
products. If a firm has several brands that belong to the same product line, it is impor-
tant to consider the total performance across the various brands. For example, at the
product line or category level we should monitor cannibalization, that is, the degree to
which one brand gains market share at the expense of other brands of the same firm.
More specifically, suppose that a given brand is offered in four different package
sizes. Either because larger sizes have scale economies which reduce unit cost or
because consumers who prefer larger sizes are more price sensitive, it is common
for the unit price to be lower for larger package sizes. However, if smaller sizes are
temporarily discounted, it is conceivable that many consumers temporarily switch to
smaller sizes. In this case, a category-level approach is required to accommodate the
switching between different sizes.
brand j). The strategic management approaches have been primarily concerned with
the problem of defining a portfolio of product groups.
In this book we concentrate on the modeling of single products. We discuss multi-
product marketing models in Chapter 13 and in Sections 14.4-14.6. The multiproduct
models we discuss in Chapter 13 deal with:
In Chapter 14 we discuss models that account for the competition between items, i.e.,
brand-variety combinations. The models in Chapter 14 can account for the competi-
tion between items belonging to the same brand as well as the competition between
items of different brands.
PART TWO
Specification
CHAPTER 5
Elements of model building
A traditional view
Traditionally one often distinguishes the following steps:
a. Specification (or Representation or Structure) is the expression of the most im-
portant elements of a real-world system in mathematical terms. This involves two
major steps:
d. Application or Use of a model. This step involves gaining experience with the model
and, given successful validation, pursuing opportunities for continued testing as well as
model adjustments, model updating, etc.
An implementation view
In the traditional view, no explicit attention is given to implementation in the model-
building process. We propose a process that is designed for model implementation.
This process is described in Figure 5.1 and is explained below.
1. Opportunity identification
In this stage a model builder has to evaluate whether the development/use of a model
can improve managerial decision making. The model builder will often be invited to
consider the opportunity by a manager who is overwhelmed by demands on his/her
time or who believes that the effectiveness or efficiency of decision making can be
improved. Ideally the model builder and manager work together to define the prob-
lem, to agree on an approach and to determine that the expected benefits exceed the
costs of model building. It is also important to establish that the model builder knows
the right tools and does not unduly favor specific approaches. 3
2. Model purpose
The intended use of the model should be defined as precisely as possible. For ex-
ample, the manager may need a model to obtain accurate sales forecasts. The model
Figure 5.1 Stages in the model-building process:
1. Opportunity identification
2. Model purpose
3. Model scope
4. Data availability
5. Model-building criteria
6. Model specification
7. Parameterization
8. Validation
9. Cost-benefit considerations
10. Use
11. Updating
builder needs to know the level of detail for which forecasts are required. The model
builder also needs to learn what the manager believes to be the relevant determinants
of sales so that model-based conditional forecasts can be developed. In this step both
the level of demand and the amount of behavioral detail should receive attention.
3. Model scope
Model building can take place for a specific type of decision or for a broader set
of decisions. The manager may want the model-building effort to focus on a single
decision variable. Thus, the desired outcome may be a model of advertising effects.
Alternatively, the manager may desire to have a model that includes the effects of all
relevant marketing activities. The latter is actually desirable for at least two reasons.
One is that marketing activities may be correlated in historical data. In that case, in
order to be able to learn the effect of only one variable, the other variables must be
included in the model. Another reason is that even if the variables can be manipulated
experimentally (such that the variables can be uncorrelated), their joint movement
may produce greater or smaller effects than the sum of their individual effects.
Similar arguments may apply to other decision variables pertaining to non-marketing
activities. For example, promotions such as temporary price cuts often have
strong effects on sales which may require intensive adjustments in production and
distribution. The financial consequences of those adjustments need to be taken into
account by the manager to determine the total profit impact of the marketing activity.
4. Data availability
One reason a manager may ask for a model-building effort is the increasing avail-
ability of large amounts of data. For example, in packaged, non-durable consumer
goods industries market feedback was provided to managers in the U.S. by means of
bimonthly store audits until the mid-1980s. The ACNielsen Company did not en-
courage model building based on these data largely because of the highly aggregated
nature of the data. With the introduction of scanners into supermarkets, managers
could obtain much more detailed market feedback much more frequently. 4
To a certain extent the data available through historical records determine the
types of model that can be developed. The model builder should, however, recognize
that experimental methods can provide more powerful insights. Also if the risk or
time required for experimentation is too high and historical data are insufficient, the
possibility of using subjective estimation methods remains. In this step the power of
historical data needs to be established relative to the desired model purpose. See also
Section 15.1.
5. Model-building criteria
For a model to be implemented its structure should satisfy various criteria. Little
(1970) proposed that a model should be:
a. simple;
b. complete on important issues;
c. adaptive;
d. robust.
It is easy to see that some of these criteria are in conflict. Thus, none of the criteria
should be pushed to the limit. Instead, we can say that the more each individual
criterion is satisfied, the higher the likelihood of model acceptance.
Briefly, models should be as simple as possible. Simplicity is a virtue in that
simple models are easy to understand and communicate. Since no model is complete,
the more a manager understands its structure, the better the model user can recognize
the model's limitations. However, simplicity will not be acceptable if the cost is that
the model becomes incomplete on important issues. All relevant variables need to be
incorporated even if the manager is interested in learning about the effect of a single
variable.
If the effect(s) of interest can change over time or over units, the model should be
adaptive. Finally, robustness means that the model is unlikely to generate implausible
answers. We discuss these aspects in more detail in Chapter 7.
6.-8.
The stages of model specification, parameterization and validation are discussed
under the traditional view (a-c).
9. Cost-benefit considerations
We mentioned that in stage 1 (opportunity identification) one should establish that the
expected benefits exceed the (expected) costs of model building. At this point both
benefits and cost should be known with a fair amount of precision. Before the model
is implemented and incorporated in a manager's decision-making process, it is ap-
propriate to re-examine the cost-benefit tradeoff. The question now is not whether the
model-building effort is worthwhile. Instead, it is useful to determine if the insights
gained appear to be more beneficial than the costs. One way to make this practical
is to compare, in the next stage, the decisions that will be made with the benefit of
the model to the decisions that would otherwise occur. In this manner it is possible
to determine whether the model should in fact be used. If the model fails on this
criterion, we can return to earlier stages if there is sufficient promise for a modified
approach to be successful.
10. Use
This stage should correspond to part d. under the traditional view. However, we place
the comments made under d. under our last stage called "updating". In a direct sense,
use of the model requires that the manager fully understands both its strengths and its
weaknesses. We do not need to say much about this stage if the model is truly simple
(see stage 5). However, as model complexity increases, implementation will require
appropriate support. One way to accomplish this is for the model builder to construct
a market simulation capacity. With that the manager should be able to explore how
the market will behave under a variety of conditions which can be influenced by the
manager.
There are several advantages associated with the creation of a market simulation
capacity. One is that the manager does not have to use a mathematical model. Instead,
the implications of the model can be explored. In this case the model builder does
need to place constraints on such explorations, for example, to rule out combinations
of marketing activities for which the model would in fact give implausible forecasts.
11. Updating
Over time, the manager may develop a better understanding of the marketplace, and
this could require modifications in the model. Even without this, the continued com-
parison of actual outcomes with those predicted by the model may suggest that the
model needs to be expanded (e.g. an additional variable or greater complexity in
effects) or that the parameters need to be updated. Thus, the updating in this stage
refers to updating of both the model specification and the parameterization.
The continued comparison of actual outcomes with predictions requires that dif-
ferences (errors) be analyzed so that one can distinguish between errors due to e.g.
model specification, measurement error, aggregation, and changes in the environ-
ment. The model modifications desired depend on which of these types of causes
is responsible for the errors. In this regard it is critical to distinguish between sys-
tematic and random error. Practically, error magnitudes that fall within what can be
expected based on the uncertainty of parameter estimates are considered random. As
the number of observable prediction errors increases it also becomes easier to detect
systematic patterns.
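One simple way to separate systematic from random error, sketched here with hypothetical forecast errors, is to compare the mean error with its standard error (a plain t-statistic check); errors whose mean falls within that band are treated as random.

```python
import numpy as np

def systematic_bias(errors, threshold=2.0):
    """Flag a systematic component if the mean forecast error is large
    relative to its standard error (a simple t-statistic check)."""
    errors = np.asarray(errors, dtype=float)
    se = errors.std(ddof=1) / np.sqrt(len(errors))
    t_stat = errors.mean() / se
    return abs(t_stat) > threshold, t_stat

# Errors scattered around zero: treated as random
random_errs = [1.2, -0.8, 0.5, -1.1, 0.9, -0.3, 0.2, -0.6]
# Errors consistently positive: the model under-predicts systematically
biased_errs = [2.1, 1.8, 2.5, 1.9, 2.2, 2.4, 1.7, 2.0]

print(systematic_bias(random_errs)[0])   # False
print(systematic_bias(biased_errs)[0])   # True
```

As the text notes, the power of such a check grows with the number of observable prediction errors: with few errors the standard error is wide and only large biases can be detected.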
The "implementable model-building process" is an iterative procedure. The proce-
dure, in its entirety or with only a few stages, can be repeated until an acceptable
model is specified and estimated. This is a labor-intensive activity. If the knowledge
of the analyst can be captured in the computer, resulting in a knowledge-based sys-
tem, we have what McCann and Gallagher (1990) call a "MarketMetrics Knowledge
System (M²KS)". This system is a "marriage" of theory and application, which
may lead to a Generalized Modeling Procedure (GMP). A GMP includes testing the
data, testing the specification, and testing the statistical properties of the developed
model. The GMP starts with data retrieval and data checks. Based on an initial data
check, the GMP formulates a model which contains variables that, for example, pass
the data check for sufficient observations and sufficient variation. After the initial
model estimation, various statistical tests and checks are applied and the model is
corrected. An iterative procedure is used to arrive at a final model. The final model is
stored in the system (McCann, Gallagher, 1990, pp. 106-107).
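An initial data check of the kind such a procedure might apply can be sketched as follows; the particular thresholds (minimum number of observations, minimum coefficient of variation) are hypothetical choices, not values from McCann and Gallagher.

```python
import numpy as np

def passes_data_check(x, min_obs=20, min_cv=0.01):
    """Screen a candidate variable the way an automated modeling procedure
    might: require enough non-missing observations and enough variation
    (coefficient of variation) to support estimation."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    if len(x) < min_obs:
        return False
    mean = x.mean()
    if mean == 0:
        return x.std(ddof=1) > 0
    return x.std(ddof=1) / abs(mean) >= min_cv

print(passes_data_check(np.linspace(10, 20, 24)))   # True: varies
print(passes_data_check(np.full(24, 5.0)))          # False: no variation
```

Only variables passing such a check would enter the initial model formulation; the iterative tests and corrections described above then operate on that reduced set.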
The objective of the specification of relation (5.1) is to explain variation in the unit
sales of brand j. Thus, qjt is the variable to be explained, the dependent variable or
the criterion variable. For this purpose one specifies:
1. the variable(s) that can explain the variation in the dependent variable, referred to
as explanatory variable(s), independent variables or predictors (a_jt);
2. the mathematical form between the criterion variable and predictor(s);
3. a disturbance term which captures the part of the criterion variable that cannot be
explained by the predictors.
[Figure: scatter plot of unit sales against advertising spending]
Because advertising spending does not fully determine sales, the relation is extended
with a disturbance term u_jt:

q_jt = α + β a_jt + u_jt.   (5.2)

The disturbance term accounts for, among other things, the following:
1. The larger the error of measurement in the variables, the larger the error component
u_jt will be. One reason for such error is sampling. For example, sales of brand
j in period t may be obtained from a sample of all retail outlets. Worse, there may
also be systematic measurement errors involved. If some outlets are unavailable for
inclusion (e.g. a retailer may refuse the market research company access) the error
can be systematic.
2. The disturbance term also represents the error due to missing or omitted vari-
ables. It is clear that, in general, sales also depend on variables other than advertising.
Excluding those variables means that their effects become part of the disturbance
term. Possible reasons for omitting variables are that no data are available, or that
neither the manager nor the model builder is aware of their relevance.
There will also be many variables which are difficult or impossible to identify,
but which each have a negligible effect on sales. Collecting information on these
variables, and incorporating them in the model structure will not be worth the cost.
If this condition applies, it is reasonable to assume that the disturbance term is
truly random.
t = 1, ..., T, and
T = the total number of observations.
In relation (5.3) market share is a function of the price of j in period t, which is
defined relative to the average price of the product class. 11 It is also a function of
advertising share in period t-1, based on the belief that advertising does not influence
purchases instantaneously. Also in (5.3) market share in t is a function of its value in
period t-1. This is a reflection of market inertia or of the fact that advertising share in
periods prior to t-1, and relative price in periods prior to t have longer-term effects. 12
Relation (5.4) shows that variation in product class sales is explained by variation
in total advertising expenditures of all brands, and by variation in disposable income.
From the definitions above, it follows that m_jt is a criterion variable in relation
(5.3). However, m_jt also defines, together with Q_t, the unit sales variable in (5.5). In
general, variables are placed in two distinct groups according to whether or not they
are explained in the model. Endogenous variables are those which are to be deter-
mined by the phenomena expressed in the model. Exogenous variables, on the other
hand, are determined outside the model. Thus, we can say that a model represents
the determination of endogenous variables on the basis of exogenous variables. In the
model described in relations (5.3)-(5.9), the following variables are endogenous: m_jt,
Q_t, q_jt, R_jt, TC_jt, π_jt, π^a_jt. These variables are explained by equations (5.3)-(5.9).
In this case, the number of endogenous variables is equal to the number of equations.
The exogenous variables are: a_{j,t-1}, Σ_{r=1}^n a_{r,t-1}, p_jt, Σ_{r=1}^n p_rt, Inc_t, c_j and FC_j.
The exogenous variables can further be classified as:
1. decision variables (also called instruments or controllable variables), and
10. Variable cost could also be time varying, in which case c_j becomes c_jt.
11. Here, the average price is obtained as a simple arithmetic average. A sales-weighted price could be used
instead.
12. We discuss this in Chapter 6.
2. environmental variables.
The decision variables include variables controlled by firm j and variables controlled
by j's competitors. Examples of the former are aj,t-1 and Pjt; examples of the latter
are ar,t-1 and Prt, r = 1, ... , n, r f= j.
[Diagram: exogenous variables are either current or lagged; the lagged endogenous and the exogenous variables together constitute the predetermined variables.]
13. It is also possible to express the current endogenous variables in terms of current and lagged exogenous
variables only. This expression is called a final form, and is obtained by a repeated elimination of all lagged
endogenous variables from the reduced-form relation. See, for example, Judge et al. (1985, p. 661).
obtained:
π^a_jt = (1 − τ){(p_jt − c_j)(β0j + β1j a_{j,t-1} / Σ_{r=1}^n a_{r,t-1} + β2j p_jt / ((1/n) Σ_{r=1}^n p_rt) + ...   (5.10)
Similarly, reduced forms for π_jt, TC_jt, R_jt, and q_jt can be obtained. The jointly
dependent variables m_jt and Q_t are already expressed in terms of predetermined
variables, in relations (5.3) and (5.4) respectively.
The variables in a marketing model cannot always be measured on a metric
scale.14 For some variables only the presence or absence of an activity can be reg-
istered. We then refer to qualitative variables. A special class of these consists of
so-called dummy variables. Typically, these variables are assigned a value of one
in the presence, and zero in the absence, of a given characteristic. As is shown in
subsequent chapters, dummy variables may be introduced in order to account for
unusual or special events. For example, let Q_t in (5.4) represent product class sales
of detergents. In the summer months (June, July, August) the sales of detergents may
increase because people spend more time outside and clothes have to be washed more
frequently. It is also possible, however, that sales are lower because people wear
less clothing. In any case, we may observe a temporary shift in the demand curve. We can
accommodate this by adding a dummy variable to equation (5.4):
Q_t = γ0 + γ1 Σ_{r=1}^n a_{r,t-1} + γ2 Inc_t + γ3 Δ_t + v_t   (5.4a)

where
Δ_t = 1 for t = June, July, August, 0 otherwise.
Thus, if γ3 > 0, predicted sales shift upward, and if γ3 < 0, downward, in
June, July, August, holding other things constant, and will return to its normal level
afterwards. We note that in (5.4a) we assume that the change in sales is the same for
all three months. If this is not realistic, (5.4a) can be modified by introducing addi-
tional dummy variables. Specifically, use of the stockpiled product will correspond to
a reduction in sales.
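Estimating a specification with such a seasonal dummy can be sketched as follows. The data are simulated: the advertising and income ranges and the assumed summer shift (a coefficient of 20 on the dummy) are hypothetical numbers chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
months = np.tile(np.arange(1, 13), 2)               # two years of monthly data, T = 24
summer = np.isin(months, [6, 7, 8]).astype(float)   # dummy: 1 in June-August, else 0

adv_total = rng.uniform(100, 200, 24)   # total lagged advertising, all brands
income = rng.uniform(900, 1100, 24)     # disposable income
Q = (50 + 0.4 * adv_total + 0.1 * income
     + 20 * summer                       # assumed upward summer shift
     + rng.normal(0, 3, 24))

# OLS estimation of the product class sales model with the summer dummy
X = np.column_stack([np.ones(24), adv_total, income, summer])
g0, g1, g2, g3 = np.linalg.lstsq(X, Q, rcond=None)[0]
print(f"estimated summer shift: {g3:.1f}")
```

Allowing a separate dummy for each of the three summer months, as suggested above, amounts to replacing the single `summer` column with three month-specific indicator columns.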
Another example of a model with dummy variables is the SCAN*PRO model.
This model proposed by Wittink, Addona, Hawkes and Porter (1988) was developed
for commercial purposes and has been used in over 2000 commercial applications in
the United States, Canada, Europe, and elsewhere. This model captures the effects
of temporary price cuts, displays, and feature advertising for multiple brands on one
brand's unit sales in a metropolitan area. This brand sales model, where the brand
14. Variables measured on an interval or ratio scale are called metric. If measured on a nominal or ordinal scale
they are non-metric. See, for example, Torgerson (1959).
62 CHAPTER 5
sales are measured at the store level, contains many qualitative variables, among them indicator variables for the promotions and for the individual weeks.
The weekly indicator variables in the SCAN*PRO model are also used to account for
the possible effects of omitted variables. The "classical" SCAN*PRO model does not
include, for example, coupon distributions or manufacturer advertising variables.
The indicator variables which measure the promotions are proxy variables. Dis-
plays of products by retailers can differ in location, size, etc. Similarly, a retailer
can feature a brand in a weekly advertisement in different ways. Until more precise
measures of these promotions are taken, one has to be concerned about measurement
error in these proxy variables. The SCAN*PRO model is introduced in Section 9.3.
Few if any metric variables are measured or observed without error. Variables are
often measured with some uncertainty or are recorded inadequately. For example, the
endogenous variable "sales" can be measured at the consumer level, by means of a
household panel, which is subject to error due to cooperation refusals, or at the retail
level using a sample of stores. Data on sales can also be obtained from ex-factory sales, although these reflect fluctuations in retail and wholesale inventories as well as consumer demand. External data on inventories at the retail and wholesale level can be used to correct the "internal" ex-factory sales data for such fluctuations.
only measured at different levels but also by different methods. For example, sales
data are obtained at the retail level either with scanners or through traditional store
audits. It has been demonstrated 15 that the different methods of registration and the
levels at which sales are measured can lead to large differences in sales amounts.
The same can hold for other relevant marketing variables such as price, distribution
and advertising. Less is known about the sensitivity of parameter estimates to the
kind of data used to parameterize relations. However, models which have the same
(mathematical) structure and the same variables but which are based on data obtained
by different methods or measured at different levels can lead to substantially different
marketing decisions. 16
Some marketing variables cannot be observed at all. Examples of such variables are utility, attitude, advertising goodwill, brand equity and buying intention, but also "demand". Unobservable or latent variables may be distinguished from observable variables or indicators. Indicators may be related to latent variables in, for example, the following way:

q_t^o = η_t + ε_t    (5.11)
In (5.11), η_t is the latent variable "true" sales and q_t^o is an observable variable. The disturbance term represents the measurement error. The true sales may be indicated by different observable variables.
Relations
We can distinguish the following kinds of relations or equations: 18
1. behavioral equations;
2. definition equations;
18. See, for example, Klein (1962, pp. 225-226), Theil (1971, pp. 1-4).
3. technical equations;
4. institutional equations.
Behavioral equations
Behavioral relations refer to system behavior. For example, equation (5.3) relates
the aggregate behavior of buyers of a brand, expressed in terms of market share,
to price, advertising, and past buying behavior. Similarly, equation (5.4) relates the
aggregate behavior of buyers of a product class, in terms of product class sales, to
total advertising spending and to disposable income. We discuss other examples of
behavioral equations in Section 5.3.
Definition equations
Definition relations feature known identities. We distinguish:

Q_t = Σ_{r=1}^n q_rt    (5.13)

where

Q_t = product class sales in period t (say, April 1997),
q_rt = sales of brand r in period t, and
n = total number of brands,

and

ΔQ_t = Σ_{r=1}^n Δq_rt    (5.14)

which indicates that the change in product class sales in period t, from period t - 1, is equal to the sum of the changes in sales of all brands r = 1, ..., n, assuming that the total number of brands remains equal to n. Both (5.13) and (5.14) are identities, provided the definitions of the variables appearing in (5.13) and (5.14) are mutually consistent. From (5.13) and (5.14) it is clear that the stock-type (5.13) definition relations and the flow-type (5.14) definition relations are interrelated.
In some cases, definition relations can be substituted into other relations of a
model, which may reduce the number of variables and equations.
ELEMENTS OF MODEL BUILDING 65
Technical equations
In these equations, variables are related on the basis of their technical connection.
Examples are production functions that describe quantitative relationships between
inputs and outputs. Technical coefficients describe how the former are transformed
into the latter.
If c_j and FC_j are known, (5.7) is a definition equation. It is a technical equation if c_j and FC_j are unknown values. The relation to be estimated could then be written as:

TC_jt = α_0 + α_1 q_jt    (5.15)

Obtaining estimated values α̂_0, α̂_1 would give us estimates of FC_j and c_j respectively.19
Institutional equations
In institutional equations, parameters result from decisions made by institutions such
as governments (at various levels). An example is equation (5.9) relating before-
to after-tax profit, the parameter r being determined by fiscal authorities. Another
example is the relation between the interest rate on savings paid by banks, and the
length of time savings must remain deposited.
19. The cost function in (5.15) assumes constant variable cost per unit. More complex cost functions are, of
course, possible.
The distinction is important from the point of view of estimation. Forms 1 and 2 are estimable by classical econometric methods, whereas form 3 is not.
y_t = α_0 + α_1 x_1t + α_2 x_2t + ... + α_L x_Lt    (5.16)

where

y_t = value of the dependent variable in period t,
x_lt = value of independent variable l in period t, and
α_0, α_1, ..., α_L = the model parameters.
Equation (5.16) is an example of a linear, additive model. It is additive in the sense
that each predictor contributes only a single effect to the determination of the crite-
rion variable. While this is the simplest possible representation, it also has serious
drawbacks. The linearity assumption implies constant returns to scale with respect to
each of the independent variables. This can be seen by taking the first-order partial
derivative of y_t with respect to any of the independent variables x_lt:

∂y_t/∂x_lt = α_l,   l = 1, ..., L    (5.17)
which means that increasing x_lt by one unit results in an increase of y_t by α_l units, holding the other variables constant. This assumption of constant returns to scale is unreasonable most of the time. For example, if x_l is advertising and y sales, we might expect an increment in x_l to have more effect when x_l itself is lower than when it is higher. This means that we expect advertising to have decreasing returns to scale.21
Another drawback of the linear additive model is that it assumes no interactions between the variables. This can again be seen by looking at the first-order derivative in (5.17). Since it is constant, it follows that the effect of x_l on y does not depend on
20. In most of this section the disturbance term is omitted for convenience.
21. This is confirmed in studies by Little ( 1979). More generally, one expects advertising to show increasing
returns first, then decreasing returns. We show examples of how to model this in Section 5.3.2.
the values of other independent variables. Once again, this assumption is often un-
reasonable. For example, advertising will have a greater effect on sales if the brand is
available in more rather than in fewer retail stores. At decreasing levels of availability,
advertising should have increasingly smaller effects (and zero effect if the product is
not available at all).
A second class of models consists of those which are non-linear in the variables, but linear
in the parameters. They are also called non-linear additive models. Equation (5.18) is
an example of such a model:
y_t = α_0 + α_1 e^{x_1t} + α_2 √x_2t + α_3 x_3t + α_4 ln x_4t    (5.18)

in which three variables (x_1t, x_2t, and x_4t) are assumed to have non-linear effects. This model can be transformed into the following linear additive relation:

y_t = α_0 + Σ_{l=1}^4 α_l x*_lt    (5.19)

where

x*_1t = e^{x_1t},
x*_2t = √x_2t,
x*_3t = x_3t,
x*_4t = ln x_4t,

and the x*_lt are themselves, except for x*_3t, non-linear functions of the underlying variables. Thus, from the point of view of estimation, equations (5.16) and (5.18) are similar.
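The transformation in (5.19) can be carried out mechanically before estimation. A small Python sketch, with assumed parameter values and synthetic, noise-free data purely for illustration:

```python
import numpy as np

# Sketch of (5.18)-(5.19): non-linear in the variables, linear in the
# parameters. Transform each regressor, then estimate by OLS.
rng = np.random.default_rng(1)
T = 50
x1 = rng.uniform(0.0, 2.0, T)
x2 = rng.uniform(1.0, 9.0, T)
x3 = rng.uniform(0.0, 5.0, T)
x4 = rng.uniform(1.0, 20.0, T)

alpha = np.array([10.0, 0.5, 2.0, 1.5, 3.0])   # assumed alpha_0 .. alpha_4
y = (alpha[0] + alpha[1] * np.exp(x1) + alpha[2] * np.sqrt(x2)
     + alpha[3] * x3 + alpha[4] * np.log(x4))

# The transformed regressors x*_lt of (5.19); estimation is then standard OLS.
X = np.column_stack([np.ones(T), np.exp(x1), np.sqrt(x2), x3, np.log(x4)])
est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(est, 3))
```

The estimation step is identical to that for (5.16); only the construction of the design matrix changes.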
The proposed relation between each independent variable and the dependent vari-
able should be based on theory or experience. If we know that advertising shows
decreasing returns to scale we can focus our attention on possible mathematical for-
mulations. In addition, as we show in Chapter 7, model-building criteria can provide
direction with regard to the specification of a model.
We next discuss a few formulations, with their characteristics, advantages, and
disadvantages. Consider the following relation:
Figure 5.3 Sales in units and advertising expenditures (2) (equation (5.20)).
Figure 5.3 suggests that (5.20) provides a better fit than the straight line shown in Figure 5.2 would have. In Figure 5.3 the fitted sales increase with advertising, but at a decreasing rate, i.e. the increases in sales become smaller as a_jt gets larger. To understand the nature of the effects, we can use the first derivative of q_jt with respect to a_jt:

dq_jt/da_jt = α_1 - 2α_2 a_jt    (5.21)

with α_1, α_2 > 0. For a_jt > α_1/(2α_2) this derivative is negative: sales would decline with further increases in advertising. This phenomenon is known
as supersaturation (Hanssens et al., 1990, p.42). Supersaturation results if excessive
marketing effort causes a negative response. 22 If this phenomenon runs counter to our
prior beliefs, we could reject the model specified in equation (5.20). Nevertheless, the
model may perform well within a certain range of values of a_jt. For example, in Figure 5.3, a_jt ranges between $175 and $1,600. Perhaps we should restrict the use of the parameter estimates α̂_0, α̂_1, and α̂_2 to this range of variation.
22. Supersaturation is perhaps more realistic if the advertising variable a_jt in (5.20) is replaced by personal selling effort. An excessive number of visits by sales persons may have a negative effect on sales.
Figure 5.4 Example of a semi-logarithmic relation (q_jt = α_0 + α_1 ln a_jt, α_1 > 0).
An alternative form with decreasing returns to scale throughout is:

q_jt = α_0 + α_1 √a_jt    (5.22)

for which

dq_jt/da_jt = α_1 / (2√a_jt)    (5.23)
which shows decreasing returns to scale, and tends to zero when a jt is very large.
Another frequently used mathematical form is the semi-logarithmic one, i.e.,

q_jt = α_0 + α_1 ln a_jt    (5.24)

where ln a_jt is the natural logarithm of a_jt. An example is shown in Figure 5.4. Now for some values of a_jt, predicted sales are negative:
23. The main reason for this is the fact that one never knows the "true" model. Experience and experimentation,
however, may help to reduce uncertainty.
α_0 + α_1 ln a_jt < 0, if

ln a_jt < -α_0/α_1

or

a_jt < e^{-α_0/α_1}.
Thus, this model is not acceptable when this condition applies. However, equation (5.24) shows decreasing returns to scale over the whole range of a_jt, since:

dq_jt/da_jt = α_1 / a_jt    (5.25)

which decreases with a_jt. Again, returns to advertising tend to zero as a_jt becomes
very large.
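The different shapes of diminishing returns implied by (5.21), (5.23) and (5.25) are easy to compare numerically. In the Python sketch below the parameter values are invented solely to make the comparison concrete:

```python
import math

# Marginal advertising effects of the three forms, evaluated at a few
# spending levels. Parameter values are illustrative only.
def marg_quadratic(a, a1=1.2, a2=0.0004):
    return a1 - 2 * a2 * a            # (5.21): turns negative past a1/(2*a2)

def marg_sqrt(a, a1=20.0):
    return a1 / (2 * math.sqrt(a))    # (5.23): positive, declining

def marg_semilog(a, a1=150.0):
    return a1 / a                     # (5.25): positive, declining

for a in (200, 800, 1600):
    print(a, round(marg_quadratic(a), 4), round(marg_sqrt(a), 4),
          round(marg_semilog(a), 4))

# Supersaturation: with these values the quadratic form predicts declining
# sales for spending beyond a1/(2*a2) = 1500.
print(marg_quadratic(2000) < 0)
```

All three marginal effects decline with spending; only the quadratic form eventually turns negative.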
The sales-advertising relations (5.20), (5.22), and (5.24) all represent decreasing returns to scale. All three, however, are deficient for high values of advertising: the first one, (5.20), because for high values of a_jt, q_jt starts to decline; the second and third ones because q_jt tends to infinity when a_jt tends to infinity. Since we know that maximum sales potential is a finite quantity, we prefer sales-advertising models with a finite saturation level.
q_jt = α_0 + α_1/p_jt,   with α_0, α_1 > 0    (5.26b)

where

p_jt = price of brand j in period t.
An illustration of equation (5.26b) is shown in Figure 5.6. It is clear that (5.26b) may not be meaningful for extremely low values of p_jt: as p_jt goes to zero, q_jt goes to infinity.
24. We analyze this problem further in Chapter 7. One of the implementation criteria concerns model behavior
for extreme values of the explanatory variables.
ln R_jt = α_0 + α_1 (1/ra_jt)    (5.27)

where

R_jt = revenues, obtained by a retailer, for product group j in period t, and
ra_jt = number of products (e.g. sizes and brands) in product group j in t (range).
In Figure 5.7, we see that this curve, for α_0 > 0 and α_1 < 0, shows increasing returns to scale for ra_jt < -α_1/2, and decreasing returns for ra_jt > -α_1/2. This is
demonstrated below. Relation (5.27) can also be written as:

R_jt = e^{α_0 + α_1(1/ra_jt)}.    (5.28)

The first-order derivative is:

dR_jt/dra_jt = -(α_1/ra_jt²) e^{α_0 + α_1(1/ra_jt)}    (5.29)

and the second-order derivative:

d²R_jt/dra_jt² = (α_1/ra_jt³)(α_1/ra_jt + 2) e^{α_0 + α_1(1/ra_jt)}.    (5.30)

It follows that the inflection point is ra_jt = -α_1/2. Equation (5.28) has an asymptote e^{α_0} as ra_jt increases.25
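A quick numerical check of the inflection point ra_jt = -α_1/2, in Python, using arbitrary values α_0 > 0 and α_1 < 0:

```python
import math

# Numerical check of the inflection point ra = -a1/2 for (5.28),
# R = exp(a0 + a1/ra), with illustrative values a0 = 2, a1 = -10.
a0, a1 = 2.0, -10.0

def R(ra):
    return math.exp(a0 + a1 / ra)

def second_diff(ra, h=1e-3):
    # Central-difference approximation of the second derivative.
    return (R(ra + h) - 2 * R(ra) + R(ra - h)) / h**2

# The second derivative changes sign at -a1/2 = 5:
# convex (increasing returns) below, concave (decreasing returns) above.
print(second_diff(4.0) > 0, second_diff(6.0) < 0)
```

This matches the analytical result from (5.30), where the sign of the bracketed term (α_1/ra_jt + 2) switches at ra_jt = -α_1/2.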
So far, we have focused on the issue of varying returns to scale. In Section 5.3.1 we indicated that a second deficiency of the model linear in parameters and variables is that it does not allow for interactions. One way to overcome this deficiency is to add interaction variables. For example, with two predictors x_1 and x_2 we can add the product x_1t · x_2t:

y_t = α_0 + α_1 x_1t + α_2 x_2t + α_3 x_1t x_2t.    (5.31)

The marginal effect of x_1t is then:

∂y_t/∂x_1t = α_1 + α_3 x_2t.    (5.32)

With α_3 positive, (5.32) indicates that the marginal effect of x_1t increases with x_2t. For example, if x_1t is advertising, and x_2t distribution, measured by the number of retail stores carrying the brand, (5.32) allows for advertising to have a larger effect if
more stores sell the brand.
A disadvantage of the interaction term formulation becomes apparent when the number of predictor variables exceeds two. For example, with three predictors, a full interaction model becomes:

y_t = α_0 + α_1 x_1t + α_2 x_2t + α_3 x_3t + α_4 x_1t x_2t + α_5 x_1t x_3t + α_6 x_2t x_3t + α_7 x_1t x_2t x_3t    (5.33)

and

∂y_t/∂x_1t = α_1 + α_4 x_2t + α_5 x_3t + α_7 x_2t x_3t.    (5.34)
25. This model has been used for infrequently purchased consumer goods by Brown and Tucker (1961). Bemmaor (1984) used this model in his study of an advertising threshold effect. See Leeflang (1975) for another application.
One of the most frequently encountered marketing response functions is the so-called multiplicative model:26

y_t = α_0 x_1t^{α_1} x_2t^{α_2} ··· x_Lt^{α_L}    (5.35)

or more compactly:27

y_t = α_0 ∏_{l=1}^L x_lt^{α_l}.    (5.36)
Response function (5.35) has the following desirable characteristics. First, it accounts for a specific form of interaction between the various instruments. This can easily be seen by looking at the first-order derivative with respect to, say, instrument x_lt:

∂y_t/∂x_lt = α_l y_t/x_lt    (5.37)

which depends on the values of all the other variables. Second, taking logarithms of (5.35) yields:

ln y_t = α_0* + α_1 ln x_1t + α_2 ln x_2t + ... + α_L ln x_Lt.    (5.39)

Equation (5.39) is linear in the parameters α_0*, α_1, α_2, ..., α_L, where α_0* = ln α_0.28 Equation (5.39) is sometimes referred to as a double-logarithmic relation, in contrast to a semi-logarithmic one, such as (5.24), where logarithms only appear in the right-hand side of the equation.
26. Also referred to as Cobb-Douglas response functions because the structure is identical to that of Cobb-Douglas production functions, Q = αL^β C^γ (Q = quantity, L = labor, C = capital).
27. ∏_{l=1}^L is a product sign indicating that L terms from l = 1 to l = L will be multiplied. For example, ∏_{l=1}^L x_l is the compact notation for x_1 · x_2 · ... · x_L.
28. In fact we saw a similar example before. Model (5.28) is non-linear in the parameters α_0 and α_1. Taking logarithms, however, makes it linear (see equation (5.27)).
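Estimating a multiplicative model via its double-logarithmic form (5.39) becomes ordinary least squares once the data are logged. A Python sketch with assumed elasticities and synthetic, noise-free data (so the coefficients are recovered exactly):

```python
import numpy as np

# Multiplicative model (5.35) with two instruments; taking logs gives the
# double-logarithmic form (5.39), estimated here by OLS.
# All parameter values are assumed for illustration.
rng = np.random.default_rng(2)
T = 60
price = rng.uniform(1.0, 3.0, T)
adv = rng.uniform(50.0, 200.0, T)

a0, a_price, a_adv = 500.0, -1.8, 0.3      # constant elasticities
y = a0 * price**a_price * adv**a_adv

# Regress ln y on ln price and ln advertising; the slopes are elasticities.
X = np.column_stack([np.ones(T), np.log(price), np.log(adv)])
est, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.round(est, 3))
```

The intercept estimates ln α_0; the slope coefficients are directly interpretable as the (constant) price and advertising elasticities.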
With regard to the interaction effect shown in (5.37), we note that there are many examples of interaction effects in the marketing literature (Gatignon, 1993, Logman, 1995). Consider, for example, the interaction between communication expenditures and price. There are alternative theories, i.e. the "advertising as market power" and the "advertising as information" schools of thought. According to the first school, high communication expenditures (in particular advertising) allow firms to obtain market power, in part because consumers become less price sensitive (Comanor, Wilson, 1974, Krishnamurthi, Raj, 1985). However, supporters of the "advertising as information" school argue that increased communication encourages brand comparisons which will increase price sensitivity (Prasad, Ring, 1976, Wittink, 1977 and Kanetkar, Weinberg and Weiss, 1992).29
Many other interaction effects are discussed in Logman (1995). One of the relations which is quite often used to study interactions is a so-called "extended multiplicative relation" (Logman, 1995):

ln y_t = α_0* + α_1 ln x_1t + α_2 ln x_2t + α_3 (ln x_1t)(ln x_2t)    (5.40)

which is a combination of (5.39) and (5.31). Equation (5.40) allows the elasticity of, say, x_1, to depend on x_2.
To see the functional form of relations in the multiplicative model, consider the case with only one explanatory variable:

y_t = α_0 x_1t^{α_1}.    (5.41)

Figure 5.8 shows (5.41) for various values of α_1. Curve I represents the case α_1 > 1, i.e. increasing returns to scale. Curve II is typical for 0 < α_1 < 1, i.e. decreasing returns to scale. This is what we might expect if x_1 were advertising. Curve III illustrates the case -1 < α_1 < 0, and finally curve IV the case α_1 < -1. The latter two might apply when x_1 is a price variable, curve III representing inelastic demand and curve IV elastic demand.
Multiplicative demand functions have been used in empirical studies for a very long time.30 In empirical research in marketing of the econometric variety, the multiplicative model is one of the most popular specifications.
There are many other linearizable forms besides the multiplicative model. One is the exponential model:

y_t = α_0 e^{α_1 x_1t}    (5.42)

which, after taking logarithms, becomes linear in the parameters γ_0 (= ln α_0) and α_1, i.e.:

ln y_t = γ_0 + α_1 x_1t.    (5.43)
29. For more thorough discussions of this "controversy" in interactions, see Popkowski-Leszczyc, Rao (1989)
and Kaul, Wittink ( 1995).
30. See, for example, Moore (1914) and Schultz (1938).
Figure 5.8 The multiplicative model (5.41) for various values of α_1 (I: α_1 > 1; II: 0 < α_1 < 1).
The exponential model is represented in Figure 5.9. This model may also, with α_1 negative, be appropriate for a sales-price relation. For price (x_1t) equal to zero, sales equal α_0, whereas for price going to infinity, sales tend to zero.31 However, for α_1 >
0, (5.42) has no saturation level. Yet for almost all products in virtually any market
it holds that no matter how much marketing effort is expended, there is a finite upper
limit to sales. We show later how the saturation level is a finite quantity in the modified
exponential model (5.50), which is an intrinsically non-linear model.
Another popular specification in empirical research in marketing is the logit model,
which is discussed in detail in Chapters 9 and 12. A specific version of a logit model
is the logistic specification (5.44):32

y_t = 1 / (1 + exp(-(α_0 + Σ_{l=1}^L α_l x_lt))).    (5.44)

This relation can be linearized and rewritten as:

ln(y_t / (1 - y_t)) = α_0 + Σ_{l=1}^L α_l x_lt.    (5.45)
31. Cowling and Cubbin (1971) used this functional form to explain the United Kingdom market for cars in terms of quality-adjusted price.
32. See Nooteboom (1989) for an application.
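The linearization in (5.45) is exact, not an approximation, as a small Python check with assumed parameter values illustrates:

```python
import math

# Logistic specification (5.44) with one regressor; parameters are assumed.
a0, a1 = -2.0, 0.8

def logistic(x):
    return 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))

# The log-odds transform of (5.45) returns the linear predictor exactly.
for x in (0.0, 2.5, 5.0):
    y = logistic(x)
    print(round(math.log(y / (1.0 - y)), 6), round(a0 + a1 * x, 6))
```

Since 0 < y_t < 1 by construction, y_t can be interpreted as a share, and the transform is defined everywhere.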
Figure 5.9 The exponential model (5.42) with α_1 < 0.
We can also use the Gompertz model, which has an S-shaped form:

y_t = α_0 α_1^{(α_2^{x_1t})}    (5.46)

with α_0 > 0, and 0 < α_1, α_2 < 1. Equation (5.46) is shown in Figure 5.10. For x_1t going to zero, y_t approaches α_0 · α_1 asymptotically. When x_1t tends to infinity, y_t goes to α_0. By taking logarithms, (5.46) becomes:

ln y_t = ln α_0 + α_2^{x_1t} ln α_1.    (5.47)

A special case is:

y_t = α_1^{(α_2^{x_1t})}    (5.48)

which corresponds to α_0 being equal to one in (5.46). This could be appropriate for a market share model, where one must be the upper limit. With α_0 = 1, (5.47) becomes:

ln y_t = α_2^{x_1t} ln α_1

or

ln ln y_t = β_1 + β_2 x_1t
33. See, for example, Montgomery and Urban (1969, p. 340), where x_1t = t.
78 CHAPTERS
where

β_1 = ln ln α_1, and
β_2 = ln α_2.
In the SCAN*PRO model, developed by Wittink et al. (1988), which is discussed in Section 9.3, we introduce so-called multipliers. These are parameters with dummy variables as exponents. The partial relation between the dependent variable (here sales) and the dummy variables can be written as:

y_t = α_0 γ_l^{δ_lt}    (5.49)

where

γ_l = multiplier, and
δ_lt = an indicator variable for promotion l: 1 if the brand is promoted in t, and 0 otherwise.

A multiplier value of 2 for, for example, l = 1 (own display activity) means a doubling of brand sales when the brand is on display. Relation (5.49) is a special case of (5.46).
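The multiplier mechanism of (5.49) can be made concrete in a few lines of Python; the baseline and multiplier values below are invented for illustration:

```python
# Sketch of the multiplier formulation (5.49): a parameter raised to a 0/1
# indicator multiplies baseline sales only in promotion weeks.
# All values below are hypothetical, not estimates from the book.
base = 120.0            # alpha_0: baseline weekly sales
display_mult = 2.0      # gamma_1: own-display multiplier
feature_mult = 1.5      # gamma_2: feature-ad multiplier

def sales(display, feature):
    # delta_lt enters as an exponent: gamma**0 = 1 leaves the baseline
    # unchanged, gamma**1 applies the full multiplier.
    return base * display_mult**display * feature_mult**feature

print(sales(0, 0))   # 120.0 : no promotion
print(sales(1, 0))   # 240.0 : display doubles sales
print(sales(1, 1))   # 360.0 : display and feature combine multiplicatively
```

Because the indicators sit in the exponent, several simultaneous promotions combine multiplicatively rather than additively.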
Figure 5.10 The Gompertz model (5.46), with asymptote α_0.
It should be clear by now that marketing relations are generally non-linear in either
variables or parameters or both. These relations sometimes are linearizable while
at other times they are not. In the latter case the model is called intrinsically non-
linear or intractable. 34 In the past, model builders often went to great efforts to make
their models linearizable. This was primarily due to the fact that estimation methods
in econometrics generally assumed models to be linear in the parameters. In recent
years, however, powerful non-linear estimation techniques have been developed, and
their availability as computer routines is increasing. Such techniques are discussed in
Goldfeld and Quandt (1972, 1976), Amemiya (1983, 1985) and Judge et al. (1985).
Other researchers have adapted non-linear programming algorithms for non-linear
estimation. 35 Thus, from an estimation point of view, intrinsic non-linearity is no
longer problematic in a purely technical sense. It remains true, however, that the
statistical properties of non-linear estimation techniques are not as well known as
those of linear models. We return to this point in Chapter 16.
As a first example of an intrinsically non-linear model, consider the modified exponential model:

y_t = α_0 (1 - e^{-α_1 x_1t}),   with α_0, α_1 > 0.    (5.50)
Equation (5.50) is shown in Figure 5.11. If x_1t equals zero, y_t also equals zero. As x_1t goes to infinity, y_t approaches α_0 asymptotically. An interesting characteristic of the model is that the marginal sales response is proportional to the level of untapped potential (α_0 - y_t). This is easily demonstrated as follows. The first-order derivative of y_t with respect to x_1t is:

dy_t/dx_1t = α_0 α_1 e^{-α_1 x_1t}.    (5.51)

Untapped potential is:

α_0 - y_t = α_0 e^{-α_1 x_1t}

and it follows from (5.51) that the marginal sales response is proportional to (α_0 - y_t), with α_1 serving as the proportionality factor.
Applications of (5.50) are Buzzell (1964, pp. 136-156), and Lodish, Montgomery and Webster (1968). In their models x_1t represents selling effort. In Little and Lodish (1969) it is advertising effort. Rangan (1987) developed a more general (multivariate) modified exponential model:

y_t = α_0 (1 - e^{-Σ_{l=1}^L α_l x_lt}).    (5.52)

The dependent variable (y_t) in Rangan's model is market share at time t. The parameter α_0 is the maximum achievable market share with channel effects alone. The x_lt are channel function components such as sales force calling effort, inventory level, delivery time, etc.
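The proportionality property in (5.51) is easy to verify numerically. A Python sketch with assumed values of α_0 and α_1:

```python
import math

# Modified exponential model (5.50), y = a0 * (1 - exp(-a1 * x)); numerical
# check that the marginal response equals a1 * (a0 - y), as in (5.51).
# Parameter values are illustrative.
a0, a1 = 1000.0, 0.002

def y(x):
    return a0 * (1.0 - math.exp(-a1 * x))

for x in (100.0, 500.0, 2000.0):
    h = 1e-4
    dydx = (y(x + h) - y(x - h)) / (2 * h)   # central-difference derivative
    print(round(dydx, 6), round(a1 * (a0 - y(x)), 6))
```

At every spending level the two printed numbers agree: the closer sales get to the saturation level α_0, the smaller the marginal response.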
Another S-shaped, intrinsically non-linear model is the logistic function:

y_t = α_0 / (1 + e^{-(α_1 + α_2 x_1t)})    (5.53)

shown in Figure 5.12. This is a more general specification than (5.44), in which the saturation level α_0 = 1. With x_1t = 0, y_t = α_0/(1 + e^{-α_1}), and with x_1t going to infinity, y_t tends to α_0.
In the literature, a number of other S-shaped models have been proposed. We restrict ourselves to just a few. The first is one proposed by Little (1970) in his advertising budgeting model ADBUDG. The dependent variable is the market share of brand j, m_jt, and the explanatory variable is the advertising expenditure of brand j, a_jt. The following market share response function is postulated:

m_jt = α + (β - α) a_jt^δ / (γ + a_jt^δ).    (5.54)

Assuming that all parameters are positive and β > α, we obtain the graphical representations shown in Figures 5.13 and 5.14 for δ > 1 and δ < 1 respectively.36,37
36. The difference is easily understood by looking at the first- and second-order derivatives; for example,
∂m_jt/∂a_jt = (β - α) γ δ a_jt^{δ-1} / (γ + a_jt^δ)².
37. We return to this particular market share response function in Sections 7.5 and 16.9.
m_jt = α + (β - α) (a*_jt)^δ / (γ* + (a*_jt)^δ)    (5.55)

where a*_jt = a_jt/(a_jt + ac_t), with ac_t the advertising spending level of the competition, and γ* a rescaled version of γ. In (5.55), α represents the lower and β the upper limit of market share. Specifications (5.54) and (5.55) are closely related, as is demonstrated below. From (5.54) we have:

m_jt - α = (β - α) a_jt^δ / (γ + a_jt^δ)    (5.56)

and, dividing the numerator and the denominator by γ:

m_jt - α = (β - α) (1/γ) a_jt^δ / (1 + (1/γ) a_jt^δ).    (5.57)

Replacing 1/γ by 1/γ* and a_jt by a*_jt, we obtain (5.55). The extension to other variables is straightforward, and in its most general form (5.55) can be written as:

m_jt = α + (β - α) ∏_{l=1}^L (x_{ljt})^{δ_l} / (γ* + ∏_{l=1}^L (x_{ljt})^{δ_l}).    (5.58)
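The ADBUDG response (5.54) is easy to explore numerically; the parameter values in this Python sketch are invented, chosen only so that α is the zero-advertising share and β the ceiling:

```python
# Sketch of Little's ADBUDG market share response (5.54).
# All parameter values are hypothetical, for illustration only.
def adbudg(a, alpha=0.1, beta=0.5, gamma=1.0e5, delta=1.5):
    # alpha: share with zero advertising; beta: maximum attainable share;
    # gamma and delta shape the transition between the two limits.
    return alpha + (beta - alpha) * a**delta / (gamma + a**delta)

print(round(adbudg(0.0), 3))       # 0.1 : lower limit alpha
print(round(adbudg(1.0e4), 3))     # between alpha and beta
print(round(adbudg(1.0e9), 3))     # approaches the ceiling beta
```

By construction the function is bounded between α and β for any spending level, which is exactly the saturation property the unbounded forms (5.22) and (5.24) lack.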
Marketing dynamics
Marketing is in essence dynamic. This means that new products are developed, tested,
and launched, that more and better packaging is introduced, that competitors come
and go, and so on. But it also means that the effect of, for example, an advertising
campaign does not end when the campaign is over. The effect, or part of it, will
remain perceptible for some future periods. Or, looked at somewhat differently, sales
in period t will be determined by advertising in t, but by spending in t -1, t- 2, ... as
well. Thus, one can refer to the lagged effects of advertising. For trade and consumer
promotions the same effects may occur. 1 The dynamic aspects of marketing are not
only exhibited in lagged effects but also in lead effects, i.e. consumers or competitors
anticipate a marketing stimulus and adjust their behavior before the stimulus actually
occurs. 2 So, for example, consumers may anticipate future price cuts or promotions.
In this chapter we first discuss the modeling of marketing dynamics in the case of
one explanatory variable. Issues related to having more than one independent variable
are dealt with in Section 6.2. In Section 6.3 we discuss the selection of dynamic models.
Finally in Section 6.4 we give attention to lead effects.
Marketing dynamics in the one variable case often deals with the time effect of ad-
vertising. Most of the discussion in this section will, therefore, be concerned with
advertising. As we indicate in Section 2.3, the effect of a campaign in period t will
generally not be exhausted by the end of the period, but will linger on into the future.
That means that the effect is cumulated over time, and thus one refers to the cumu-
lative effects of advertising. Cumulative effects of advertising can be represented, for
example, by the demand function (6.1 ):
q_t = α + β_1 a_t + β_2 a_{t-1} + ... + β_{s+1} a_{t-s} + u_t    (6.1)
where
1. See, e.g. Blattberg, Neslin (1989, 1990, 1993), Blattberg, Briesch, Fox (1995), van Heerde, Leeflang, Wittink (1999c).
2. This description is based on Hanssens, Parsons and Schultz (1990, p. 213).
86 CHAPTER 6
q_t = sales in period t,
a_t, a_{t-1}, ... = advertising expenditures in period t, t - 1, ..., respectively, and
u_t = a disturbance term.
Equation (6.1) indicates that advertising has an effect up to s periods into the fu-
ture. The lagged effect of marketing activities on sales has long been recognized as
one of the complicating features of market measurement. Its existence means that
promotional programs should be evaluated over a period longer than that of each
campaign. Failure to measure it adequately leads to underestimation of the impact of the promotions in the media concerned (Jones, 1986), and is one probable reason for the typical switch of marketing expenditures from the long-term brand building offered by advertising to short-term oriented, but easily measured, sales promotions (Magrath, 1988).
We make a distinction between delayed-response effects and customer-holdover
effects. Delayed-response effects arise from the delay between marketing money
being spent and sales occurring.3 This can happen because of:
• execution delay: e.g. the time between management spending money or preparing
an ad and the ad appearing;
• noting delay: e.g. the time between a magazine being published and read;
• purchase delay: the time between a consumer receiving the stimulus to purchase and a purchase being made.

In the simplest case, the delayed response is captured by a single lagged term:

q_t = α + β a_{t-s} + u_t    (6.2)
where s is the number of time periods between, say, the time the advertising money
is spent and the sales that result from this expenditure.
Customer-holdover effects occur because customers sometimes make repeated
purchases for some time, t, t + 1, t + 2, ... , t + s after the initial stimulus in t, either
as:
• new buyer holdover effects, where marketing activity creates new customers who
make repeat purchases, or
• increased purchase holdover effects, where the marketing stimulus increases the
average quantity purchased per period for some time.
The estimation of lagged effects in the demand function (6.1) is difficult for at least
three reasons. First, one has to decide how many lagged terms to include in the
specification, i.e. one has to decide how long the advertising effect lasts. In what
follows, we refer to this as the duration interval, and determining the correct value
of s is called the truncation problem. Secondly, with s lagged terms in the model,
3. See Lilien, Kotler (1983, p. 80), and Leeflang, Mijatovic, Saunders (1992).
MARKETING DYNAMICS 87
the number of parameters becomes s + 2. Since data are often a scarce commodity,
statistical problems are likely to arise because of the loss of degrees of freedom.4 This
results both from an increase in the number of parameters and from a decrease in
the number of usable observations. Suppose we have thirty-six monthly observations
(T = 36) on sales and advertising spending from January 1995 to December 1997.
For 1/95, January 1995, and assuming s = 10, (6.1) becomes:

q_{1/95} = α + β_1 a_{1/95} + β_2 a_{12/94} + ... + β_{11} a_{3/94} + u_{1/95}.

This is not a usable observation, since a_{12/94}, ..., a_{3/94} are unknown. The first usable observation is then 11/95, November 1995, for which (6.1) becomes:

q_{11/95} = α + β_1 a_{11/95} + β_2 a_{10/95} + ... + β_{11} a_{1/95} + u_{11/95}.
In general, with s the number of lags, the number of usable observations is T -s, and
the number of degrees of freedom of the error term is T - 2 - 2s. A third difficulty
with direct specification of lags is that the larger s, the higher the chance that the
explanatory variables become collinear. 5
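The degrees-of-freedom bookkeeping is simple enough to script. This sketch assumes, as in the text, that the model has s + 2 parameters and that the first s observations are unusable:

```python
# Degrees of freedom for the error term in (6.1): T - s usable observations
# minus the s + 2 parameters (alpha plus the s + 1 lag coefficients).
def degrees_of_freedom(T, s):
    usable = T - s            # the first s observations lack full lag history
    n_params = s + 2
    return usable - n_params

T = 36                        # thirty-six monthly observations, as in the text
for s in (0, 5, 10, 15):
    print(s, T - s, degrees_of_freedom(T, s))
```

With s = 10 the count drops from 34 (at s = 0) to 14, which illustrates why long direct lag specifications are hard to estimate on short series.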
The three reasons above have led researchers to consider relations between the
various {3; in (6.1) in order to arrive at a simpler model. A simpler model will have
fewer parameters and will be less troublesome to estimate. One should, of course, be
careful not to lose the true structure through simplification.
Many alternative specifications of a relation between the β_i's have been proposed in the literature. We consider the most common ones.6 The most popular lag structure in marketing studies is no doubt the following. First, (6.1) is rewritten with an infinite number of terms, which means that the entire advertising history is taken into account:

q_t = α + β_1 a_t + β_2 a_{t-1} + β_3 a_{t-2} + ... + u_t.    (6.3)

The lag coefficients are then assumed to decline geometrically:

β_{l+1}/β_l = λ,   l = 1, 2, ...,   or equivalently   β_{l+1} = β_1 λ^l    (6.4)
4. In Section 5.2 we saw that the typical objective function in estimating parameters is to minimize Σ_{t=1}^T u_t². If the number of observations is exactly equal to the number of parameters, an exact fit can be obtained and, therefore, Σ_{t=1}^T u_t² = 0. Thus, to obtain information on the nature of the error term of (6.1), we need more than T - s - 2 observations. One refers to T - s - 2 as the number of degrees of freedom for the error term. The larger s, the smaller this number of degrees of freedom becomes. See also Chapter 16 on parameterization.
5. Collinearity or multicollinearity relates to the correlation between explanatory variables. A more exact def-
inition will be given in Chapter 16. We limit ourselves here to stating that collinearity has the disadvantage of
making the coefficients less reliable.
6. For extensive surveys, we refer to Griliches (1967), Dhrymes (1981), and Judge, Griffiths, Hill, Lütkepohl and Lee (1985, Chapters 9 and 10) at the more theoretical level, and Yon (1976, Chapter 4), Hanssens et al. (1990, Chapter 7), and Leeflang et al. (1992) in a marketing context.
where 0 ≤ λ < 1. The lag structure specified in (6.4) assumes that the advertising effect decreases geometrically (or exponentially in the continuous case) over time. This model is known as the geometric lag model or Koyck model.
In the literature on lag structures, one often encounters the term distributed lags
or lag distribution. The meaning of this terminology relates to the similarity in the
properties of lag coefficients and those of a probability mass function, as is shown
below. In general, we can rewrite (6.3) as:
00
where the we terms represent the lag structure. Distributed lags are then assumed to
have the following properties:
L:we = (6.7)
£=0
i.e. the lag coefficients are assumed to be non-negative and to sum to one. 7
Since the lag structure defined in (6.4) implies Wt = 'A e, it satisfies (6.6) but not
(6. 7). In order for the geometric lags to sum to one they should be defined as:
w_ℓ = (1 − λ)λ^ℓ,  ℓ = 0, 1, 2, ....    (6.8)
The lag structures as defined in (6.4) and (6.8) are, of course, not fundamentally
different. We will therefore adhere to the former, because in the literature it is most
frequently applied under that form. Substituting (6.4) in (6.3) we obtain:

q_t = α + β_1 a_t + β_1 λ a_{t−1} + β_1 λ² a_{t−2} + ... + β_1 λ^L a_{t−L} + ... + u_t.    (6.9)

Now we lag equation (6.9) one period and multiply by λ:

λ q_{t−1} = λα + β_1 λ a_{t−1} + β_1 λ² a_{t−2} + ... + λ u_{t−1}.    (6.10)

Subtracting (6.10) from (6.9) and rearranging terms gives the estimable form:

q_t = α(1 − λ) + λ q_{t−1} + β_1 a_t + u_t − λ u_{t−1}.    (6.11)
7. The requirement of non-negativity is not always realistic. Sales in a period after a promotion may be below
the normal level, which implies that the corresponding parameter of the Jagged variable is negative.
Table 6.1 Average implied 90% duration interval (in months) and number of studies, by data interval.
Thus β_1/(1 − λ) measures the (total) effect of advertising. Given (6.4), the parameters
β_1 and λ suffice to measure the short-term and long-term effects of advertising. 8
The Koyck model, however, is not without problems. First, if u_t satisfies the
assumptions of the classical linear regression model, then u_t* = u_t − λu_{t−1} does
not. 9 We do not discuss this issue here, but refer to any standard econometrics text.
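The mechanics can be illustrated with a small simulation (all numbers hypothetical): generate sales from a geometric advertising lag and regress q_t on an intercept, q_{t−1} and a_t, as in (6.11). The coefficient of q_{t−1} estimates λ, and β̂_1/(1 − λ̂) gives the implied long-term effect. The error-term complication just noted is ignored here; with the small simulated noise its bias is negligible, but in real applications it need not be.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta1, lam, T = 10.0, 2.0, 0.6, 5000

# Simulate sales with a geometric advertising lag, as in (6.9)
a = rng.uniform(0, 5, T)
effect = np.zeros(T)
for t in range(T):
    lags = a[max(0, t - 60):t + 1][::-1]   # a_t, a_{t-1}, ... (60 lags suffice)
    effect[t] = beta1 * np.sum(lam ** np.arange(len(lags)) * lags)
q = alpha + effect + rng.normal(0, 0.1, T)

# Koyck-transformed regression (6.11): q_t on an intercept, q_{t-1} and a_t
X = np.column_stack([np.ones(T - 1), q[:-1], a[1:]])
coef, *_ = np.linalg.lstsq(X, q[1:], rcond=None)
lam_hat, beta1_hat = coef[1], coef[2]
print(lam_hat, beta1_hat, beta1_hat / (1 - lam_hat))  # short-term effect and implied long-term effect
```

With the chosen values the long-term effect β_1/(1 − λ) equals 2.0/0.4 = 5.0, and the regression recovers it closely.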
The duration interval of advertising implied by the Koyck model has been studied
by Clarke (1976) on the basis of a survey of 59 cases. Since (6.11) is an additive model,
the same relation holds for, say, q_t' = q_t + q_{t−1} and a_t' = a_t + a_{t−1}. Hence the implied
duration interval should not vary with the periodicity of the data, expressed by the data
interval. However, from the figures in Table 6.1 it is clear that the implied duration
interval is a function of the periodicity of the data.
8. In an applied setting, estimated values α̂, β̂_1 and λ̂ will be used. These estimates are obtained from the
numerical specification of (6.11). Since β̂_1 and λ̂ are estimated, β̂_1/(1 − λ̂) will be estimated as well. One
difficulty is that the distributional properties of that ratio are not well known.
9. In particular, if the residuals u_t of the original model are uncorrelated, then the u_t* must be autocorrelated.
Table 6.2 Parameter estimates and implied duration intervals for cigarettes
as a function of the data interval.
The table shows the average value of λ̂ and the average 90% duration interval, which
means the time it takes for advertising to reach 90 percent of its total effect, for each
of the five data intervals. The results indicate a large increase in the implied duration
time as the data interval increases, pointing to a data interval bias. The problem is even
more striking when very different implied duration intervals are obtained from one
and the same data set. A case in point is the relation between industry sales and industry
advertising expenditures for the West German cigarette market. The relation has
been estimated by Leeflang and Reuyl (1985a) using annual, bimonthly and monthly
data covering the same time period (1960-1975). Table 6.2 shows the estimated values
of the advertising parameter and the parameter for lagged primary demand of the
calibrated Koyck-model. Furthermore the long-term elasticity of advertising and the
90% duration interval in months are specified.
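Under the Koyck structure the cumulative effect through lag n is proportional to 1 − λ^{n+1}, so the implied 90% duration interval solves λ^{n+1} = 0.1. A short sketch (the λ values are hypothetical, not estimates from Table 6.2) shows how sensitive the implied interval is to λ̂, which is one way to see why biased λ̂'s at different data intervals translate into very different implied durations:

```python
import math

def duration_90(lam: float) -> float:
    """Number of periods until 90% of the total Koyck effect is realized.

    The cumulative effect through lag n is proportional to 1 - lam**(n + 1),
    so we solve 1 - lam**(n + 1) = 0.9 for n.
    """
    return math.log(0.1) / math.log(lam) - 1.0

# Illustrative (hypothetical) lambda estimates
for lam in (0.3, 0.6, 0.9):
    print(f"lambda = {lam}: 90% duration = {duration_90(lam):.1f} periods")
```

For λ̂ = 0.6 the interval is about 3.5 periods; interpreting "periods" as months, bimonths or years obviously yields very different durations in calendar time.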
Leone (1995) provides a theoretical explanation for the inconsistent findings from
previous econometric analyses of aggregated data concerning the duration of advertising
carry-over effects. He also adjusted the lagged sales parameter in models with a
data interval/aggregation bias. He found that the average carry-over effect is between
six and nine months. A similar conclusion is based on a meta-analysis performed by
Assmus, Farley and Lehmann (1984) and it is also consistent with the findings of Lodish
et al. (1995b). They summarized advertising carryover effects based on a summary of
55 in-market estimates of the long-term effects of TV advertising using experimental
(Behavior Scan) tests. We return to this issue in Section 14.1.3.
We now turn our attention to the shape of the lagged effects. A geometrically
decaying lag structure implies that a campaign in period t has its greatest effect in the
same period. This may or may not be realistic, depending, among other things, on
the periodicity of the data. For example, decreasing lagged effects might be quite
reasonable for annual or even for quarterly data. With monthly data, it is possible that
advertising will reach its peak effect after a few periods. For example, in their study
of the dynamic effects of a communications mix for an ethical drug, Montgomery and
Silk (1972) found that direct mail had its peak effect in the month after the mailing.
Samples and literature similarly peaked after a one-month lag. Journal advertising
showed a peak, although a modest one, three months after the advertising appeared
in the medical journals.
There are various ways of dealing with such more complex lag structures. The
most obvious way is to include a number of direct lags, the geometric decay taking
effect after a few periods. In that case, the sales-advertising equation becomes, for
example:
q_t = α + β_1 a_t + β_2 a_{t−1} + β_3 a_{t−2} + β_4 a_{t−3} + β_4 λ a_{t−4}
      + β_4 λ² a_{t−5} + ... + u_t.    (6.13)

Applying the Koyck transformation to (6.13), we obtain after rearranging terms:

q_t = α(1 − λ) + λ q_{t−1} + β_1 a_t + (β_2 − β_1 λ) a_{t−1}    (6.14)
      + (β_3 − β_2 λ) a_{t−2} + (β_4 − β_3 λ) a_{t−3} + u_t − λ u_{t−1}
and the relation to be estimated is:
q_t = α* + λ q_{t−1} + β_1 a_t + β_2* a_{t−1} + β_3* a_{t−2} + β_4* a_{t−3} + u_t*    (6.15)

where

α* = α(1 − λ),
β_i* = β_i − β_{i−1} λ,  i = 2, 3, 4,
u_t* = u_t − λ u_{t−1}.

Equation (6.15) contains six parameters. Their estimates suffice to obtain estimates
of the parameters in the original model (6.13). Although this formulation allows for
more flexibility in the nature of lagged effects, it reintroduces some of the difficulties
(loss of degrees of freedom and multicollinearity) which we want to avoid.
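The back-transformation from the estimated starred parameters to the original β_i of (6.13) is recursive: β_i = β_i* + λβ_{i−1}. A minimal sketch with illustrative (not estimated) numbers:

```python
# Recover the original lag parameters of (6.13) from the transformed
# parameters of (6.15): beta_i = beta_i^* + lam * beta_{i-1}, i = 2, 3, 4.
# All numbers below are illustrative, not estimates from any data set.
lam = 0.5
alpha_star = 3.0
betas_true = [2.0, 1.5, 1.0, 0.4]        # beta_1 .. beta_4 of (6.13)

# Forward: the starred parameters that (6.15) would contain
beta_star = [betas_true[0]] + [
    betas_true[i] - betas_true[i - 1] * lam for i in range(1, 4)
]

# Backward: undo the transformation recursively
recovered = [beta_star[0]]
for i in range(1, 4):
    recovered.append(beta_star[i] + lam * recovered[i - 1])

alpha = alpha_star / (1 - lam)           # since alpha* = alpha(1 - lam)
print(recovered, alpha)
```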
The combination of direct lags with geometrically declining lags has been applied
by a number of authors. Lambin (1972a) uses it in relating sales to advertising of
gasoline based on quarterly data. Similar approaches are used by Montgomery and
Silk (1972), Doyle and Saunders (1985), and Leeflang et al. (1992).
Bass and Clarke (1972) examine a number of alternative specifications in their
study of the advertising effectiveness for a dietary product. Their point of departure
is equation (6.5), for which six alternative specifications of the w_ℓ are considered. In
Model I, w_ℓ = (1 − λ)λ^ℓ, resulting after transformation in:

q_t = α(1 − λ) + λ q_{t−1} + β_1(1 − λ) a_t + u_t − λ u_{t−1}.    (6.16)
In Model II one specific lag is introduced before the geometric decay function becomes applicable, in which case we obtain: 10
(6.17)
10. For a more detailed exposition on, among other things, the constraints on the parameters to ensure non-negativity of the lag coefficients, we refer to the Bass and Clarke article. A full understanding requires knowledge
of lag-generating functions and the algebra of lag operators, for which we refer to Dhrymes (1981).
A natural extension is to have one direct lag in the explanatory variable a_{t−1} in
addition to the two lagged dependent variables. The result is Model IV:
Figure 6.2 Pascal distributed lag structure (r = 2, λ = 0.8).
The Pascal lag structure offers opportunities to estimate a model that contains an
infinite number of parameters. The Pascal lag was proposed and applied by Solow
(1960). The Pascal distribution, also called the negative binomial distribution, has the
following structure:
w_ℓ = [(r + ℓ − 1)! / ((r − 1)! ℓ!)] (1 − λ)^r λ^ℓ,  ℓ = 0, 1, 2, ....    (6.22)

For r = 2 this yields:

w_0 = (1 − λ)²
w_1 = 2(1 − λ)² λ
w_2 = 3(1 − λ)² λ²
w_3 = 4(1 − λ)² λ³
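A sketch of (6.22) that generates the weights, checks that they sum to (approximately) one, and locates the peak; for r = 2 and λ = 0.8 the effect builds up before decaying, unlike the geometric lag:

```python
from math import factorial

def pascal_weight(r: int, lam: float, ell: int) -> float:
    """Pascal (negative binomial) lag weight, equation (6.22)."""
    return (factorial(r + ell - 1) // (factorial(r - 1) * factorial(ell))) \
        * (1 - lam) ** r * lam ** ell

r, lam = 2, 0.8
w = [pascal_weight(r, lam, ell) for ell in range(200)]
print(w[:4])             # (1-lam)^2, 2(1-lam)^2*lam, 3(1-lam)^2*lam^2, ...
print(sum(w))            # the weights sum (approximately) to one
peak = max(range(len(w)), key=lambda ell: w[ell])
print(peak)              # interior peak: the effect builds before decaying
```

For r = 2 and λ = 0.8 the weights at lags 3 and 4 are mathematically tied, so the peak lies at lag 3 or 4, consistent with Figure 6.2.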
objective was not to obtain a best estimate of the lag structure, but to examine whether
or not lag structure matters in optimizing advertising spending. For that purpose,
optimal values were derived corresponding to a number of alternative lag structures
(no dynamic effect at all, r = 1, r = 2, ... , r = 6). The sensitivity of profit was
examined with respect to using one model to estimate the parameters (for example,
the static model), when in fact another model (for example, r = 6) is the correctly
specified one. They concluded that, at least for this data base, lag structure does not
matter much in optimization, the implication being that the emphasis in the literature
on finding the best lag structure may be somewhat misplaced. 13
Almon (1965) proposed a method to estimate a model with finite distributed
lags. The relation between the β_{t*}'s in (6.1) and the lag length is approximated by
a continuous function of the form:

β_{t*} = φ_0 + φ_1 t* + φ_2 (t*)² + ... + φ_r (t*)^r.    (6.23)

This equation is a polynomial in t*, and if r is strictly less than s, the use of this
approximation imposes restrictions on β_{t*}, t* = 1, ..., s (Stewart, 1991, p. 181). The
parameters φ_i may be estimated by substituting (6.23) into (6.1) and applying least
squares. This model has been widely used in applied econometric work because of the
flexibility of the polynomial lag shape, the decrease in the number of parameters that
must be estimated, the ease of estimation, and the reduction in multicollinearity. In
marketing this model has been applied by van Heerde et al. (1999c) to model dynamic
effects of promotions. 14
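The estimation step can be sketched as follows: under the polynomial restriction, the lagged advertising terms can be collected into constructed regressors z_{i,t} = Σ_{t*} (t*)^i a_{t−t*}, whose least-squares coefficients are the φ_i. The example below (simulated data, a quadratic lag shape, variable names hypothetical) recovers s = 8 lag coefficients from only r + 1 = 3 polynomial parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
T, s, r = 400, 8, 2                      # 8 finite lags, quadratic approximation

# True lag coefficients follow a quadratic in the lag index (an Almon shape)
tstar = np.arange(1, s + 1)
beta_true = 1.0 + 0.8 * tstar - 0.1 * tstar ** 2

a = rng.uniform(0, 5, T)
q = np.zeros(T)
for j, b in enumerate(beta_true, start=1):
    q[s:] += b * a[s - j:T - j]          # q_t = sum_j beta_j a_{t-j} + ...
q[s:] += 2.0 + rng.normal(0, 0.5, T - s)

# Almon transformation: z_{i,t} = sum_{t*} (t*)^i * a_{t - t*}, i = 0..r
Z = np.zeros((T - s, r + 1))
for i in range(r + 1):
    for j in range(1, s + 1):
        Z[:, i] += (j ** i) * a[s - j:T - j]
X = np.column_stack([np.ones(T - s), Z])
phi = np.linalg.lstsq(X, q[s:], rcond=None)[0][1:]

# Map the r + 1 polynomial parameters back to the s lag coefficients
beta_hat = phi[0] + phi[1] * tstar + phi[2] * tstar ** 2
print(np.round(beta_hat, 2))
```

Only three slope parameters are estimated, which illustrates the parsimony, ease of estimation, and reduced collinearity mentioned above.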
In the marketing literature several other distributed lag models have been developed
and applied. We discuss two of them. The first model is based on the current-effects
model:
q_t = α + β_1 a_t + u_t.    (6.24)

In the current-effects model it is assumed that the disturbances u_t are not correlated
with the disturbances in preceding periods. That is: there is no autocorrelation. If,
however, the u_t are correlated with, for example, u_{t−1}, we have:

u_t = ρ u_{t−1} + ε_t,  |ρ| < 1,    (6.25)

where ε_t is a disturbance term that does satisfy the classical assumptions. Substituting
(6.25) into (6.24) and eliminating u_{t−1} yields:

q_t − ρ q_{t−1} = α(1 − ρ) + β_1(a_t − ρ a_{t−1}) + ε_t    (6.27)
which is a linear relationship between a change in the level of demand and a change
in the level of advertising expenditure. If there is no autocorrelation and ρ = 0,
(6.27) reduces to (6.24). However, if ρ ≠ 0 we have a dynamic specification. This
autoregressive current-effects model (6.27) is often used by researchers (Kanetkar,
Weinberg, Weiss, 1986).
Another well-known dynamic model is the partial adjustment model. This model
is based on the notion that there are levels of marketing instruments which are
associated with desired sales q_t*:

q_t* = α + β_1 a_t    (6.28)

while the actual sales achieved only adjust partially toward the desired sales amount
(Nerlove, 1956):

q_t − q_{t−1} = δ(q_t* − q_{t−1}) + u_t    (6.29)

where 0 < δ ≤ 1 is the adjustment rate.
This phenomenon may occur if, say, repetitions of a television ad are needed before
all members of the target market comprehend it or repetitions are needed to convince
some consumers to buy.
By substituting (6.28) into (6.29), the partial adjustment model is obtained:

q_t = αδ + (1 − δ) q_{t−1} + β_1 δ a_t + u_t.    (6.30)
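Assuming the standard Nerlove reduced form q_t = αδ + (1 − δ)q_{t−1} + δβ_1 a_t + u_t, a simulation sketch shows how the short-run effect (δβ_1) and the long-run effect (β_1) are read off the estimates; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta1, delta, T = 5.0, 3.0, 0.4, 4000

# Partial adjustment: q_t = delta*alpha + (1-delta)*q_{t-1} + delta*beta1*a_t + u_t
a = rng.uniform(0, 5, T)
q = np.zeros(T)
q[0] = alpha + beta1 * a[0]              # start at the desired level
for t in range(1, T):
    q[t] = delta * alpha + (1 - delta) * q[t - 1] \
        + delta * beta1 * a[t] + rng.normal(0, 0.2)

# OLS on an intercept, q_{t-1} and a_t
X = np.column_stack([np.ones(T - 1), q[:-1], a[1:]])
c = np.linalg.lstsq(X, q[1:], rcond=None)[0]
short_run = c[2]                         # estimate of delta * beta1
long_run = c[2] / (1 - c[1])             # estimate of beta1, the desired-sales effect
print(short_run, long_run)
```

Unlike the Koyck model, the disturbance here is not moving-average, so least squares with the lagged dependent variable is better behaved.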
(1967), Bass, Clarke (1972) and Weiss, Windal (1980). We return to these tests in
Section 6.3 and Section 18.4.2.
To conclude this section, we return to (6.6), i.e. w_ℓ ≥ 0 for all ℓ, introduced when we
discussed the meaning of distributed lags. We said in passing that non-negativity is
not always a realistic requirement. An example is the dynamic effect of a cents-off
(price cut) promotion. During a promotion for a frequently purchased food product,
sales tend to increase above the normal sales level. It is possible that sales in the week
after a promotion will be below the normal level. 15 An explanation is that
a promotion may not only attract new buyers, but it may also result in loyal buyers
stocking up on the product, which in turn leads to a decrease in sales in the period
immediately following a promotion. The dynamics of a promotion, pr_t (excluding all
other variables for simplicity), could therefore be modeled as follows:
q_t = α + β_1 pr_t + β_2 pr_{t−1} + u_t    (6.31)

where β_1 > 0 and β_2 < 0.
This phenomenon is known as a negative lag effect (Doyle, Saunders, 1985). The
distributed lags for which w_ℓ ≥ 0 are known as positive lag effects.
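A negative lag effect is easy to mimic and detect in a sketch (hypothetical weekly numbers): promotions lift sales in the promotion week and depress them the week after, and a regression on pr_t and pr_{t−1} returns coefficients of opposite sign:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 52                                   # one year of weekly data
base, b1, b2 = 100.0, 40.0, -15.0        # b2 < 0: a negative lag effect

pr = np.zeros(T)
pr[[10, 25, 40]] = 1.0                   # three one-week promotions

# Sales: a lift in the promotion week and a dip below the normal level
# the week after (loyal buyers stocked up during the promotion)
pr_lag = np.concatenate([[0.0], pr[:-1]])
q = base + b1 * pr + b2 * pr_lag + rng.normal(0, 2, T)

# OLS recovers a positive current effect and a negative lagged effect
X = np.column_stack([np.ones(T), pr, pr_lag])
coef = np.linalg.lstsq(X, q, rcond=None)[0]
print(coef)
```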
This may seem quite plausible. It implies, however, that price and distribution have
the same lagged effect as advertising. This means that (6.32) is obtained by applying
the Koyck transformation to:
geometrically declining lag structure, but with different parameters λ_1 and λ_2. The
basic model becomes:
The model transformation now requires two steps. First, equation (6.34) is lagged one
period, multiplied by λ_1 and subtracted from (6.34), yielding:
Now (6.35) is lagged one period, multiplied by λ_2, and subtracted from (6.35). After
rearranging terms, one obtains:
We observe that (6.36) is not only much more complex than (6.32), but is also
nonlinear in the parameters. The unknown parameters λ_1 and λ_2 appear both in the
relation and in the expression for the disturbance terms (u_t*). There are different
solutions for this problem:
From the discussion at the end of Section 6.1 it appears that there are subtle
differences between alternative dynamic specifications. The best among the alternative
specifications can be chosen in different ways. We limit ourselves to highlighting a
few points. 16
16. A more detailed discussion on choosing the best model is given in Chapter 18.
First, the models can be evaluated on theoretical criteria. For example, if the
theory states that all lagged effects should be non-negative, their estimates will have
to satisfy a number of constraints. In relation (6.18), i.e. Model III of Bass and Clarke
(1972), for example, imposing non-negativity constraints implies that the estimated
parameters should satisfy the set of constraints (6.37): 17
Bass and Clarke (1972) obtained estimates for which it is easy to see that these
constraints are not satisfied. This suggests that the model should be rejected, in
particular if, in addition, it can be demonstrated that the violation of the constraint
is not due to chance.
The selection of models is also possible based on statistical tests. Such tests are
easily applied if the models are "nested". Nesting is a means of selecting amongst
alternative specifications where the parameters of lower-order equations are contained
within the parameters of higher-order ones. For example, Model V of the Bass and
Clarke model, i.e. (6.20), is nested in Model VI (6.21); Model IV is a specific version of
Model VI where λ_3 = 0; Model II is nested in Model IV; etc. Not all "competing"
models are nested. Although it can be demonstrated 18 that the Koyck model (6.11),
the current-effects model (6.24), the autoregressive current-effects model (6.27) and
the partial-adjustment model (6.30) are all nested into one master model (the
autoregressive multiple geometric lag model), they are not mutually nested. The selection
between non-nested models is possible using information criteria and specific tests
developed for non-nested models: see Section 18.4.3.
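For nested specifications, the classical F-test on restricted versus unrestricted residual sums of squares is the workhorse. A generic sketch (simulated data, not the Bass and Clarke application):

```python
import numpy as np

def f_test_nested(y, X_restricted, X_full):
    """F-statistic for a restricted model against a nested full model."""
    def rss(X):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ b
        return float(e @ e)
    rss_r, rss_f = rss(X_restricted), rss(X_full)
    q = X_full.shape[1] - X_restricted.shape[1]   # number of restrictions
    df = len(y) - X_full.shape[1]
    return ((rss_r - rss_f) / q) / (rss_f / df)

rng = np.random.default_rng(4)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X_r = np.column_stack([np.ones(n), x1])           # restricted: x2 excluded
X_f = np.column_stack([np.ones(n), x1, x2])       # full: x2 included

y_null = 1.0 + 2.0 * x1 + rng.normal(size=n)      # x2 truly irrelevant
y_alt = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

F_null = f_test_nested(y_null, X_r, X_f)
F_alt = f_test_nested(y_alt, X_r, X_f)
print(F_null, F_alt)          # compare each to an F(1, n - 3) critical value
```

The same mechanics apply when the restriction is, say, dropping a lagged term from a higher-order lag model; for non-nested comparisons, information criteria or the tests of Section 18.4.3 are needed instead.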
q_t = α + β_1 p_t + β_2 p_{t+1} + β_3 p_{t+2} + u_t    (6.40)

where

p_t = price per unit in period t, and
p_{t+1}, p_{t+2} = the announced prices or expected prices one and two
periods ahead, respectively.
If consumers expect price increases in t + 1 and t + 2, β_2, β_3 > 0, i.e.: positive leads.
Anticipation of price reductions results in negative leads, i.e.: β_2, β_3 < 0. Negative
leads were found by Doyle, Saunders (1985) and van Heerde et al. (1999c).
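A lead-effects regression of this kind can be illustrated as follows, assuming a linear specification with current and announced future prices as regressors (simulated data; coefficient values hypothetical). With positive leads the estimated β_2 and β_3 come out positive:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
p = rng.uniform(1, 3, T + 2)             # prices, announced two periods ahead

# q_t = alpha + b1*p_t + b2*p_{t+1} + b3*p_{t+2} + u_t; positive leads:
# higher announced future prices raise current purchases (buying ahead)
alpha, b1, b2, b3 = 50.0, -8.0, 3.0, 1.5
q = alpha + b1 * p[:T] + b2 * p[1:T + 1] + b3 * p[2:T + 2] \
    + rng.normal(0, 1, T)

X = np.column_stack([np.ones(T), p[:T], p[1:T + 1], p[2:T + 2]])
coef = np.linalg.lstsq(X, q, rcond=None)[0]
print(coef)    # negative own-price effect, positive lead effects
```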
CHAPTER 7

Implementation criteria

7.1 Introduction
Several authors have proposed specification criteria, i.e. criteria pertaining to model
structure. Most authors have looked at these from the model builder's point of view.
Some criteria are based on theoretical requirements such as the logic of the
mathematical structure. Other criteria may be practical, and emphasize aspects relevant to
parameter estimation. A few authors have considered criteria from the user's point of
view.
In proposing criteria for a decision calculus, Little (1970) takes the user's point
of view. As indicated in Chapter 1, the criteria belong to two aspects: model structure
and ease of use. With regard to model structure, models should be:
1. simple;
2. complete;
3. adaptive;
4. robust.
To link the criteria "simple" and "complete", Urban and Karash (1971) introduced the
notion of evolutionary model-building. This criterion is also added by Little (1975a)
in his later work. The evolutionary character of model-building is not in fact a
criterion related to structure, but to the implementation process. We return to this topic
All models are simplified representations of real-world phenomena. One way in which
model simplicity can be achieved is by keeping the number of variables small, and
only to include the important phenomena in the model. This can be achieved in one
or more of the following ways:
a. Clustering of variables.
Clustering of variables is often done in econometric studies. For example,
a large number of brands assumed to influence the performance of the brand under
study, may be aggregated into "competition". Or marketing instruments are
aggregated into a small number of classes, such as product, distribution, advertising,
promotions, and price. 1 For example, advertising expenditures separated by media such
as television, radio, newspapers, magazines and billboards would be aggregated into
total advertising expenditures. We note that this aggregation implicitly assumes that
the marginal effect of an extra investment in advertising does not differ across the
media. If the data are sufficient, this assumption should be tested. Statistical proce-
dures such as principal components analysis or other dimension reduction methods
are sometimes used to group variables on the basis of their covariance structure.
Other examples are models in which the impact of marketing variables on (1) category
purchase (product class sales), (2) brand choice and (3) purchase quantity decisions
of households for frequently purchased goods is determined. 3
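The principal-components idea mentioned above can be sketched as follows (hypothetical media-spend series): when two media are driven by a common campaign budget, the first component captures their joint variation, and its loadings suggest which variables may be grouped:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 300

# Hypothetical media spend series: TV and radio move together (one
# campaign budget), while print moves independently
campaign = rng.normal(0, 1, T)
tv = 1.0 * campaign + rng.normal(0, 0.3, T)
radio = 0.8 * campaign + rng.normal(0, 0.3, T)
print_spend = rng.normal(0, 1, T)

X = np.column_stack([tv, radio, print_spend])
Xc = X - X.mean(axis=0)

# Principal components from the covariance structure
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
share = s ** 2 / np.sum(s ** 2)
print(share)        # variance share of each component
print(Vt[0])        # first-component loadings: heavy on tv and radio
```

This is a grouping heuristic only; whether the grouped variables really have equal marginal effects remains an assumption to test.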
Points a-e above represent different ways of obtaining simple structure, often through
a reduction in the number of variables/parameters. This relates to the concern of many
model builders, especially those with a background in statistics and econometrics, that
models must be manageable and estimable. This calls for parsimony of the models,
i.e. there should be a modest number of parameters, and for simple structure, which
might mean that linear or linearizable models are preferred to non-linearizable ones.
The notions of simplicity favored by the model builder will not always be
agreeable to the user. Consider a producer of a frequently purchased food product. One
of her marketing mix elements is a temporary price cut or price promotion offered
multiple times per year. The brand is supported by a limited amount of advertising
throughout the year. During the times of promotion, however, advertising support is
increased, primarily to make consumers aware of the price cut. If the model builder
wants to estimate separate effects of promotion and advertising, he is likely to
experience difficulties because these effects are confounded, since heavy advertising
spending coincides with price cut campaigns. Thus he may have to combine the two
variables and measure their joint effect. This, however, may not be acceptable to the
model user. For her, promotion and advertising are separate instruments, even though
in some periods, one might be used to support or complement the other. Combining
them for estimation purposes may result in a loss of quality and prestige of both model
and model builder in the eyes of the user. 4
In such a situation, the model builder may have to educate the model user about
some statistical difficulties and how those difficulties can be reduced. If a manager
has perfect knowledge of consumer response to marketing activities, she has no need
for the results obtainable with a model. All managers in fact have uncertainty about
market response. The question is whether they are willing to admit the extent of their
uncertainty. Some of this uncertainty can be reduced by analyzing historical data.
However, the actions taken by managers reflect their beliefs about market response.
Thus, if advertising support is always increased in the presence of a temporary price
cut, the manager apparently believes that the effect of the price cut is sufficiently
greater with this support. The only way to determine how beneficial this support truly
is, is to vary it systematically. For example, the manager could have some price cuts
not coincide with extra advertising, and implement a given price cut with a different
amount of advertising support (at different times and/or in different markets). By
experimenting with the marketing variables it is possible to enhance the learning
about market response (see Little, 1970). Another possibility is to combine subjective
with data-based estimation.
We now define more clearly what "simple" means for the user. We cannot expect
managers to be experts in mathematics, statistics, econometrics, operations research
and computer science. They are not, they do not pretend to be, nor do they want to
be. The manager is often not interested in the detailed intricacies of the model. What
she wants is a basic understanding of the logic of the model and of what it can do
for her. For the user, a model will be simple if this basic understanding is provided.
Communication and involvement are two means of achieving this.
Historically, communication between model builder and model user was almost
non-existent. This was described in 1969 by Montgomery and Urban (p. 5). Decision
makers did not understand the jargon of the operations researcher and management
scientist, and model building took place to a large extent in isolation. The situation
has much improved in recent years. Market research firms, such as AC Nielsen and
IRI, and consulting firms, such as McKinsey, have become heavily involved in the
development and estimation of market response models. Much of the development is
based on client needs, and this requires that implementation of model results plays a
large role. Thus, the model builders have to take into account how new models fit into
the decision-making environment. Models have also become easier to communicate
with as a result of the widespread availability of on-line computer systems and the
development of knowledge-based systems (McCann, Gallagher, 1990). We treat this in
more detail in the section dealing with ease-of-use criteria in Chapter 19.
Involvement means that the important factors bearing on the problem are
described by the decision makers and not by the model builder. Also the model structure
should represent the decision makers' view of how the market works.
We should, of course, realize that the real world is not simple, and that when a
model represents the most important elements of a system, it will often look
uncomfortably complicated. It is for this reason that Urban and Karash (1971) suggested
building models in an evolutionary way, i.e. starting simply and expanding in detail
as time goes on.5
The basic idea is that one does not build a model with all ramifications from the
start. Manager and model builder begin by defining the important elements of the
problem, and how these elements are related. Based on an initial meeting or set of
meetings, the primary elements are specified. The manager should be fully involved,
so that she is likely to understand what the model does, and is interested in this tool,
because it should represent her view of the world. As the manager uses the model,
and builds up experience with this decision aid, she will realize its shortcomings.
The model will then be expanded to incorporate additional elements. The model is
now becoming more complex, but the manager still understands it, because it is her
realization that something was missing which led to the increase in complexity. In
a sense the model becomes difficult, yet by using an evolutionary approach, it also
remains simple because the manager has a clear understanding of what the model is
supposed to do.
This approach can be accompanied by a two-step presentation. First, a
formalized or conceptual model is presented to management. This model ultimately reflects
management's own views about the nature of market response. Second, an empirical
model or statistical model is used to convey to management, how much of the overall
response is captured in the model (Hanssens, Parsons, Schultz, 1990, p. 336).
For a model to be a useful decision-support tool, it has to represent all relevant
elements of the problem being studied. This means that a model should account for
all important variables. If competitors matter, then the effects of their actions on a
5. Amstutz (1970) proposes something which is conceptually similar.
brand under study should be incorporated. The marketing dynamics should be built in,
and so on. 6 Other relevant elements are parameters which account for heterogeneity
of the supplier (e.g. competitive advertising effects vary across competitors) and/or
the heterogeneity of demand (e.g. own-brand advertising effects, brand loyalty, price
sensitivity, etc. vary by market segment).
It should be clear that completeness on all important issues is a criterion which
may conflict with simplicity. As long as simple is roughly synonymous with
understandable, then the model builder can resolve the conflict at least partially by adopting
an evolutionary approach.
There may be other conflicts between the interests of the model user and the
needs of the model builder. Suppose, for example, that the regular price is not an
active marketing instrument, a typical situation in many oligopolistic markets. It will
then be difficult, or impossible, to assess its impact on sales because it does not show
sufficient variation. As a result, the regular price does not appear in the specification
of an econometric model. In that case, the implication is not that price does not affect
sales, but that its effect cannot be measured by analyzing historical data. To have a
"complete" model of demand, the effect of price has to be assessed through other
means. 7
Completeness is, of course, a relative concept. It is relative to the problem, to the
organization, and to the user. We briefly outline these three aspects here and discuss
the first more extensively in Section 7.5 while we expand on the other two in Chapter
19.
Completeness relative to the problem can be illustrated as follows. In modeling
the effects of advertising, we may wonder whether we should focus on total
advertising, i.e. estimate its overall effectiveness, or whether we should differentiate between
the various media vehicles available for the communication of advertising messages.
An initial answer to this question is that it depends on the problem definition. If the
model is intended to aid marketing mix planning at the brand management level, then
the total effect of advertising is what is needed. In that case it may be appropriate to
use the aggregate of all advertising expenditures. However, if the advertising manager
wants to know how to allocate expenditures to the different media vehicles, data on
each of these media are required. The advertising manager needs detailed
understanding of the effects of alternative media vehicles as well as of different advertising
copies and specific advertising campaigns (e.g. Montgomery, Silk, 1972).
This delineation of data needs corresponding to the needs of different managers
may, however, not be sufficient. Even if the marketing manager is only concerned
with the determination of total advertising expenditures, and not its breakdown, it
is possible that the effect of total advertising is estimated with
6. See also Leeflang (1974, Chapter 6), and Leeflang, Koerts (1975).
7. A simple way to combine a laboratory experiment to determine price elasticity with historical data analysis of
other marketing instruments is proposed by Naert (1972). Other (field) experiments for this purpose are Gabor-Granger procedures and Brand-Price Trade-Off analyses. See, e.g. Leeflang, Wedel (1993), Kalyanam, Shively
(1998), Wedel, Leeflang (1998). Survey experiments, such as provided by conjoint analysis, offer additional
opportunities. See, for example, Currim, Weinberg, Wittink (1981), Mahajan, Green, Goldberg (1982).
IMPLEMENTATION CRITERIA 107
greater validity and precision from a model in which the advertising components are
separated.
The organizational structure of the firm will also be a determinant of
completeness. A firm with a highly decentralized structure may generate a large number of
relatively simple subproblems. The corresponding models will also be relatively
simple and incomplete from the perspective of the firm. 8 Centralized firms will have
a smaller number of separate problem statements each of which will require more
complex models.
At the same time, the size of the organization may influence the desired degree
of completeness. A marketing mix problem for a small firm operating at the regional
level will not be the same as that of a firm selling a national brand. We consider this
aspect in the discussion of costs and benefits in Chapter 20.
The desired level of completeness will also depend on the user. Larreche (1974,
1975) has observed that one manager's integrative complexity, i.e. her ability to
integrate pieces of information into an organized pattern, is not the same as that of
another manager. The desired level of complexity and completeness will, therefore,
vary according to the user. This amplifies the need for intense involvement of the user
in the model-building process.
Markets change and market behavior is dynamic. Thus, it is not possible to think of
model building as a one-time affair. Instead, models need to be adapted more or less
continuously. This implies that either the structure and/or the parameters have to be
adapted. For example, the entry or exit of a competitor may imply that certain model
parameters change. In addition, the specification of the model may have to change.
The changes that require model adaptation can take many forms. The true values
of model parameters can change if the set of consumers that makes product category
purchases changes. We discuss this issue in detail in Section 17.4. Brand parameters
may also change if the amount of advertising for all brands or the average price in
the category changes. And modifications in product characteristics, which are rarely
included in demand models, can change brand-level parameters as well. These exam-
ples suggest why model parameters may vary, even if the structure of the model can
remain as is. All observable market changes that can be related to model
characteristics give reason for the model builder to respecify the structure and/or reestimate
the parameters. For example, if a firm has traditionally sold its products through
independent distributors but has designed a wholly-owned distribution network, then
the change in the structure of the selling process will require that a new model be
created.
8. Of course, there remains the problem of coordinating the various subproblems in order to make sure that
subunits do not pursue objectives that are in conflict with overall company objectives. Internal or transfer pricing
can be successfully applied to overcome such potential conflicts. See, for example, Baumol and Fabian (1964)
and Hess (1968).
Knowledge of the marketplace, and the changes that require a new model
structure, is the primary determinant of adaptation. A secondary determinant is the
difference between actual and predicted values (see Section 3.2 and the illustration in
Figure 3.1). The greater this difference, to the extent that it cannot be attributed to sta-
tistical uncertainty, the more reason there is to adapt. Thus, the greater the prediction
error the greater the need to respecify the model. Is there a missing variable? Should
the functional form be modified? Do consumers respond differently?
It should be clear that both the use of logical arguments ("the model needs to
change because the market environment is different") and a careful analysis of pre-
diction errors are critical for continued model use. Alternatively, one can update
parameter estimates routinely, by reestimation each time new data become available.
Routine reestimation is advisable if there are, for example, gradual changes in the
consumer population. It is conceivable that by tracking how the parameter estimates
change over time, short-term projections can be made of how the parameters will
change in the future. Little (1975b, p. 662) stresses the importance of adaptive control
by continually measuring programs and monitoring systems to detect change. Formal
systems for (adaptive) control of advertising spending have been proposed by Little
(1966) and Horsky and Simon (1983).
An important determinant of the likelihood of model acceptance is that it is easily
adaptable. It is always easy to replace previous parameter estimates. Adaptation to
structural changes tends to be more difficult. A facilitating factor is the modularity of
the model. For example, a marketing mix model may consist of a set of submodels,
such as a price-, an advertising-, and a promotion module. The modular structure
will prove particularly useful when a change does not involve all of the modules. An
excellent example of modular model building is BRANDAID, a marketing mix model
developed by Little (1975a, 1975b).
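The modular idea can be sketched in code. The following is a hypothetical, BRANDAID-flavored sketch, not Little's actual formulation: the module functions, parameter values, and names are our illustrative assumptions. Sales equal a base level multiplied by independent per-instrument effect indices, so one module can be respecified or reestimated without touching the others.

```python
# Hypothetical sketch of a modular marketing-mix model in the spirit of
# BRANDAID (Little, 1975a, 1975b): sales = base level x product of
# per-instrument effect indices. Each module maps its instrument to a
# multiplier (1.0 = no effect), so a module can be replaced independently.

def price_module(price, reference_price, elasticity=-2.0):
    """Price effect index: equals 1.0 when price is at the reference level."""
    return (price / reference_price) ** elasticity

def advertising_module(adv, reference_adv, elasticity=0.1):
    """Advertising effect index: equals 1.0 at the reference spending level."""
    return (adv / reference_adv) ** elasticity

def promotion_module(on_promotion, lift=1.3):
    """Promotion effect index: a fixed lift while a promotion runs."""
    return lift if on_promotion else 1.0

def brand_sales(base_sales, price, adv, on_promotion,
                reference_price=2.0, reference_adv=100_000.0):
    effect = (price_module(price, reference_price)
              * advertising_module(adv, reference_adv)
              * promotion_module(on_promotion))
    return base_sales * effect

# At reference conditions all indices are 1.0, so sales equal the base level.
print(brand_sales(50_000, price=2.0, adv=100_000.0, on_promotion=False))
```

A structural change, say a redesigned promotion policy, then only requires replacing `promotion_module`; the price and advertising modules, and their estimates, are untouched.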
Suppose we take a pragmatic view of the meaning of the label "good model", namely,
a model is good if it works well. For example, if a model has been developed to
forecast sales, and if the model's predictions stay within, say, one half percent of
actual sales, management might decide that the model is excellent. With this level of
forecast accuracy, management may not care much about how the model works, or
whether or not its structure is robust. 13
It turns out that there are many examples of models that are non-robust but never-
theless, within reasonable limits, work well.
We begin with a simple example. Consider the following multiplicative sales
response function: l4
q_t = α p_t^{β_p} a_t^{β_a}    (7.3)

where
q_t = unit sales in period t,
p_t = price in period t,
a_t = advertising expenditures in period t, and
α, β_p, β_a = model parameters.
The parameters β_p and β_a are respectively the price and advertising elasticities. Nor-
mally β_p will be negative and larger than one in absolute value (see Tellis, 1988b), and
β_a will be between zero and one (see Assmus, Farley, Lehmann, 1984). The question
now is: what happens to sales as price and advertising vary? For a_t = a_0 (a_0 > 0),
and with price becoming very high, sales tend to zero as expected. On the other hand,
as price goes to zero, unit sales go to infinity. However, since sales cannot exceed
the total market potential, a finite quantity, this implication is not realistic. Therefore,
following the definition given in Section 7.2.5, the model is not robust. And what
about advertising? Given p_t = p_0 (p_0 > 0), and letting advertising go to infinity, we
observe from equation (7.3) that sales also go to infinity. In addition, with advertising
equal to zero, sales are also zero. One might argue that it is realistic to expect zero
unit sales in the absence of advertising. For a new brand about which consumers have
no awareness nor knowledge, this implication may be realistic: brand awareness is
often a necessary condition prior to purchase. However, for established brands, zero
advertising will not result in zero unit sales, at least not immediately. Thus, we need to
know more about the type of brand and the characteristics of the marketplace before
we can make a proper judgment about the extent to which equation (7.3) is or is not
robust.
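The non-robust behavior of (7.3) is easy to verify numerically. A minimal sketch, with illustrative parameter values chosen only to match the elasticity ranges cited above:

```python
# Numerical check of the multiplicative model (7.3):
# q_t = alpha * p_t**beta_p * a_t**beta_a.
# Parameter values are illustrative: beta_p = -2 (price elastic),
# beta_a = 0.5 (advertising elasticity between zero and one).
alpha, beta_p, beta_a = 1_000.0, -2.0, 0.5

def sales(price, adv):
    return alpha * price ** beta_p * adv ** beta_a

a0 = 10_000.0                      # hold advertising fixed
print(sales(100.0, a0))            # a "normal" price
print(sales(0.001, a0))            # price near zero: sales explode
print(sales(1e6, a0))              # very high price: sales tend to zero

p0 = 2.0                           # hold price fixed
print(sales(p0, 0.0))              # zero advertising gives zero sales
print(sales(p0, 1e12))             # sales grow without bound in advertising
```

The model has no ceiling: nothing in (7.3) represents a finite market potential.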
Some additional problems arise with market share functions. Suppose we use:
m_jt = α_j (p_jt / p_ct)^{β_pj} (a_jt / a_ct)^{β_aj}    (7.4)

where
m_jt = market share of brand j in period t,
p_jt, a_jt = price, and advertising, for brand j in period t,
p_ct, a_ct = a competitive price index, and a competitive
advertising index, and
α_j, β_pj, β_aj = parameters for brand j.
For realistic values of variables and parameters, market share in equation (7.4) will be
non-negative. But, in addition to having the same non-robust characteristics as (7.3),
equation (7.4) would predict an infinite market share if company j has a positive
advertising budget while competitors do not advertise. 15 Yet, by definition, market share
should lie between zero and one.
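The same numerical check applied to (7.4): with illustrative parameter values, the model returns a "market share" far above one as soon as competitors barely advertise.

```python
# Market share model (7.4):
# m_jt = alpha_j * (p_jt/p_ct)**beta_pj * (a_jt/a_ct)**beta_aj.
# All parameter values are invented; only the qualitative behavior matters.
alpha_j, beta_pj, beta_aj = 0.3, -1.5, 0.3

def share(p_j, p_c, a_j, a_c):
    return alpha_j * (p_j / p_c) ** beta_pj * (a_j / a_c) ** beta_aj

print(share(2.0, 2.0, 100.0, 100.0))   # symmetric case: a plausible 0.3
print(share(1.0, 2.0, 100.0, 1.0))     # cheap brand, competitors barely
                                       # advertise: "share" far above one
```

As the competitive advertising index a_ct goes to zero the predicted share grows without bound, which is exactly the violation of the [0, 1] range discussed in the text.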
Market share functions should ideally also satisfy the constraint that predicted
market shares across brands sum to one. This may seem trivial, yet it is not. To show
why, first consider a linear model with two firms j and c:
The observed market share values will, by definition, sum to one. As suggested
above, it is desirable (and it is a necessary condition for robustness) to have esti-
mated values satisfy the same condition. Naert and Bultez (1973) have shown that for
estimated market shares to sum to one, the following constraints should be imposed
15. This problem can be avoided by placing p_jt + p_ct and a_jt + a_ct in the denominators. However, the drawback
then is that market share does not depend on own-brand advertising, if competitors do not advertise.
16. a*_jt = a_jt/(a_jt + a_ct), a*_ct = a_ct/(a_jt + a_ct).
A class of robust market share models consists of the so-called attraction models. An example
of an attraction model is (for j = 1, ..., 4):

m_jt = α_j a_jt^{β_1j} m_{j,t-1}^{β_2} / ( Σ_{r=1}^{4} α_r a_rt^{β_1r} m_{r,t-1}^{β_2} )    (7.9)

where a_jt is advertising of brand j in period t, m_{j,t-1} is its lagged market share,
and the denominator sums the same attraction expression over all four brands.
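By construction, dividing each brand's attraction by the sum of all brands' attractions keeps every predicted share between zero and one and forces the shares to sum to one. A minimal numerical sketch (all parameter values invented for illustration):

```python
# Attraction model as in (7.9): each brand's share is its attraction divided
# by the sum of all brands' attractions. Parameter values are illustrative.
alphas = [1.0, 0.8, 1.2, 0.6]     # alpha_j, j = 1..4
beta1  = [0.2, 0.3, 0.1, 0.25]    # advertising exponents beta_1j
beta2  = 0.5                      # common lagged-share exponent beta_2

def shares(adv, lagged_shares):
    attraction = [a * x ** b1 * m ** beta2
                  for a, b1, x, m in zip(alphas, beta1, adv, lagged_shares)]
    total = sum(attraction)
    return [att / total for att in attraction]

m = shares(adv=[100.0, 80.0, 120.0, 50.0],
           lagged_shares=[0.3, 0.25, 0.3, 0.15])
print(m)
print(sum(m))   # equals 1 up to rounding, whatever the inputs
```

Whatever (positive) inputs are supplied, the normalization guarantees both the range and the sum constraint, which is the sense in which attraction models are robust.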
I 7. Constraints for the general linear model (with constant term) were derived by Schmalensee (1972) and
Naert and Bultez (1973). See also McGuire and Weiss (1976) and Weverbergh (1976). For a further discussion,
see Chapter 9.
18. Except for such trivial and meaningless constraints as λ_j = λ_c = 0, γ_j = γ_c = 1, or γ_j = γ_c = 0, λ_j =
λ_c = 1.
19. The example is taken from Leeflang, Reuyl (1984a).
IMPLEMENTATION CRITERIA 113
Parameter estimates (a: significant at the one percent level; b: at the five percent level; see note 20):

Model                   β_11    β_12    β_13    β_14    β_21    β_22    β_23    β_24
Linear additive (7.7)   0.04^b  -0.01   0.04^b  -0.02   0.50^a  0.81^a  0.67^a  0.70^a
Multiplicative (7.8)    0.13^b   0.01   0.07^a  -0.02   0.46^b  0.84^a  0.81^a  0.70^a
Attraction (7.9)        0.60^a  -0.02   0.06^b  -0.34  -0.86^a  ...
20. For completeness, we provide a brief explanation of statistical significance for parameter estimates. The
null hypothesis H_0 is that the parameters β_1j and β_2j, j = 1, ..., 4, are equal to zero. A statistical test (t-test) is
applied to test each of these hypotheses. For all three models the null hypothesis is not rejected for β_12 and β_14.
The null hypothesis is rejected for all other parameters of (7.7). However, there is a one percent (a), respectively
five percent (b), chance that H_0 is erroneously rejected. The tests are two-tailed, which means that expectations
about signs of coefficients are not taken into account. Two-tailed tests are more conservative than one-tailed
tests if the chance of incorrectly rejecting H_0 is held constant.
In a number of well-known and commercially successful models such as the SCAN*PRO
model (Wittink, Addona, Hawkes, Porter, 1988), PROMOTIONSCAN (Abraham,
Lodish, 1992), and DEALMAKER (McCann, Gallagher, 1990), "brand sales" is the cri-
terion variable. Although the empirical results tend to be plausible, one can provide
the following criticism of non-robustness. For example, the criterion variables in the
SCAN*PRO model are brand unit sales (j) at the store level (k) in week t. The model
estimates the effect of promotional variables implemented by retailers. Now, brand
sales q_kjt can be aggregated over stores (q_·jt), over brands (q_k·t), over brands and
stores (q_··t), over time, etc. However, there is no guarantee that:
Σ_{r=1}^{n} q̂_krt = q̂_k·t    or    Σ_{k=1}^{K} q̂_kjt = q̂_·jt    (7.11)

where
n = the number of brands and K = the number of stores.
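The aggregation problem can be made concrete with a toy example. Below, two brand-level multiplicative equations and a separately specified total-sales equation (all coefficients hypothetical) agree at the calibration point but nowhere else:

```python
# Separately specified multiplicative models for two brands and for total
# category sales; all coefficients are hypothetical. Each equation can fit
# its own data, but nothing forces the brand-level predictions to add up
# to the total-level prediction.
def q1(x):   return 120.0 * x ** 0.50    # brand 1 sales vs. promotion index x
def q2(x):   return  90.0 * x ** 0.20    # brand 2 sales
def qtot(x): return 210.0 * x ** 0.35    # total sales, modeled directly

for x in (1.0, 2.0, 4.0):
    print(x, q1(x) + q2(x), qtot(x))
# At x = 1 the sum matches by construction; at other values it does not.
```

This is exactly the inequality in (7.11): summing the brand-level predictions and predicting the aggregate directly are two different models.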
π_j = α_0j + Σ_{ℓ=1}^{L} α_ℓj x_ℓj    (7.12)

where
π_j = the probability that a consumer purchases brand j,
α_0j = an intercept term,
α_ℓj = parameters capturing the effects of marketing variables,
x_ℓj = marketing variables.
The linear probability model may yield predictions of the purchase probability outside
the (0, 1) interval, dependent on the values of marketing variables and parameters,
and is thus logically inconsistent. A so-called (multinomial) logit model describing
purchase probability uses a formulation that is similar to that of an attraction model
for market shares and resolves the problem of logical inconsistency:
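A numerical sketch of the contrast, using the standard multinomial logit form for the second model (the coefficient values are invented for illustration):

```python
import math

# Linear probability model (7.12) vs. a multinomial logit built on the same
# linear index. Three brands, one marketing variable x; invented coefficients.
a0 = [0.4, 0.3, 0.3]          # intercepts alpha_0j
a1 = [0.25, -0.10, -0.15]     # slopes alpha_1j for variable x

def linear_probs(x):
    return [a0[j] + a1[j] * x for j in range(3)]

def logit_probs(x):
    u = [math.exp(a0[j] + a1[j] * x) for j in range(3)]
    s = sum(u)
    return [v / s for v in u]

print(linear_probs(4.0))   # "probabilities" above 1 and below 0
print(logit_probs(4.0))    # all in (0, 1) and summing to 1, for any x
```

The exponentiation keeps every term positive and the normalization forces the probabilities to sum to one, which is why the logit is logically consistent where the linear specification is not.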
In this section we examined some non-robust marketing models. The market share
and choice models give bad answers for extreme values of the predictor variables, and
over the whole range of possible values some of the models do not have the property
that the predicted market shares sum to one across brands. Nevertheless, these same
models have been used in empirical studies, and they often fit the data well and seem
to have descriptive value. This leads us to believe that the requirement of robustness
should be linked to the use one intends to make of the model. We explore this further
in the next section.
In this section we show that data generated from one model can sometimes be fitted
quite well with other models. The primary reason is that predictor variables often do
not show much variation in the sample. We use an example to show that robustness is
not sufficient for model quality.
Suppose the true relation between unit sales and advertising of brand j is:
q_jt = α + (β − α) a_jt^δ / (γ + a_jt^δ)    (7.14)

This equation has a robust functional form, the properties of which we examined
in Section 5.3.3, and it follows an S-curve. 21 Assume that the parameters are α =
100,000, β = 400,000, γ = 200, δ = 0.5. We show the corresponding function in
Figure 7.1, which we refer to as "real S" (the true S-curve):

q_jt = 100,000 + 300,000 · a_jt^{0.5} / (200 + a_jt^{0.5}) + ε_t    (7.15)
21. In Section 7.5 we show that (7.14) is in fact robust only in a limited sense.
[Figure 7.1: unit sales q_jt (in thousands of units) as a function of advertising, showing the true S-curve ("Real S") and the estimated linear, multiplicative, semilog, and S curves; the range of observations (RA) is marked on the advertising axis.]
where ε_t is a normally distributed error term with mean zero and a standard deviation
of 1,000. To simulate unit sales we took as values for a_jt: $100,000, $105,000, ...,
$200,000. Thus in the simulated period of observation, advertising varies between
$100,000 and $200,000. In Figure 7.1 this is the interval RA. We assume that the
company has never operated outside that range.
Suppose now that we postulate a linear relation between q_jt and a_jt:

q_jt = α' + β' a_jt + ε'_t.    (7.16)

Estimating α' and β' from the simulated data gives:

q̂_jt = 263,000 + 0.23 a_jt,  R² = 0.986    (7.17)
       (237.0)    (35.7)
ln q̂_jt = 11.26 + 0.11 ln a_jt,  R² = 0.870.    (7.19)
          (92.0)    (11.0)
The true model (7.14) is nonlinear in the parameters, but can be linearized as follows: 23

(q_jt − α) / (β − q_jt) = a_jt^δ / γ    (7.22)

Taking logarithms we obtain:

ln[(q_jt − α) / (β − q_jt)] = γ* + δ ln a_jt    (7.23)

where γ* = −ln γ. Estimation on the simulated data gives:

ln[(q_jt − α̂) / (β̂ − q_jt)] = −2.94 + 0.28 ln a_jt,  R² = 0.989    (7.24)
                               (−35.9)    (40.9)
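The simulation experiment can be reproduced in a few lines. The sketch below generates data from (7.15) on the interval RA, fits the linear model (7.16) by ordinary least squares, and then extrapolates; the fitted coefficients depend on the random draws, but the qualitative pattern (an excellent fit inside RA, implausible behavior outside it) is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate unit sales from the true S-curve (7.15) for advertising levels
# $100,000, $105,000, ..., $200,000 (the interval RA), with N(0, 1000) noise.
a = np.arange(100_000.0, 200_001.0, 5_000.0)
q = (100_000 + 300_000 * np.sqrt(a) / (200 + np.sqrt(a))
     + rng.normal(0, 1_000, a.size))

# Fit the linear model (7.16), q = alpha' + beta' * a, by least squares.
X = np.column_stack([np.ones_like(a), a])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, q, rcond=None)

fit = alpha_hat + beta_hat * a
r2 = 1 - np.sum((q - fit) ** 2) / np.sum((q - q.mean()) ** 2)
print(alpha_hat, beta_hat, r2)     # in the vicinity of (7.17), with high R^2

# Outside RA the two models diverge sharply: the linear model keeps
# climbing while the true curve saturates below beta = 400,000 units.
for adv in (0.0, 1e6, 1e8):
    true_q = 100_000 + 300_000 * np.sqrt(adv) / (200 + np.sqrt(adv))
    print(adv, true_q, alpha_hat + beta_hat * adv)
```

With only 21 observations in a narrow band, the nearly linear segment of the S-curve is all the data can reveal, which is why all four specifications fit about equally well inside RA.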
Next we evaluate the predictive abilities of the various models. Figure 7.1 shows
the real S-curve as well as the four estimated curves. From this figure, it is clear
that the four curves do just about equally well as long as the advertising budget falls
within the interval RA. However, outside this interval, we know that (7.17), (7.19),
and (7.21) do not approach a finite limit, but predicted sales continue to increase
with advertising. In that sense (7.24) would seem to be more appropriate. When
advertising goes to zero, we know that (7.21) (the semi-logarithmic function) breaks
down, since it predicts sales of minus infinity when there is no advertising spending.
For the S-curve, the linear function and the multiplicative function, the predictions
for low advertising spending are robust and we cannot reject any of these (although
we may object to positive sales for zero advertising in the linear model).
We note that for all of the above models, the forecast accuracy decreases as we
move further away from the interval RA. That is, the standard error of the forecast
is at a minimum at the mean value of the predictor variables in the sample data. 25
Also, outside RA we can only guess how unit sales responds to advertising since we
have no data to support the argument. However, we expect that outside RA the linear
model will not perform well, because generally advertising shows decreasing returns
to scale beyond a certain level of expenditures.
Even if the model specification is considered theoretically desirable, as we may
conclude about the equation that produces the S-curve, the empirical results may
show limitations. For example, a comparison of the true parameter values in (7.15)
with the estimated parameter values in (7.24) shows remarkably large differences.
When advertising is zero, the true amount of unit sales is 100,000 whereas the
predicted amount is only 5,000. Thus, an apparently attractive model specification
which is known to be correct in a simulation study can still produce predicted values
with a high amount of error. The very limited range of variation in advertising that
occurs in the simulation makes it difficult to obtain accurate predictions for extreme
conditions such as zero advertising.
If (p − c)β' > 1, profit increases with advertising, and the model would suggest that
management spend as much on advertising as possible. If (p − c)β' ≤ 1, profit would
be maximized at zero advertising. Managers will reject such implications as being
nonsensical. Yet the linear model is robust for predictions within the range RA.
The other three models give meaningful normative implications for the adverti-
sing budget, although the semi-logarithmic model is not robust for small levels of
advertising spending, and neither the multiplicative nor the semi-logarithmic model
is robust for very high levels of advertising spending.
Table 7.3 shows the optimal advertising budgets for each of the models (excluding
the linear one), the corresponding optimal predicted profit amounts and the real profit
amounts, by assuming a contribution margin of $5. We see that the spending levels
are quite similar across the models, and that the corresponding real profit levels differ
even less. The latter is due to the insensitivity of profit to changes in advertising
spending in a wide range around the optimum. 26
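For the true S-curve, the computation behind such a table can be sketched directly. The $5 contribution margin is from the text; ignoring the error term of (7.15) and using a simple grid search are our simplifications.

```python
import numpy as np

# Deterministic part of the true S-curve (7.15).
def sales(a):
    return 100_000 + 300_000 * np.sqrt(a) / (200 + np.sqrt(a))

margin = 5.0                                  # contribution per unit, $5

def profit(a):
    return margin * sales(a) - a              # gross contribution minus spend

# Grid search for the profit-maximizing advertising budget.
grid = np.arange(1.0, 1_000_000.0, 1_000.0)
a_opt = grid[np.argmax(profit(grid))]
print(a_opt, profit(a_opt))

# Profit is flat in a wide band around the optimum, which is why the
# "real profit" levels in Table 7.3 differ so little across models.
print(profit(0.8 * a_opt) / profit(a_opt))    # still very close to 1
```

Spending 20 percent less than the optimum costs well under one percent of profit here, illustrating the insensitivity of profit to advertising near the optimum that the text notes.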
The predictive power of (non-robust) linear and multiplicative market share mod-
els and (robust) attraction models has been compared in a number of studies. 27 The
results show that the attraction models do not always have significantly greater predic-
tive power than their non-robust counterparts. Thus, even though the attraction models
have advantages on theoretical grounds this does not imply that better predictions are
obtained. One problem with these empirical comparisons is that insufficient attention
is paid to the characteristics of the data used for predictions. Robust models will
outperform other models only if the data allow the advantage of robustness to exhibit
itself.
It should be clear from this discussion that the desired robustness for a model
depends on model use. Given their theoretical advantages and that, nowadays, these
models are not difficult to estimate, robust specification should generally be preferred.
26. For evidence that low sensitivity is also frequently observed in the real world, see Naert (1973), and Bultez
and Naert (1979).
27. See Naert, Weverbergh (1981a, 1985), Brodie, de Kluyver (1984), Ghosh, Neslin, and Shoemaker (1984),
Leeflang, Reuyl (1984a).
In his illustration of a decision calculus, Little (1970) uses a functional relation such
as (7.14), but with sales replaced by market share: 28

m_jt = α_j + (β_j − α_j) a_jt^{δ_j} / (γ_j + a_jt^{δ_j})    (7.27)
Assuming 0 ≤ α_j ≤ β_j ≤ 1, it turns out that this model constrains predictions to a
meaningful range of values. Thus this model seems to satisfy the criterion of robust-
ness. Yet, in a strict sense, this market share function also proves to be problematic.
For example, suppose we consider how j's market share changes if competitors
double their advertising, while j 's advertising remains unchanged. From equation
(7.27) we see that market share remains the same. This suggests that the model is not
robust. However, our conclusion may be different if we take the problem situation
being modeled into account.
Suppose that the market consists of ten firms, with company j having about 50
percent market share, and the second largest firm about 10 percent. For such a market,
equation (7.27) is much more acceptable. Company j is clearly the leading firm and it
is unlikely that a leading firm will keep its advertising constant if competitors double
theirs. More likely is a scenario in which competitors react to changes in the leading
firm j 's advertising spending. All these reactions will be reflected, in an average
way, in the coefficients. Although the model does not explicitly consider the effect of
competitors' advertising on brand j's sales nor the competitive reactions, it does so
implicitly through the coefficients. Thus the decision maker may still be able to use
the model to evaluate the effect of changes in the advertising budget for brand j on
market share and profit, especially if brand j's advertising share is much larger than
50 percent. However, we expect that marketing actions are improved if those other
aspects are explicitly modeled.
By adding other features to the model in (7.27), such as replacing a_jt by a_jt/a_ct
with a_ct equal to competitive advertising, the predictive ability of the model may also
improve. But such additions need to be justified in a cost-benefit sense. The question
is whether the incremental gain to management from increased accuracy in predic-
tions exceeds the incremental cost of collecting data on the advertising spending of
competitors.
Finally, consider a different problem situation. Suppose that the market consists of
two firms, each of which has approximately 50 percent market share. It is obvious that
competitive activity should now be considered explicitly in the model, in particular if
an increase in advertising by one firm is followed by a similar increase by the other
firm, and industry sales tend to remain constant. In this problem situation, equation
(7.27) cannot be a robust representation of reality.
28. Index j has been added for clarity (relative to equation (7.28) below).
We note that if the same type of market share function is defined for a competing
firm:
m_ct = α_c + (β_c − α_c) a_ct^{δ_c} / (γ_c + a_ct^{δ_c})    (7.28)
no meaningful restrictions can be imposed on the parameters to satisfy range and sum
constraints on market shares. This aspect of non-robustness remains even when a_jt is
replaced by a_jt/a_ct and a_ct by a_ct/a_jt.
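A toy computation illustrates the sum-constraint problem for two independently specified share functions of this type (all parameter values are invented):

```python
# Two independently specified S-shaped market share functions of the
# (7.27)/(7.28) type, one per firm, with invented parameter values.
# Each prediction stays in [alpha, beta] on its own, but nothing ties
# the two equations together, so the predicted shares need not sum to 1.
def share(a, alpha, beta, gamma, delta):
    return alpha + (beta - alpha) * a ** delta / (gamma + a ** delta)

m_j = share(400.0, alpha=0.10, beta=0.80, gamma=15.0, delta=0.5)
m_c = share(400.0, alpha=0.05, beta=0.70, gamma=25.0, delta=0.5)
print(m_j, m_c, m_j + m_c)   # each is a valid share; the sum is not 1
```

Because the two equations have separate parameters and separate advertising arguments, no admissible parameter restriction forces their outputs to add to one, which is the point made in the text.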
We conclude that a model should not produce absurd predictions when predictor
variables take extreme values. The importance of robustness in this sense depends,
however, on the problem situation being modeled, and on the intended model use.
For example, if extreme values never occur in practice, model behavior at such values
is not critical. However, if the predictions are absurd at extreme values, model impli-
cations may not be very good at less extreme values. In addition, the model builder
may not know the intention of the model user.
Everything else being equal, robustness is a desirable criterion. But robustness
per se is not a panacea. The example in Section 7.4 shows that a model with desirable
structure can have unreliable parameter estimates. 29 Thus, multiple, and possibly
conflicting, aspects need to be considered jointly.
In this chapter, models are classified according to the primary purpose or intended
use. We distinguish:
• descriptive models,
• predictive models, and
• normative models.
We note that a given model can be intended to be descriptive, predictive and nor-
mative. Indeed, one can argue that for a model to have valid normative implications,
it must have predictive value and at least some descriptive power. However, a de-
scriptive model does not necessarily have normative implications. And a predictive
model may also not be useful for normative considerations. Therefore, we distinguish
models as to their primary purpose.
We present two examples of a descriptive model in Section 8.1. The first shows a
representation or description of important elements of a pricing decision by means of
a logical flow model. The other example shows a description of a complex organiza-
tional buying process.
We discuss a predictive model in Section 8.2 that shows how the effects of alter-
native marketing programs on performance measures of a firm can be predicted.
Finally, in Section 8.3, normative models are developed. An example illustrates
the sequence from descriptive to predictive to normative models. The chapter con-
cludes with a small survey of applications of allocation models.
There are multiple reasons for the development of models of decision processes.
In Section 4.1, we argued that such models may result from a desire to make ex-
isting decision procedures explicit, or to examine whether those processes allow
for automation or improvement. A descriptive model may also be a first step in a
124 CHAPTER 8
larger modeling effort with prediction or even optimization as the ultimate aim. The
distinction between these three model types is important since model requirements
when the purpose is only descriptive are generally less exacting than the requirements
for models predicting the impacts of decision- and environmental variables. This is
demonstrated by the simulated example in Section 7.4.
In this section, we first discuss the study of Howard and Morgenroth (1968),
referred to in Section 2.2. The objective of their study was to describe the pricing
procedure by a large company operating in an oligopolistic market. In this company
executives had great difficulty verbalizing the process through which price decisions
were made. Indeed the procedure was felt to be so complex that describing it was
considered virtually impossible. Yet, the flow diagram resulting from lengthy discus-
sions between executives of the company and the authors is fairly simple, as can be
seen in Figure 8.1.
The descriptive model in Figure 8.1 is of the logical flow type. 1 The figure shows
that the procedure begins by watching Pwilt, the wholesale (w) price of the initiator
(i) in a local market (l) at time t (see box 1). Three alternative outcomes are possible:
1. If Pwilt does not change and differs from Pwxlt, where Pwxlt is the wholesale (w)
price of our company (x) in l at t, no action is taken (box 2).
2. If Pwilt increases (box 3), there are various possible reaction patterns depending
on the attitude of the district sales office (DSO):
• if the DSO agrees, the price increase is followed (boxes 4 and 5);
• if the DSO does not agree to a price increase, it can be overruled by the
decision maker (DM), if she thinks other competitors will also increase their
price (Pwo) (box 6);
• if the DSO does not agree to a price increase, and the DM feels that other
competitors will not increase their price, a holding period is enforced (box 7),
and price is increased only if other competitors raise their price. If Pwo does
not go up, no action is taken.
3. If Pwilt decreases, and differs from Pwxlt, DSO is again contacted, competitors'
sales volumes in the local market are taken into account, waiting periods are
observed, and steps are taken to prevent the price cut from spreading to adjacent
market areas. 2
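The logical flow described above translates almost line for line into code. A sketch (the function name, argument names, and return labels are ours; only the no-change and price-increase branches are written out):

```python
# Sketch of the no-change and price-increase branches of the
# Howard-Morgenroth flow model. Inputs mirror the boxes described in the
# text; the labels and encoding are ours, not the authors'.
def react_to_initiator_price(p_wilt_change, dso_agrees,
                             dm_expects_others_up, others_raised_later):
    """Decide company x's wholesale price action in local market l."""
    if p_wilt_change == 0:
        return "no action"                     # box 2
    if p_wilt_change > 0:                      # box 3: initiator raised price
        if dso_agrees:
            return "follow increase"           # boxes 4 and 5
        if dm_expects_others_up:
            return "follow increase"           # box 6: DM overrules the DSO
        # box 7: hold, then follow only if other competitors raise price
        return "follow increase" if others_raised_later else "no action"
    return "price-decrease branch (see Figure 8.1)"

print(react_to_initiator_price(+1, dso_agrees=False,
                               dm_expects_others_up=False,
                               others_raised_later=True))
```

Writing the rules down this way makes the model directly testable against recorded decisions, which is exactly the validation step Howard and Morgenroth performed.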
From Figure 8.1 and its description we conclude that company x does not initiate
price changes. The company is a "follower", the initiating company the price "leader".
Another point is that the reaction to an initiator's price increase differs substantially
from that of a price decrease.
The procedure seems simple enough. But does this flow diagram really model the
decision process? To answer the question, we have to put the model to a test, which re-
quires advancing one step, from the descriptive phase to the predictive phase. Howard
1. Some symbols have been adapted to be consistent with our notation.
2. The complete procedure in case of a price decrease can be deduced from Figure 8.1, similar to the description
given for a price increase.
INTENDED USE 125
[Figure 8.1: flow diagram of the pricing decision procedure, with yes/no decision boxes such as "Is q_il > q_xl?" and "Is q_xl > q_xn?" (adapted from Howard and Morgenroth, 1968).]
and Morgenroth tested their model on 31 actual decisions. Managers made decisions
without the model, using existing company procedures. The model was applied in-
dependently to the same 31 situations. The model predictions were then compared to
the actual decisions, and the degree of agreement was found to be acceptable. 3
3. This is a test of the accuracy of the model's output. Howard and Morgenroth also present a test of the process,
which asks whether the model describes the decision-making process used by the manager.
We distinguish three methods, each being suitable for the development of small sam-
ple descriptive models: protocol analysis, script analysis and process simulation. 6
Protocol analysis is based on the information-processing theory of Newell and
Simon (1972), which states that individuals go through complex cognitive processes
in distinct stages. Crow, Olshavsky, and Summers (1980) applied this theory in a study
in which they asked fourteen purchasers to phrase their thoughts during hypotheti-
cal, modified repurchase decisions. The subjects were all members of medium-sized
or large organizations. Vyas and Woodside (1984) used protocol analysis of real
purchase situations. They gathered additional information by observing meetings
between buyers and sellers and studying documents, such as records and quotations.
This combination of data gathering is based on decision systems analysis (Capon and
Hulbert, 1975). Vyas and Woodside studied 62 individuals who constituted 18 buy-
ing centers. The buying decisions concerned long-term purchasing agreements. The
studies show what specific decision criteria and decision rules are used in different
4. Other examples can be found, for example, in Cyert and March (1963).
5. See also Lilien, Kotler, Moorthy (1992, Chapter 3) and Brand, Leeflang (1994). The discussion about
industrial markets is based on Brand, Leeflang.
6. The following text is based on Brand (1993).
Script analysis is based upon the concept of "cognitive scripts" derived from cognitive
psychology (Abelson, 1976). A cognitive script is a prescribed range of behavioral
actions that are stored in the brain of a human being who repeatedly experiences
the same situation. The script influences an individual's expectations about others
and the individual's interpretation of received information. Leigh and Rethans (1984)
collected the cognitive scripts of 36 purchasers for four different situations. They
found repurchases and new tasks differ greatly with respect to the information search
process. For repurchases, the present supplier is usually the sole informant on possible
solutions to a problem, whereas for new tasks the buyer engages in a more elaborate
information search. The usefulness of the script analysis is high at the individual
level. A study of scripts across respondents may not show a consistent description of
the process.
[Figure 8.2: stages 1-8 of the organizational buying process for replacement and additional purchases; in stages 3-5 suppliers are evaluated on price, technology, dependability, service, and relationship, leading to satisfactory or non-satisfactory outcomes.]
these decision makers were asked to identify indicators for each of these five dimen-
sions. In this manner Brand defined 51 indicators. The weights for these indicators
depend on the step in the buying process. For example, image and reputation play
an important role during the early stages of the buying process, whereas specific
elements of the suppliers' communication mix and the quotation are more important
in later steps. The decision makers involved in the relevant steps of the buying process
also evaluated each of the suppliers on scales for the indicators, ranging from 1 (low)
to 5 (high). With the scores on all relevant indicators for each supplier and buyer-
specific weights for the indicators relevant to a stage in the decision-making process,
the attractiveness of each supplier can be determined in each stage for a given buyer.
In case of multiple decision makers, weighted average scores (and indicator weights)
were defined. The description of the buying process (1), the weights (2), and scores
(3) can be used to develop models that describe decision rules. In the research by
Brand, these rules turned out to be largely determined by two factors, viz.:
• the presence of an approved vendor list (AVL) in the buying organization (stages
4 and 5),
• the time pressure of the purchase (stages 5 and 6).
If the buying organization uses an approved vendor list, stage 4 of the buying process
is redundant. In stage 5 RFQ's are then sent to a limited number of suppliers on the
AVL. A current supplier of heat-exchangers has an additional advantage.
If the buyer faces time pressure, stage 4 proceeds at a faster rate. In stages 5 and 6
delivery time tends to have much weight, while price is accorded less weight than in
earlier stages. The search for a supplier is also more concentrated; only two or three
suppliers are asked to prepare a quotation.
Combining these two factors and three stages, different decision rules can be
formulated. To illustrate, we show in Figure 8.3 the decision rule in stage 4 for a
buying organization that uses an AVL. For this buyer AVL-suppliers are automatically
admitted to stage 5. Non-AVL suppliers need to obtain a maximum score on either
price (P = 5) or technology (T = 5), and at least average on three other dimensions
(excluding R=relationship) or have both P and T equal to 4 (and at least average
on two other dimensions excluding R), in order to be admitted to stage 5. From
Figure 8.3 it is clear that this buyer cares primarily about price and technology, and
secondarily about dependability and service, in stage 4.
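The stage-4 rule just described can be written out directly. A sketch (the function name and the dictionary encoding of the five dimensions are ours):

```python
# Stage-4 screening rule from Figure 8.3 for a buyer using an approved
# vendor list (AVL). Scores are on the 1-5 indicator scales; dimension keys
# P(rice), T(echnology), D(ependability), S(ervice), R(elationship) follow
# the text. The encoding of the rule as code is ours.
def admitted_to_stage_5(on_avl, s):
    if on_avl:
        return True                            # AVL suppliers pass directly
    # Non-AVL: maximum score on price or technology, and at least average
    # (>= 3) on the three other dimensions excluding relationship ...
    if s["P"] == 5 and all(s[k] >= 3 for k in ("T", "D", "S")):
        return True
    if s["T"] == 5 and all(s[k] >= 3 for k in ("P", "D", "S")):
        return True
    # ... or P and T both equal to 4 and at least average on the two
    # remaining dimensions excluding relationship.
    if s["P"] == 4 and s["T"] == 4 and all(s[k] >= 3 for k in ("D", "S")):
        return True
    return False                               # supplier is out

print(admitted_to_stage_5(False, {"P": 5, "T": 4, "D": 3, "S": 3, "R": 1}))
print(admitted_to_stage_5(False, {"P": 4, "T": 3, "D": 5, "S": 5, "R": 5}))
```

Note how the relationship score R never enters the rule, and how high dependability and service cannot compensate for mediocre price and technology, matching the priorities attributed to this buyer in the text.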
Brand used descriptive models to determine the effectiveness of a supplier's in-
strument. Overall, technical quality (T) and the buyer/seller relationship (R) ap-
peared to be dominant criteria. Price and other variables are especially important in
step 6.
Major advantages of process simulation, of protocol- and script analyses are that
the model can include more complexities, such as a multi-person buying center, and
that a range of "what if' questions can be considered.
Although the primary objective of descriptive models is to describe existing de-
cision processes, it is often useful to evaluate the validity of these models by com-
[Figure 8.3: stage-4 decision rule for a buying organization with an AVL. Suppliers on the AVL go directly to stage 5; non-AVL suppliers are admitted to stage 5 only if P = 5 and D, T, S ≥ 3 (or the analogous conditions described in the text); otherwise the supplier is out.]
paring predictions with actual decisions. In the next section we describe models
developed specifically for predictive purposes.
A special case of this naive model sets α̂_0j = 0 and α̂_1j = 1. In that case the predic-
tion for this period is the actual outcome in the previous period.
The purpose of conditional forecasting models is to predict the outcome of a
proposed marketing program or to predict the consequences on own-brand sales or
market share of competitors' possible programs. Such "if-then" types of forecasts can
only be derived from causal models. The intended model use and the availability of
(causal) data determine whether unconditional or conditional forecasting methods are
used. Thus, if no "causal data" on marketing instruments are available one can only
generate unconditional forecasts, unless subjective data are generated (see Section
16.9).
We illustrate conditional forecasting with a simplified model developed to exam-
ine the likely impact of changes in marketing decision variables on performance
measures. 12 The model is based on a numerically specified market share model that
has been calibrated on data of the detergent market in the Netherlands. For additional
discussion and critical evaluation of this model, see Section 15.4. 13
The estimated equation is:
In this example t is a bi-monthly period. Given the mathematical form of (8.2) and the
discussion on model robustness in the previous chapter, it is clear that (8.2) has only
limited usefulness as a predictive model. Because of this we could warn model users
not to go outside the range of values for the decision variables present in the historical
data. Thus we could take precautions by explicitly adjoining range constraints to the
model. For example:

Q_t = 8,500. (8.3)

From the definitions of m_jt and Q_t it follows that the predicted value of brand sales,
q_jt, equals:

q̂_jt = m̂_jt Q_t (8.4)

or

(8.5)
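With the range constraint Q_t = 8,500 from (8.3), the identity behind (8.4), brand sales as predicted market share times product class sales, can be sketched as follows (the market share value is hypothetical):

```python
def brand_sales(market_share, product_class_sales):
    """Predicted brand sales q_jt = m_jt * Q_t."""
    return market_share * product_class_sales

Q_t = 8500    # product class sales per bi-monthly period, as in (8.3)
m_jt = 0.12   # hypothetical predicted market share of brand j
print(brand_sales(m_jt, Q_t))   # 1020.0
```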
14. It should be clear, however, that to fully exploit the example, one should be able to assess the impact of
price and advertising of all of the brands on product class sales Q_t. This, of course, requires the use of the
product class equation, and not one expected value: see equation (15.2) for a discussion.
where p_jt is retail price and rm_jt the retailer's margin, which for this product amounts
to 30 percent of the retail price:

(8.12) [equation only partially legible in the source: after-tax profit π_jt = 0.65[(0.5957 p_jt - 1) 0.85 (6.9 + 7.9 a_{j,t-1}/A_{t-1} - 7.2 p_jt/P̄_t ...) ...]]

Figure 8.4 Profit after tax (π_jt) as a function of market coverage (d_jt).
Tables 8.1-8.3 show a return to the original values to accommodate the lagged
advertising effect. These examples suggest the benefits of "what if" types of predictive
models. The models provide the marketing manager with opportunities to explore
alternative marketing programs based on an estimated market share model. Such
explorations may suggest actions to avoid as well as desirable activities to initiate.
Importantly, these actions can be considered in a simulation, which avoids a lot of
potentially undesirable experimentation in the marketplace.
The model structure and the predicted effects of alternate marketing programs
shown in Tables 8.1-8.8 in fact demonstrate that the advertising budget that maximizes
profit is zero. Thus this model is "nonsensical" in light of the discussion in
Section 7.4. The value of market coverage that maximizes profit can be derived from
Figure 8.4. The profit sensitivity to market coverage seems realistic with a maximum
close to 90 percent coverage.
We note that the foregoing sensitivity calculations assume that there is no uncertainty
in predicted values. There are two considerations relevant to this. One is that it is
possible to construct confidence intervals around the predicted values so as to quan-
tify the statistical uncertainty. The other is that optimal decisions may incorporate
asymmetries in uncertainties or risk.
Experience in model building has resulted in some guidelines to improve the pre-
dictive accuracy of causal models, relative to naive models. In general the predictive
accuracy of causal models is improved if:
We discuss other issues regarding predictive accuracy and predictive validity in Chap-
ters 14 and 18 respectively.
17. See, for example, Wittink (1987), Chen, Kanetkar, Weiss (1994), Foekens, Leeflang, Wittink (1994),
Christen, Gupta, Porter, Staelin, Wittink (1997).
18. Wittink (1987), Kumar (1994).
19. Danaher, Brodie (1992).
20. Foekens, Leeflang, Wittink (1994).
Tables 8.1-8.8 Input variables for the simulation scenarios (the corresponding
output rows are not legible in the source):

  Base case: price brand j 3.40, average price 2.86, advertising brand j 207,
    total advertising 4,000, market coverage 0.85.
  Temporary advertising increase: advertising brand j 311 (total advertising
    4,104) in t + 1, returning to 207 (4,000) in t + 2.
  Advertising increase with competitive reaction: advertising brand j 311,
    207, 207 and total advertising 4,104, 6,000, 4,000 over three successive
    periods.
  Price cut: price brand j 3.23, average price 2.84.
  Price cut with competitive reaction: price brand j 3.23 in both periods;
    average price 2.84 in t + 1 and 2.72 in t + 2.
  Market coverage variations: market coverage 0.95, 0.90 and 0.75,
    respectively, with the other inputs at base-case values.
The third class of models consists of normative or prescriptive models. Their purpose
is to determine a recommended course of action that should improve performance. In
other words, one wants to determine which decision is best for an objective such
as profit maximization. In Section 8.3.1, we discuss a marketing mix problem to
illustrate a normative model for profit maximization.
Many of the normative models in marketing are allocation models, in which a
certain amount of a particular quantity (money, time, space) is available, and the
objective is to allocate it between alternative uses in an optimal way. We enumerate
some existing allocation models in Section 8.3.2.
effect was statistically insignificant. This is one of the dilemmas often faced in applied
econometrics. We know that price affects demand in general. However, if price (or any
other variable for that matter) is relatively stable over the period of observation, then it
cannot have much explanatory power in the sample. If price is excluded, its effect will
be taken up by the constant term, the implication being that, on the basis of historical
data, nothing can be said about the impact of price on sales. Thus, if the objective is to
determine the optimal marketing mix, the information contained in the historical data
would be insufficient. Other means, such as experimentation or subjective judgment,
may provide useful insights. The lack of sufficient price variation is of no consequence
if the model is to be used for a determination of optimal advertising spending,
under the assumption that the actual level of advertising does not depend on price.
The output is a recommended budget. In this sense, the model is normative. However,
from a practical point of view, the word "normative" is perhaps too strong. What one
is really interested in is to determine whether current advertising expenditures are too
high, too low, or about right. The model can produce a specific figure but it is useful
to take it as a guideline rather than as something absolute. We provide the following
reasons why we might not use the word "normative" in an absolute sense:
1. the demand equation is estimated, which implies that there is uncertainty about
the true values of the response coefficients; 22
2. advertising is only one instrument in the firm's marketing mix;
3. the firm faces multiple objectives, while most models assume the existence of a
single objective such as profit maximization; 23
4. the effectiveness of advertising depends on the quality of the copy, the selection
of media, and so on. The regression coefficient gives at best an idea of an average
effect. 24
We now derive an optimal advertising budget for the profit-maximizing firm whose
demand function is given by (8.13). We first show how to optimize advertising ex-
penditures when lagged effects of advertising are not taken into account. After that
an optimal solution is derived which assumes that the lagged sales variable represents
advertising dynamics.
22. Given footnote 21 related to the assumed lag structure, there will not only be substantial variance in the
estimated parameters, but the estimates themselves could be biased, i.e. their expected values might differ from
the true parameter values.
23. We note that other objectives, such as short-term sales maximization and growth maximization can be
consistent with an overall objective of long-term profit maximization.
24. Given a specific campaign, with a specific copy and media plan, one could make an appropriate adjustment
in the advertising parameter. The idea of combining databased parameterization with subjective parameterization
was first suggested by Lambin (1972b). We elaborate on this in Chapter 16.
disposable income. Substituting the values of these variables, equation (8.13) reduces
to:25

q_t = -2,213 + 1,777 log10 a_t. (8.14)

The objective of the firm is to maximize profit per 1,000 potential consumers, π_t:

π_t = (p_t - c_t)q_t - a_t (8.15)

where c_t is unit variable cost. In the remainder of this discussion, we assume that c_t
is constant, i.e. c_t = c for all t. In other words, variable cost (c) and marginal cost
(MC) are equal. Fixed costs were not given in Lambin (1969) but can be ignored
since these do not affect the optimal level of advertising spending. Given that price is
predetermined, we let p_t equal p, so that (8.15) reduces to:

π_t = (p - c)q_t - a_t. (8.16)
To maximize profit, the following relationship has to be satisfied:26

∂q/∂a = [∂q/∂ log10 a] (0.4343/a). (8.21)

It follows from (8.21) that the optimal advertising spending level a* should satisfy:

a* = (∂q/∂ log10 a)(0.4343) / (∂q/∂a). (8.22)

From (8.13) we know that ∂q/∂ log10 a = 1,777, and from (8.20) ∂q/∂a = 0.303.
Thus a* equals:

a* = (1,777)(0.4343)/0.303 = 2,545 BF.
Compared to actual expenditures of 3,440 BF, it appears that the firm is overspending
on (short-term) advertising. It is, however, instructive to determine how sensitive
profit is to changes in advertising spending. We first examine the profit when the
firm continues the current advertising spending. Profit is predicted to be:

π̂ = (p - MC)q̂ - a = 3.3q̂ - a.

With a = 3,440 BF, we found q̂ = 4,060, so that current profit is predicted to be:

π̂ = 3.3 × 4,060 - 3,440 = 9,958 BF.
We see that, if only short-term effects are considered, actual advertising expenditures
exceed the optimal level by about 35 percent. However, current profit is only about
1.5 percent below its maximum value. This suggests that profit is quite insensitive to
changes in advertising expenditures.
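These calculations are easy to reproduce. The sketch below uses the response q = -2,213 + 1,777 log10(a) and the margin p - c = 3.3 from the text; small rounding differences from the reported figures remain:

```python
import math

margin = 3.3               # p - c, in BF per unit
b = 1777.0                 # dq / d(log10 a), from (8.13)

def sales(a):
    return -2213.0 + b * math.log10(a)

def profit(a):
    return margin * sales(a) - a

# Short-term optimum: (p - c) dq/da = 1 with dq/da = 0.4343 * b / a
a_opt = margin * b * 0.4343          # approx. 2,545 BF
print(round(a_opt))

# Flat maximum: overspending at a = 3,440 BF costs only a percent or two
print(round(profit(3440) / profit(a_opt), 3))
```

The profit ratio printed last is close to one, illustrating how flat the profit function is around its maximum.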
Since 0 ≤ λ < 1, we also have 0 ≤ λ/(1 + i) < 1, and (8.23) reduces to:

π(LT) = q(p - c) / (1 - λ/(1 + i)) - a. (8.24)

If the firm takes a long-term view, the optimal advertising budget is:

a_LT = (1,777)(0.4343)/0.161 = 4,801 BF.

Corresponding expected sales are:

q_LT = -2,213 + 1,777 log10 4,801 = 4,329

and expected long-term profit is:

π_LT = (4,329)(3.30)/0.5306 - 4,801 = 22,122 BF.

If advertising expenditures remain at the current level, expected long-term profit is:

π_LT = (4,060)(3.30)/0.5306 - 3,440 = 21,811 BF.
Thus, by taking into account the lagged effects of advertising, we find that actual
spending is an estimated 28 percent below the optimal amount. Importantly, with
positive lagged effects, the optimal advertising expenditure increases relative to the
case when we ignore these effects. As before, if we increase the advertising budget to
the optimal level, the expected profit increases by only 1.43 percent.
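The long-term case can be verified the same way; the factor 1 - λ/(1 + i) = 0.5306 is taken from the text:

```python
import math

b = 1777.0          # dq / d(log10 a)
margin = 3.3        # p - c
k = 0.5306          # 1 - lambda/(1 + i)

# Long-term optimality: (margin / k) * dq/da = 1, so dq/da = k / margin
dq_da = k / margin                       # approx. 0.161
a_lt = b * 0.4343 / dq_da                # approx. 4,801 BF
q_lt = -2213.0 + b * math.log10(a_lt)    # approx. 4,329 units

profit_opt = q_lt * margin / k - a_lt
profit_cur = 4060.0 * margin / k - 3440.0    # q = 4,060 at current spending
print(round(a_lt), round(q_lt))
print(round(profit_opt), round(profit_cur))  # approx. 22,122 and 21,811 BF
```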
We note that sensitivity analyses frequently demonstrate that large percentage
changes in advertising expenditures result in only small percentage changes in profits
over a wide range of expenditure levels. This phenomenon is known as the flat
maximum principle (Hanssens, Parsons, Schultz, 1990, p. 26). In this respect we refer
to Chintagunta (1993b) and to the discussion in Section 6.1 of whether lag structure
matters in the optimization of advertising expenditures. We note that profit is more
sensitive to departures of price from its optimal level than it is to departures from
optimal advertising expenditures.
We have to be careful in the application of optimization rules. In the discussion
above we implicitly assumed that there are no profitable alternatives to advertising
spending. Indeed, the condition μ = 1/ω implies that at the optimum:

(p - MC) ∂q/∂a = 1.

Stated in words, this means that the last dollar invested just pays for itself but nothing
more. In fact the firm can do better, by investing that last dollar in some other venture
where it can earn a return of r_e percent. The return on the best possible alternative
investment should be considered as an opportunity cost. With an opportunity cost of
r_e percent, the optimality condition becomes:

μ = (1 + r_e)/ω.
With r_e = 0.20, and μ = (1 + 0.20)/0.55 = 2.18, at optimality ∂q/∂a should equal
μ/p = 2.18/6 = 0.36364. Applying (8.22), for example, we find an optimal short-term
advertising budget of 2,122 BF, instead of the 2,545 BF we obtained when the
opportunity cost was (implicitly) assumed to be zero.
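Under the same assumptions, the opportunity-cost correction can be checked directly:

```python
# An opportunity cost r_e raises the optimality hurdle to mu = (1 + r_e)/omega.
omega = 0.55        # gross margin (p - MC)/p
p = 6.0             # unit price
r_e = 0.20          # return on the best alternative investment
b = 1777.0          # dq / d(log10 a)

mu = (1 + r_e) / omega        # approx. 2.18
dq_da = mu / p                # required marginal sales response, approx. 0.364
a_opt = b * 0.4343 / dq_da    # approx. 2,122 BF (vs. 2,545 BF with r_e = 0)
print(round(mu, 2), round(a_opt))
```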
One final note about the specific normative model discussed in this section: it con-
cerned a monopolist. 32 Since monopolies are the exception in the marketplace, rather
than the rule, it is important to study marketing mix decisions and the profitability of
advertising in, say, oligopolies. Some studies applicable to competitive environments
are reviewed in Bultez (1975), Lambin (1976), Hanssens et al. (1990, Chapter 8), and
Erickson (1991). This topic receives much attention in the more recently developed
32. That is, the demand equation did not take into account any element of competitive rivalry, i.e. a monopoly
was assumed. Also the Dorfman-Steiner theorem was derived for a monopoly. An extension to oligopolistic
markets was provided by Lambin, Naert and Bultez (1975), and will be touched upon in Chapter 9.
Several normative models have been successfully developed to determine the optimal
sales force. Examples are Lodish's CALLPLAN (1971), Lodish, Curtiss, Ness and
Simpson (1988), Rangaswamy, Sinha and Zoltners (1990) and Gopalakrishna and
Chatterjee (1992).33 CALLPLAN is an interactive system that supports the sales force
for the planning of visits. It requires managerial judgments about response functions
and information about traveling expenses and available time. The built-in search
procedure identifies the sales plan that optimizes expected revenue minus traveling
expenses. The model was later refined by adding managerial judgments about the
(relative) effectiveness of each sales representative for each account.
Lodish et al. (1988) developed a model based on estimates of response functions,
made by a team of managers in a Delphi-like setting. We discuss the Delphi method
in Section 16.9. The response functions describe two situations:
In an application, additional archival data were collected, including cost per sales
representative, other expenses, management time, production and distribution costs,
and the current allocation of sales force effort. The model results include the optimal
size and allocation of a company's sales force.
Rangaswamy et al. (1990) proposed a normative sales force model. In an application
for a pharmaceutical firm, they used a combination of judgmental data (obtained
with the Delphi process) and historical data to calibrate several thousand parameters.
Their model generates the optimal number of sales force teams, their sizes, and
the deployment of the total sales force by product and market segment. In 1990, this
model-based approach for sales-force structuring had been implemented in over 100
settings in more than 20 countries.
Gopalakrishna and Chatterjee (1992) developed a model that explains the realized
share of sales potential by advertising expenditures, personal selling expenditures,
and competitive variables. The model can be used to assess the joint impact of adver-
tising and personal selling effort on performance. An application at a US industrial
firm suggests that this approach can raise profits substantially.
Other normative models have been constructed to specify salesforce compensa-
tion plans. These plans determine fixed payments (salary) and variable payments
(commissions, bonuses). Optimal compensation plans can be constructed by maxi-
mizing the expected profit. If sales are assumed to be stochastic, it can be demon-
strated that the optimal compensation plan is a function of the risk tolerance of
the salesperson, the environmental uncertainty, the efficiency of the production pro-
cess, and alternative job opportunities (Basu, Lal, Srinivasan, Staelin, 1985). Lal and
Staelin ( 1986) modified the model in such a way that it can be applied in situations in
which the sales force has more knowledge of prospects and clients than management
does (asymmetric information) and the sales force is considered to be heterogeneous
with respect to risk tolerance and ability. Lal and Staelin demonstrate that the pres-
ence of asymmetric information and heterogeneous salespeople leads to an optimal
strategy of offering multiple contracts. In this situation, different salespeople choose
different schemes leading to improved individual performance and organizational
profit.
Neslin and Shoemaker (1983) devised a normative decision calculus model for
planning coupon promotions. Dhar, Morrison and Raju (1996) apply a normative
model to study the relative impact of package coupons on profits. For examples of
normative models to determine optimal prices see Hanssens et al. (1990, pp. 240-
246), Zoltners (1981), Narasimhan (1988) and Rao (1993).
Zoltners (1981) distinguished two types of normative models: theoretical models and
decision models.34 Theoretical models are designed to develop normative theory, whereas
decision models are designed to provide solutions to specific decision problems.
The decision models have a real-world focus and have an empirical basis. The mod-
els we discussed so far belong to the set of decision models. Normative theoretical
models typically employ mathematical representations of market behavior. They are
generally solved analytically or embedded in a simulation analysis.
An important subset of "theoretical, non-empirical, normative models" consists
of dynamic optimal control models. Most of these models have been developed to
determine optimal advertising expenditures over time, subject to dynamics that de-
fine how advertising expenditures translate into sales and in turn, into profits for a
firm or even for a group of firms. 35 Optimal control theory has also been applied
to find optimal pricing policies over time. 36 Rao and Thomas (1973) used dynamic
programming to schedule promotions optimally over a time horizon.
Theoretical normative models have also been developed that model "normative"
purchase behavior. Examples are the models of Krishna (1992, 1994) and Assunção
and Meyer (1993). In these models the impact of consumer price expectations, dealing
patterns and price promotions on consumption has been examined.
The marketing literature contains several allocation models. These have the following
characteristics. Resources are available in limited quantities; for example, an advertising
manager has a budget, and a sales person can work eight hours a day, potentially
supplemented by a few hours of overtime. The purpose of such models is to allocate
this quantity to subvariables (media, market segments, sales accounts, ...) so as to
optimize an objective function (profit, sales, ...). We consider a few examples in the
areas of advertising, selling, and distribution.
Blattberg and Neslin (1990, p. 391) suggest that the promotion planning process
consists of three levels of budgeting decisions: the total marketing budget, the allo-
cation of that budget to promotions (versus advertising and other marketing mix el-
ements), and the preparation of individual promotion budgets or "individual events".
The allocation of the total marketing budget over promotions, advertising and other
marketing mix elements can be accomplished by the normative models discussed
in Section 8.3.1.37 A model which allocates the promotion budget over advertising
and trade promotion expenditures was developed by Neslin, Powell and Schneider
Stone (1995). This model represents the manufacturer's attempt to maximize profits
by advertising directly to consumers and offering periodic discounts to the retailer in
the hope that the retailer will in turn "pass through" a promotion to the consumer. The
model considers the allocation in a manner that appears to fit the last two levels of
budgeting decisions defined by Blattberg and Neslin.
The best-known allocation models are the media allocation models. Some of
these only consider the allocation of a given advertising budget to a number of
alternative media vehicles. Examples are Lee and Burkart (1960), Lee (1962), and
Ellis (1966). Others have modeled the timing of the insertion in the various media.
Examples are Lee (1963), Taylor (1963), Little and Lodish (1969), Srinivasan (1976),
Mahajan, Muller (1986), Hahn, Hyun (1991), Feinberg (1992), Mesak (1992),
Bronnenberg (1998), Naik, Mantrala, Sawyer (1998). Reddy, Aronson and Stam (1998)
developed the SPOT (Scheduling Programs Optimally for Television) model. This
model is used for optimal prime-time TV program scheduling. Because the advertising
revenues of TV-networks are linked directly to the size of the audience delivered
to the advertiser, this type of scheduling model is also relevant for decision makers
in marketing. The issue of advertising schedules, specifically whether advertising
should be steady (constant) or turned on and off (pulsed), has received attention from
authors of the more recent studies. Objective functions in media allocation models
vary from the maximization of reach to maximization of a discounted profit stream
over a finite time horizon. Pedrick and Zufryden (1991) developed a model to analyze
the impact of advertising media plans and point-of-purchase marketing variables on
several brand market performance measures (market share, penetration, and depth of
repeat purchase patterns).
An advertising budget can be allocated to subvariables other than media vehicles
as well. The subvariables could be market segments such as, for example, different
geographic regions. Applications include Zentler and Ryde (1956), Friedman (1958),
and Urban (1971). Alternatively an advertising budget can be allocated to different
products. This is considered by Doyle and Saunders (1990) and discussed in Chapter
13.
The allocation of sales effort has also been the subject of numerous studies.
37. It is also conceivable, of course, to have both budget determination and allocation in one single model. An
example is the "integrated model for sales force structuring" developed by Rangaswamy et al. (1990).
Nordin (1943), Zoltners (1976), Zoltners, Sinha (1983), and Skiera and Albers (1998)
examined the spatial allocation of a sales force. Brown, Hulswit, and Kettelle (1956)
studied the optimal frequency of visiting actual and potential buyers. Andre (1971)
and Lodish (1971) developed procedures to optimize a salesman's allocation of time
spent on different accounts. Montgomery, Silk and Zaragoza (1971) present a procedure
to help a salesperson in determining how much time to spend on various products
to be sold.
The allocation of the promotion budget to individual events deserves more attention.
Relevant to this question is the empirical result that the frequency and magnitude
of price discounts have significant effects on the (own-)price elasticities.38 Higher
and more frequent discounts lead to less negative price elasticities. Thus the timing
and the determination of the size of the discount are important determinants of the
managers' profit optimization problem. 39 However, the allocation of the total amount
to discounts in specific time periods and the magnitude of each discount remain
important optimization questions.
We note that the allocation of shelf space requires the development of idiosyn-
cratic models. We discuss such models in Chapter 13.
38. See, for example, Raju (1992), Foekens, Leeflang, Wittink (1999).
39. See also Tellis and Zufryden (1995).
η_p = (∂q/∂p)(p/q) = price elasticity.

p - c - q(∂c/∂q) - 1/(∂q/∂a) = 0

or

p - MC = 1/(∂q/∂a).

After dividing both sides by p, we find:

(p - MC)/p = 1/μ

where

μ = p(∂q/∂a) = marginal revenue of product advertising.

p - c - q(∂c/∂q) - q(∂c/∂x)/(∂q/∂x) = 0

or

(p - MC)/p = (q/p)(∂c/∂x)/(∂q/∂x)

or

η_x (p/c) = 1/ω (8.A.9)

where

η_x = ((∂q/∂x)/q) / ((∂c/∂x)/c).

At optimality (8.A.7), (8.A.8), and (8.A.9) should hold simultaneously, or:

-η_p = μ = η_x (p/c) = 1/ω. (8.A.10)
This result is generally known as the Dorfman-Steiner (1954) theorem. This theorem
has been modified and extended in many directions. Examples are the models
of Lambin (1970), Lambin, Naert, Bultez (1975) (see Chapter 11), Leeflang, Reuyl
(1985b), Plat, Leeflang (1988).
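The theorem is easy to verify numerically. The sketch below uses a hypothetical constant-elasticity demand function and constant unit cost (all numbers are illustrative, not taken from the appendix) and checks that -η_p, μ and 1/ω coincide at the optimum:

```python
# Dorfman-Steiner check for q(p, a) = 100 * p**-2 * a**0.2 with MC = c = 1.
c = 1.0

def q(p, a):
    return 100.0 * p**-2 * a**0.2

# For this demand the optimum is p* = 2c (price elasticity -2), and
# (p - c) * dq/da = 1 gives a* = 5**1.25.
p_star, a_star = 2.0, 5.0**1.25

h = 1e-6   # step for central finite differences
dq_dp = (q(p_star + h, a_star) - q(p_star - h, a_star)) / (2 * h)
dq_da = (q(p_star, a_star + h) - q(p_star, a_star - h)) / (2 * h)

eta_p = dq_dp * p_star / q(p_star, a_star)   # price elasticity
mu = p_star * dq_da                          # marginal revenue of advertising
omega = (p_star - c) / p_star                # gross margin (p - MC)/p

print(round(-eta_p, 3), round(mu, 3), round(1 / omega, 3))   # 2.0 2.0 2.0
```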
CHAPTER 9
Aggregate demand models can either describe market behavior directly, or indirectly
through individual behavior models from which outcomes are aggregated to deter-
mine market response. 2 An aggregate response model, if postulated directly, is ap-
plied to aggregate data and has its own component of response uncertainty (repre-
sented by the properties of the error term). The probabilistic properties of the indi-
rectly specified aggregate models are derived from the perspectives of the individual
component models.
I. The literature contains a number of synonyms for product class sales, such as primary demand (see, for
example, Leeflang, 1977a), industry sales or demand (see, for example, Lambin, Naert, Bultez, 1975), and
generic demand (Hughes, 1973, p. 2). In analogy to product class sales being called primary demand, brand
sales is also referred to as secondary demand.
2. See Lilien, Kotler, Moorthy (1992, p. 672).
The increasing availability of scanner-type data at the household level has led
to strong interest in models of individual buyer behavior and methods for
aggregation. The unique opportunities for understanding consumer behavior and deriving
implications for marketing actions include:4
The modeling of aggregate response from the addition of behavior across individuals
ideally reflects heterogeneity across households in their intrinsic brand preferences
and in their sensitivities to marketing variables (Chintagunta, Jain, Vilcassim, 1991).
Different approaches are available to account for household heterogeneity at the dis-
aggregate level. 7 These approaches exploit the information contained in the set of
purchases available from each individual or household. The question is how to accom-
modate differences across households in their brand preferences and sensitivities to
marketing variables in models of aggregate data. Related aggregation questions apply
to the aggregation of store data to the market level. 8 We consider the aggregation
problem in Chapter 14.
Some models have been developed which have about the same structure at the
individual (micro) level and at the aggregate (macro) level. For example, Markov
models are often used to accommodate the idea that the last brand chosen (in period
t) affects the current purchase (in period t + 1). A zero-order model applies when
the current (and future) purchasing behavior does not in any way depend on past
purchases. A first-order Markov model applies when only the most recent purchase
has an influence on the current one. In a stationary first-order Markov model the
transition probabilities are constant over time: p_ijt = p_ij. The transition probabilities
are related to the (individual) brand choice probabilities (π_j,t+1) as:
π_j,t+1 = Σ_{r=1}^{n} p_rj π_rt, for every j = 1, ..., n, t = 1, ..., T (9.2)

where

π_j,t+1 = probability that brand j is chosen at t + 1 by an individual
consumer/household,
n = total number of brands.
The unconditional (π_rt) and conditional (p_rj) probabilities are distributed over the
population of consumers. One may account for heterogeneity by making assumptions
about these distributions.9 Under the assumption of consumer homogeneity, i.e.
consumers have the same p_rj and π_rt values, it can be demonstrated that relation (9.3)
holds at the aggregate level:
m_j,t+1 = Σ_{r=1}^{n} p_rj m_rt, for every j = 1, ..., n, t = 1, ..., T (9.3)

where

m_j,t+1 = market share of brand j in period t + 1,
p_rj = fraction of consumers who buy brand r in t and brand j in t + 1.

The fractions m_j,t+1, m_jt, p_rj follow the multinomial distribution with means π_j,t+1,
π_jt and p_rj. Although (9.2) and (9.3) have the same structure, the definitions of the
variables are clearly different between the individual and the aggregate levels.
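Equation (9.3) is a simple matrix-vector recursion; a sketch with a hypothetical transition matrix for n = 3 brands:

```python
# P[r][j] = fraction of brand-r buyers in period t who buy brand j in t + 1.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.3, 0.2, 0.5]]
m = [0.5, 0.3, 0.2]      # market shares in period t

def next_shares(m, P):
    """Apply (9.3): m_{j,t+1} = sum_r P[r][j] * m[r]."""
    n = len(m)
    return [sum(P[r][j] * m[r] for r in range(n)) for j in range(n)]

m_next = next_shares(m, P)
print([round(x, 2) for x in m_next])   # [0.52, 0.3, 0.18]; still sums to one
```

Iterating next_shares converges to the steady-state shares implied by the transition fractions.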
Given a specific set of alternatives, individual i chooses the option with the highest
utility. The probability of choosing j is:

If the ε_ji in (9.4) are independently distributed random variables with a double
exponential distribution, then it can be shown11 that individual i's choice probabilities
have the (simple) form:

Substituting (9.7) in (9.6) we obtain the expression for the multinomial logit model
at the individual level:

where

L_r = the number of predictor variables for alternative r.

A similar expression is available at the aggregate level, as discussed in Section 9.4.12
From the disaggregate logit model in equation (9.8), forecasts of aggregate demand
and market shares can be obtained. The essence is that a prediction of the
share of choices of a brand needs to be computed from the individual-level choice
probabilities. If the model had been calibrated on the whole population of I
consumers, this approach would be conceptually simple: the expected market share
of brand j would equal the average of the choice probabilities of the I individuals in
the population:

m_j = Σ_{i=1}^{I} π_ji / I. (9.9)
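A minimal sketch of this aggregation logic, with hypothetical deterministic utilities for three brands and two consumers (the logit form matches (9.8); the averaging matches (9.9)):

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P(j) = exp(v_j) / sum_r exp(v_r)."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities of three brands for two consumers.
consumers = [[1.0, 0.5, 0.0],
             [0.2, 1.1, 0.4]]

probs = [mnl_probabilities(u) for u in consumers]
# Expected market shares: average the choice probabilities over consumers.
shares = [sum(p[j] for p in probs) / len(probs) for j in range(3)]
print([round(s, 3) for s in shares])   # the shares sum to one
```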
However, in most cases the individual-level choice probabilities are not known for all
individuals in the population, since the levels of the predictors are unknown outside
11. Theil (1969), McFadden (1974).
12. Allenby and Rossi (1991a) have demonstrated that, under a number of conditions, the micro specification
(9.8) is related to a macro specification, such as (9.27) substituted in (9.25). See also Gupta et al. (1996).
the sample. Instead, some appropriate distribution of the predictors may be assumed,
for example the normal distribution. For simplicity we take the case of a single
predictor, i.e. L = 1 in (9.8), two brands, i.e. n = 2, and coefficients that are constant
across brands. Denoting the normal distribution of the predictor by φ(x_j; μ_j, σ_j), an
estimate of market share can be obtained by integrating over the distribution of the
predictor:
This problem has been solved by McFadden and Reid (1975). The idea is that the
difference between the value of the predictor for a particular subject and its mean in
the population, x_ji - x̄_j, follows a normal distribution with zero mean and variance
σ². Formulating (9.4) in this case as:

assuming the ε_ji are N(0, 1) distributed, and applying equation (9.5) yields:
This equation can be shown to result in the aggregate binary probit model:

with Φ(·) the cumulative normal distribution. Thus, the aggregate share is the same
as the individual choice probability evaluated at the population mean of the predictor,
but the variance of the disturbance is increased to 1 + σ², so that the scale of the
aggregate model is smaller than that of the individual-level model (see Ben-Akiva
and Lerman, 1985, pp. 143-144, who also give extensions to more predictors and
choice alternatives).
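The scale result can be illustrated by simulation. In the sketch below (assumed numbers, unit coefficient) the predictor varies over individuals as N(m, σ²); averaging the individual probit probabilities Φ(x_i) reproduces the aggregate form Φ(m/√(1 + σ²)):

```python
import math
import random

random.seed(1)

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, sigma = 0.8, 1.5   # mean and std. dev. of the predictor (hypothetical)

# Micro: average the individual-level choice probabilities Phi(x_i).
draws = [random.gauss(m, sigma) for _ in range(200_000)]
micro_share = sum(phi(x) for x in draws) / len(draws)

# Macro: mean predictor, disturbance variance inflated from 1 to 1 + sigma^2.
macro_share = phi(m / math.sqrt(1.0 + sigma ** 2))

print(round(micro_share, 3), round(macro_share, 3))   # nearly identical
```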
The explicit integration procedure is elegant, but not without problems. One problem
is that the assumption of the normal distribution may not be tenable. This holds
for example if some of the predictors take on only a limited set of discrete values
(summations would then replace the integral). The second problem is that for larger
numbers of predictors the computation of the shares leads to numerical difficulties, and one
needs to resort to simulation methods. Therefore, a number of approximate methods
have been proposed. Ben-Akiva and Lerman (1985) provide an excellent overview;
here we give two particular examples:
The first, the representative individual (RI) method, evaluates the choice probabilities at the average values of the predictors:

m̂_j^RI = π_j(x̄_1j, ..., x̄_Lj).    (9.14)
From equation (9.10) this procedure can be seen to neglect the heterogeneity in the
distribution of the predictors in the population: it involves an approximation of the
distribution of those predictors by their means. The difference between (9.14) and
(9.10) increases as the variances of the predictors increase.
Sample enumeration (SE) uses the random sample of N individuals from the population as representative of that population and uses the predicted share in the sample as an estimator of the population market share:

m̂_j^SE = (1/N) Σ_{i=1}^{N} π_ji    (9.15)

where
I = total number of consumers in the population (so that I · m̂_j^SE estimates the number of consumers choosing brand j).
The sample enumeration estimator has the attractive property that it is a consistent estimator of the population share if the parameters of the choice model are consistently estimated. In addition, sample enumeration is an attractive procedure for obtaining forecasts of market shares in a-priori delineated market segments, defined by e.g. geographical regions or socio-economic classes, and it is easy to use in "what if" market forecasting scenarios. Sample enumeration thus appears to be an attractive way of obtaining aggregate forecasts, given the potential bias of the representative individual method and the computational cost of the explicit integration methods (Ben-Akiva and Lerman, 1985).
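Sample enumeration is straightforward to implement. The sketch below uses a binary logit as the individual-level choice model; the sample values, parameters and population size are all hypothetical:

```python
import math

def choice_prob(x, alpha=-1.0, beta=2.0):
    # individual-level binary logit probability of choosing the brand
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# hypothetical random sample of N households with their predictor values
sample_x = [0.2, 0.5, 0.9, 1.1, 0.4, 0.7, 1.3, 0.6]
N = len(sample_x)
I = 5_000_000  # assumed total number of consumers in the population

# (9.15): the average predicted probability in the sample estimates
# the population market share; scaling by I gives population demand
share_hat = sum(choice_prob(x) for x in sample_x) / N
demand_hat = I * share_hat

# "what if" scenario: a marketing action shifts every predictor by +0.1
share_whatif = sum(choice_prob(x + 0.1) for x in sample_x) / N
print(share_hat, demand_hat, share_whatif)
```

Segment-level forecasts follow by enumerating only the sample members belonging to a given segment.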
Initially, a discrete heterogeneity distribution was assumed for the intercept term, which was approximated by a discrete number of support points and probability masses (Chintagunta, Jain and Vilcassim, 1991), which involves f(α_0i | θ) = π_s, for s = 1, ..., S, where f(·) is a discrete distribution and S is the total number of support points. These can be interpreted as segments (Section 17.2).
Later, heterogeneity became of fundamental interest in marketing. The support
point was extended to capture heterogeneity across all the parameters in a choice
model. Thus, finite mixture regression models arose that connected well to marketing
theories of market segmentation (Wedel and Kamakura, 1998). Such finite mixture
models have enjoyed considerable success and are discussed more extensively in
Chapter 17. Managers are comfortable with the idea of market segments, and the
models appear to do a good job of identifying useful groups. However, market seg-
ments cannot account fully for heterogeneity if the true underlying distribution of the
parameters is continuous. In addition, some practitioners, such as direct and database
marketers, prefer to work at the level of the individual respondent.
While a discrete mixing distribution leads to finite mixture models, continuous mixing distributions lead to random coefficients (e.g. probit or logit) models. Random coefficient logit models have received considerable attention in marketing and related fields (Allenby, Ginter, 1995, Rossi, McCulloch, Allenby, 1996, Elrod, Keane, 1995, Haaijer, Wedel, Vriens and Wansbeek, 1998). Typically a multivariate normal distribution is assumed for all regression parameters in the model, i.e. f(α_i | θ) = MVN(μ, Σ). The use of a continuous heterogeneity distribution has several advantages: it characterizes the tails of the heterogeneity distribution better and it predicts individual behavior more accurately than finite mixture models.
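A random coefficients logit can be evaluated by simulation: draw β_i from the assumed normal heterogeneity distribution and average the individual logit probabilities. A two-brand sketch (all parameter and attribute values are illustrative):

```python
import math
import random

def logit_prob(beta, x_own, x_other):
    # two-brand logit: probability of choosing the "own" brand
    u_own, u_other = beta * x_own, beta * x_other
    return math.exp(u_own) / (math.exp(u_own) + math.exp(u_other))

def rc_logit_share(mu, sigma, x_own, x_other, draws=100_000, seed=7):
    # random coefficients: integrate the logit probability over the
    # heterogeneity distribution beta_i ~ N(mu, sigma^2) by simulation
    rng = random.Random(seed)
    return sum(logit_prob(rng.gauss(mu, sigma), x_own, x_other)
               for _ in range(draws)) / draws

homog = logit_prob(1.0, 1.0, 0.2)           # no heterogeneity
mixed = rc_logit_share(1.0, 1.5, 1.0, 0.2)  # beta_i ~ N(1.0, 1.5^2)
print(homog, mixed)
```

Because the logit probability is S-shaped in β, averaging over a wide heterogeneity distribution pulls the aggregate share toward 0.5 relative to the homogeneous model.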
Some researchers have argued that the assumption in finite mixture models of a
limited number of segments of individuals who are perfectly homogeneous within
segments is too restrictive (Allenby and Rossi, 1999), and that the finite mixture
model leads to an artificial partition of the continuous distribution into homogeneous
segments. In many fields within marketing, emphasis is now on individual customer
contact and direct marketing approaches. Individual-level response parameters may
be required for optimal implementation of direct- and micro marketing strategies. On
the other hand, proponents of the finite mixture approach argue that the estimates
of models with continuous heterogeneity distributions may be sensitive to the spe-
cific distribution assumed for the parameters (i.e. the normal), which is determined
subjectively by the researcher. Further, most models that approximate heterogeneity
through a number of unobserved segments have great managerial appeal: models for-
mulated at the segment-level have an edge if there are scale advantages in production,
distribution, or advertising. Still, combinations of the discrete and continuous heterogeneity approaches, which account for both discrete segments and within-segment heterogeneity, have been developed (Allenby, Arora and Ginter, 1998, Allenby and Rossi, 1999).
Observations used for the estimation of product class sales can be of two varieties:
1. Cross-sectional data: product class sales and other data are available across, say,
geographical areas or individuals at a certain point in time.
2. Time-series data: the values of product class sales for a given region or individual,
and the corresponding explanatory variables, are observed over a number of time
periods.
The observations may also consist of a combination, or - as it is generally referred
to in the literature - a pooling of cross-section and time-series data. Pooling calls for
special estimation procedures which are introduced in Chapters 14 and 16.
If product class sales is to be explained from cross-sectional data, the explana-
tory variables are often socio-economic and demographic variables, such as age, sex,
education, occupation, income, location, family size, as well as marketing variables.
However, the effects of marketing instruments cannot be obtained unless the market-
ing instruments vary across individuals, or groups of individuals, or cross sections.
Many examples of these models can be found in the economic and econometric
literature. See, e.g. Duesenberry (1949), Klein, Lansing (1955), Kapteyn, Wansbeek,
Buyze (1980), Kapteyn, van de Geer, van de Stadt, Wansbeek (1997).
An example of a marketing model estimated with cross-sectional data is a model that captures differences in trade show effectiveness across industries, companies and countries. The authors of this model (Dekimpe, François, Gopalakrishna, Lilien, Vanden Bulte, 1997) provide results about the effects of various show types and tactical variables (booth size, personnel) on observed performance.
In time-series data, the predictors often consist of both environmental and mar-
keting variables. Examples of environmental variables are: population size, a weather
index, and information on economic activity. For cross-sectional data it is common
for researchers to use aggregate values of marketing instruments, meaning that ag-
gregation is performed over all brands that constitute the product class. Examples are
total advertising expenditures, total number of retail outlets and average price.
In Section 8.3.1 we presented a product class sales model estimated from time-series data. In that example, however, product class sales and brand sales were identical, given that the market under consideration was monopolistic. Below we give some examples of product class sales models for oligopolistic markets. The purpose of these examples is to illustrate the types of variables that are used to explain product class sales.
Demand for gasoline is a derived demand in the sense that it depends on the
number of automobiles. Thus, car ownership may be an explanatory variable. An
example is Lambin's (1972a) estimation of per capita demand for gasoline in Italy, as
a function of per capita car ownership.
Q_t / PA_t = 94 [N_t / PA_t]^0.77 [(1/n) Σ_{r=1}^{n} p_rt]^{-0.39} [sd_1]^{-0.06} [sd_2]^{0.04} [sd_3]^{0.09}    (9.16)

where
Q_t = demand for gasoline in period t,
PA_t = population size,
N_t = number of cars,
p_rt = price of brand r in period t,
sd_1, sd_2, sd_3 = seasonal dummy variables.
A similar case is Leeflang's (1976, 1977a) study of the Dutch detergent market, where sales depend on the ownership of automatic washing machines: 13

Q_t = e^{7.51} [AW_{t-1}]^{0.40} [(1/n) Σ_{r=1}^{n} p_rt]^{-0.23} [Σ_{r=1}^{n} a_rt^e]^{-0.02} [Σ_{r=1}^{n} a_rt^p]^{0.02}    (9.17)

where
Q_t = product class sales in period t,
AW_{t-1} = ownership of automatic washing machines in the previous period,
p_rt = price of brand r in period t,
a_rt^e, a_rt^p = two categories of advertising expenditures of brand r in period t.
In Section 6.1 we briefly discussed the relation between industry sales and variables such as industry advertising expenditures in the West-German cigarette market. One of the non-dynamic relations, relation (9.18), explains per capita industry sales from industry advertising expenditures, household consumption, and the prices of cigarettes and of substitute products such as roll-your-own tobacco.
All variables are defined per capita (per person over 15 years of age). After all variables were regressed against time (to remove trends), the estimated parameters for the (adjusted) per capita variables in (9.18), obtained from monthly observations, are those in Table 9.1. Advertising has a significant effect on industry sales. In another part of their analysis (not shown here) Leeflang and Reuyl demonstrate that this effect diminishes over time. The estimated coefficient for household consumption (per capita) indicates that the consumption of cigarettes is quite responsive to household consumption, although a percentage increase in household consumption leads to a smaller percentage increase in cigarette consumption. The cross elasticities are all negative. The cross elasticity for roll-your-own tobacco is the only significant one.
Many other examples of industry sales demand models can be found in the literature. See e.g. Lambin (1976), Leone, Schultz (1980), Lancaster (1984). 14
14. Calls for advertising bans in areas such as alcohol and cigarettes continue to echo around the world. This explains why so many models have been developed in these areas. See, e.g. Duffy (1996), Franses (1991), Leeflang, Reuyl (1995) and Luik, Waterson (1996).
Recall that brand sales can either be modeled directly or indirectly. Directly means that sales of brand j are explained as a function of marketing variables of brand j, marketing variables of competing brands, and environmental variables. Indirectly means that brand sales (q_jt) obtain as the product of product class sales (Q_t) and the market share of the brand of interest (m_jt). Specification of product class sales was discussed in the previous section. We discuss market share models in Section 9.4.
Several considerations are relevant to this choice:
1. To the extent that marketing activities for individual brands influence product category sales, it is implausible that those marketing effects are the same for equivalent increases across the brands that belong to a product category.
2. Product category sales result from the aggregation of sales across brands belonging to the category. Since brands are heterogeneous in marketing activities
15. See Clements, Selvanathan (1988), Vilcassim (1989).
16. The assumption being that such variables affect demand for each brand equally. This assumption will often
be quite reasonable. If not, however, environmental variables affecting brands differently should be included in
the market share function (as well as in the direct estimation of brand sales).
17. This is emphasized by, for example, MacLachlan (1972, p. 378) and Beckwith (1972, p. 171).
and tend to have unique parameters relating marketing variables to sales, the
interpretation of product category demand model parameters is unclear.
3. If product category sales fluctuates, then a given brand's market share values are
not really comparable. Alternatively, the more product category sales fluctuates,
the more questionable the assumption of constant parameter values is.
We illustrate the direct approach with the well-known SCAN*PRO model (Wittink,
Addona, Hawkes, Porter, 1988). This model uses brand sales as the criterion variable.
The SCAN*PRO model is a store-level model developed to quantify the effects of
promotional activities implemented by retailers on a brand's unit sales. The model
accommodates temporary price cuts, displays, and feature advertising. In addition, it
includes weekly indicator variables to account for the effects of seasonality and miss-
ing variables (such as manufacturer television advertising and coupon distributions)
common to the stores in a metropolitan area, and store indicator variables. This model
has been used in over 2000 different commercial applications in the United States, in
Canada, in Europe, and elsewhere.
A slight modification of the original model is specified as follows, for brand j, j = 1, ..., n:

q_kjt = [ Π_{r=1}^{n} (p_krt / p̄_kr)^{β_rj} Π_{l=1}^{3} γ_lrj^{D_lkrt} ] [ Π_{t'=1}^{T} δ_jt'^{X_t'} ] [ Π_{k'=1}^{K} λ_k'j^{Z_k'} ] e^{u_kjt}    (9.19)

where
q_kjt = unit sales of brand j in store k in week t,
p_krt / p̄_kr = the ratio of actual to regular price of brand r,
D_1krt, D_2krt, D_3krt = indicator variables for feature advertising, display, and their combined use,
X_t' = an indicator variable for week t',
Z_k' = an indicator variable for store k',
u_kjt = a disturbance term.
This model has been numerically specified in a study of the model's forecasting
accuracy at different levels of aggregation (store, chain, market-level). We consider
here the parameter estimates obtained from store-level data only. These estimates are
obtained from UPC scanner data provided by ACNielsen, for one large metropolitan
area in the United States. Weekly data were available for three national brands com-
peting in a frequently purchased food category. The average values of the significant
parameter estimates are shown in Table 9.2. The averages are averages over three
brands and the 40 stores in the sample. 18
As expected, the own-brand price elasticity is negative, and the cross-brand elasticity is positive. The (promotion) multipliers with a value larger than 1 have a positive effect on unit sales, while values smaller than 1 have a negative effect. All cross effects, except feature, have the expected negative impact on q_kjt.
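Because the model is multiplicative, taking logarithms makes it linear in the parameters, so price elasticities and promotion multipliers can be estimated by ordinary least squares on log unit sales. A minimal single-brand, single-store sketch with hypothetical data (real SCAN*PRO applications involve many brands, stores and indicator variables):

```python
import math

def ols(X, y):
    # ordinary least squares via the normal equations (X'X) b = X'y,
    # solved by Gaussian elimination; fine for a handful of parameters
    k = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    b = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    for p in range(k):                       # forward elimination
        for q in range(p + 1, k):
            f = A[q][p] / A[p][p]
            A[q] = [A[q][c] - f * A[p][c] for c in range(k)]
            b[q] -= f * b[p]
    coef = [0.0] * k
    for p in reversed(range(k)):             # back substitution
        coef[p] = (b[p] - sum(A[p][q] * coef[q] for q in range(p + 1, k))) / A[p][p]
    return coef

# hypothetical weekly data: price index (actual/regular price) and a
# display indicator; the "true" parameters are chosen for illustration only
true = {"intercept": 5.0, "price_elasticity": -3.2, "display_multiplier": 1.8}
weeks = [(1.00, 0), (0.90, 0), (0.80, 1), (1.00, 1), (0.85, 0), (0.75, 1)]
X = [[1.0, math.log(pi), d] for pi, d in weeks]
y = [true["intercept"] + true["price_elasticity"] * math.log(pi)
     + math.log(true["display_multiplier"]) * d for pi, d in weeks]

a_hat, beta_hat, g_hat = ols(X, y)
gamma_hat = math.exp(g_hat)   # back-transform to a display multiplier
print(beta_hat, gamma_hat)
```

With noise-free data the regression recovers the price elasticity and the display multiplier exactly; with real scanner data the same regression yields the estimates summarized in tables such as Table 9.2.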
The SCAN*PRO model has been used in several studies in which aggregation effects are considered. 19 The model also constitutes the basis for the development of varying parameter models 20 (see Section 17.5), semiparametric models 21 (see Section 16.7) and models for dynamic lead and lag effects. 22
To model brand sales indirectly, we closely follow the derivation in Lambin, Naert and Bultez (1975). They observed that for profit maximization the Dorfman-Steiner theorem (derived in the Appendix to Chapter 8) remains valid independent of whether the market is a monopoly or an oligopoly, making a separate derivation for each case unnecessary. 23 For an oligopoly, however, brand sales elasticities can be decomposed. 24 We illustrate this below for the brand sales advertising elasticity. 25 We formulate the relation between brand sales, product class sales, and market share elasticities as follows. By definition, brand sales, q, equals product class sales, Q, times market share, m: 26
q = Qm.    (9.20)

Differentiating with respect to advertising, a,

∂q/∂a = m (∂Q/∂a) + Q (∂m/∂a)

and multiplying both sides by a/q = a/(Qm) gives

η_q,a = η_Q,a + η_m,a    (9.24)

i.e., the brand sales elasticity with respect to advertising, η_q,a, is equal to the total product class sales elasticity, η_Q,a, plus the market share elasticity, η_m,a, with respect to the same variable. Thus the brand sales elasticity can be obtained indirectly from the sum of the product category sales and the brand market share elasticities.
Relation (9.24) can be extended to account for competitive reactions. These reac-
tions with the same (advertising) or other marketing instruments (for example, price)
are called indirect effects. They may influence TJQ,a and/or TJm,a as will be discussed
in Chapter 11.
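The elasticity decomposition can be checked numerically with any smooth specifications for Q(a) and m(a); the functional forms below are purely illustrative:

```python
def Q(a):
    # product class sales as a function of advertising (illustrative form)
    return 1000.0 * a ** 0.2

def m(a):
    # market share as a function of advertising (illustrative form)
    return 0.3 * a ** 0.15 / (1 + 0.01 * a)

def elasticity(f, a, h=1e-6):
    # numerical point elasticity: (df/da) * (a / f)
    return (f(a + h) - f(a - h)) / (2 * h) * a / f(a)

a = 5.0
eta_q = elasticity(lambda v: Q(v) * m(v), a)  # brand sales q = Q * m
eta_Q = elasticity(Q, a)
eta_m = elasticity(m, a)
print(eta_q, eta_Q + eta_m)
```

The brand sales elasticity computed directly coincides with the sum of the product class sales and market share elasticities, as the decomposition requires.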
Market share models can be specified to be logically consistent, in the sense that predicted values satisfy range (being between zero and one) and sum (summing to one across brands) constraints. One class of models that satisfies these constraints is the class of attraction models. The attraction of a brand depends on its marketing mix. Let A_jt be the attraction of brand j in period t (note that this symbol is also used for total advertising expenditures). Market share attraction models are defined as:

m_jt = A_jt / Σ_{r=1}^{n} A_rt.    (9.25)
1. The attraction of each brand is non-negative, i.e. A_jt ≥ 0 for j = 1, ..., n and t = 1, ..., T, and the total attraction exerted on the market is positive: Σ_{r=1}^{n} A_rt > 0, t = 1, ..., T.
The attraction model has a structure that logically follows from a number of plausible
axioms. 27 Only axiom (4) needs to be justified. Essentially, axiom (4) implies that if
one brand becomes more attractive, while other brands remain at the same level of
attractiveness, its gain in market share will come from all other brands proportional
to their current shares (and to their attractiveness levels). This axiom is analogous to
the IIA assumption discussed later.
Axiom (4) does not imply that the effect of a change in a marketing variable
on one brand's market share has equivalent effects on other brands' market shares.
The reason is that the brands' attractions depend on the marketing variables, and
these attraction functions can be specified in a variety of ways. 28 Thus, axiom (4)
deals with how the market shares of (other) brands vary when the attractiveness of
one brand changes. It does not say anything about how the brands' attraction values
depend on marketing activities, i.e. the components of the attraction functions are not
discussed in the axioms.
Equation (9.25) represents the overall structure. The attraction function itself
remains to be specified. We present six different specifications. In Section 14.3 we
discuss other specifications. Two well-known market share specifications are the MCI
model and the MNL model. The attraction for brand j in the "Multiplicative Competitive Interaction" (MCI) model is specified as: 29

A_jt = exp(α_j) Π_{l=1}^{L} x_ljt^{β_l} ε_jt    (9.26)

where
x_ljt = the value of the l-th explanatory variable for brand j in period t,
ε_jt = a disturbance term,
L = the number of marketing instruments.

Throughout it is assumed that L is independent of j.
The attraction for the MultiNomial Logit (MNL) market share model is specified as:

A_jt = exp( α_j + Σ_{l=1}^{L} β_l x_ljt + ε_jt ).    (9.27)
While the attraction specification (9.26) has attractive characteristics, there are also two disadvantages. First, the attraction is zero if one of the explanatory variables (for example, advertising) is zero in period t. This problem does not apply to the MNL model (9.27). Second, the response parameter for instrument l is β_l, which is assumed to be equal for each brand. As we mentioned about the lack of robustness of the linear model in Section 7.3, marketing executives, in general, find the assumption of equal response parameters across brands unacceptable.
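The range and sum constraints, and the difference between the MCI and MNL attraction specifications, can be illustrated in a few lines (all parameter and variable values are hypothetical):

```python
import math

def mci_attraction(alpha, betas, x):
    # (9.26): A_j = exp(alpha_j) * prod_l x_lj ^ beta_l (disturbance omitted)
    a = math.exp(alpha)
    for beta_l, x_l in zip(betas, x):
        a *= x_l ** beta_l
    return a

def mnl_attraction(alpha, betas, x):
    # (9.27): A_j = exp(alpha_j + sum_l beta_l * x_lj)
    return math.exp(alpha + sum(b * v for b, v in zip(betas, x)))

def shares(attractions):
    # (9.25): m_j = A_j / sum_r A_r
    total = sum(attractions)
    return [a / total for a in attractions]

betas = [1.2, -0.8]  # e.g. advertising (positive), price (negative)
x = {"A": [2.0, 1.0], "B": [1.0, 1.2], "C": [0.5, 0.9]}
alphas = {"A": 0.1, "B": 0.0, "C": -0.2}

mci = shares([mci_attraction(alphas[j], betas, x[j]) for j in x])
mnl = shares([mnl_attraction(alphas[j], betas, x[j]) for j in x])
print(mci, mnl)
```

Both specifications produce shares that lie strictly between zero and one and sum to one across brands, whatever the marketing-mix values.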
An extension that allows a variable's response parameter to vary across brands is the extended attraction model (versus the simple attraction models (9.26) and (9.27)). This model is also known as the differential effects model: 31

A_jt = exp(α_j) Π_{l=1}^{L} x_ljt^{β_lj} ε_jt    (9.28)

A_jt = exp( α_j + Σ_{l=1}^{L} β_lj x_ljt + ε_jt ).    (9.29)
Before we introduce the last two of the six attraction specifications, we present expressions for elasticities.
OWN-BRAND ELASTICITIES
Market share elasticities need to be determined separately for each attraction model.
That is, the formula depends on the attraction specification. The direct or own market
share elasticities (e])for models (9.26)-(9.29) are (ignoring time subscript t):
MCI:     e_j^l = (∂m_j/∂x_lj)(x_lj/m_j) = β_l (1 − m_j)
MNL:     e_j^l = β_l (1 − m_j) x_lj    (9.30)
MCI-DE:  e_j^l = β_lj (1 − m_j)
MNL-DE:  e_j^l = β_lj (1 − m_j) x_lj
Note that the four elasticities differ with regard to the homogeneity/heterogeneity of the marketing variable parameters and in the presence/absence of the marketing variable itself. Specifically, the DE versions have β_lj (versus β_l), indicating that the parameter is brand-specific (heterogeneous). And the MNL model expressions include x_lj. Apart from these distinctions, each elasticity expression includes a marketing effort responsiveness parameter and the share of the market not captured by
31. Cooper, Nakanishi (1988, Chapters 3 and 5).
the brand (1 − m_j). Thus, even if the responsiveness parameters are homogeneous, the elasticities differ across brands according to the remaining share. The inclusion of this (1 − m_j) term has the desirable property that the elasticity goes toward zero as the brand's market share goes to one.
The MNL-based elasticities differ from the corresponding MCI-based ones in the inclusion of x_lj, which measures the marketing effort for variable l used by brand j. Holding market share constant, the elasticity expression shows that an increase in marketing effort, for β > 0, increases the elasticity. However, we know that market share is affected by marketing activities. Also, it is generally accepted that it becomes harder to gain share as the marketing effort increases. 32 The MNL-based elasticity expression implies that if own-brand market share increases proportionally faster than the marketing effort, the own-brand market share elasticity will decrease with increasing x_lj. Cooper and Nakanishi (1988, p. 35) find that MNL-based elasticities increase to a point and then decrease.
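The MCI own-elasticity formula in (9.30) can be verified by comparing it with a numerical derivative of the share function. A minimal sketch (brand data are hypothetical):

```python
def mci_share(x, betas, j, l=None, xval=None):
    # MCI market share of brand j given explanatory variables x[r][l];
    # optionally override variable l of brand j with xval
    attractions = []
    for r, xr in enumerate(x):
        xr = list(xr)
        if r == j and l is not None:
            xr[l] = xval
        a = 1.0
        for beta, v in zip(betas, xr):
            a *= v ** beta
        attractions.append(a)
    return attractions[j] / sum(attractions)

x = [[1.5, 2.0], [1.0, 1.5], [0.8, 2.5]]  # three brands, two instruments
betas = [0.9, -1.4]
j, l = 0, 0
m_j = mci_share(x, betas, j)

# numerical point elasticity via a central difference
h = 1e-6
x0 = x[j][l]
dm = (mci_share(x, betas, j, l, x0 + h) - mci_share(x, betas, j, l, x0 - h)) / (2 * h)
e_numeric = dm * x0 / m_j
e_formula = betas[l] * (1 - m_j)  # (9.30), MCI case
print(e_numeric, e_formula)
```

The finite-difference elasticity matches β_l(1 − m_j), confirming that the own elasticity shrinks as the brand's share approaches one.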
CROSS-BRAND ELASTICITIES
We now turn to a discussion of cross elasticities, which are defined as:

e_j,r^l = (∂m_j/∂x_lr)(x_lr/m_j),   j ≠ r.    (9.31)

The expressions for the cross elasticities of the four attraction models are: 33

MCI:     e_j,r^l = −β_l m_r
MNL:     e_j,r^l = −β_l x_lr m_r    (9.32)
MCI-DE:  e_j,r^l = −β_lr m_r
MNL-DE:  e_j,r^l = −β_lr x_lr m_r
The four cross-elasticity expressions have properties that are similar to those of the own-brand elasticities. The effect of r's activity on brand j's attraction is either homogeneous (−β_l) or heterogeneous (−β_lr), and MCI and MNL differ with regard to the exclusion (MCI) or inclusion (MNL) of the effort for instrument l by brand r. Also, all expressions include m_r, implying that the cross elasticity is more negative (stronger) as r's market share is larger. The cross elasticity does not depend on the share of the brand for which the marketing effort takes place. However, the actual change in a brand's share varies, reflecting its current level. The new share of any brand other
32. We assume β_l, β_lj ≥ 0 for all l, j, which applies to variables such as distribution, selling effort, advertising, and sales promotions. For variables such as price, for which β_l, β_lj ≤ 0, an analogous reasoning can be formulated.
33. We closely follow Cooper, Nakanishi (1988) and Cooper (1993).
than r may be calculated simply as: new share of brand j = old share of brand j × (1 − new share of brand r) / (1 − old share of brand r), j ≠ r.
Because the expressions (9.32) are independent of m_j, the effects of marketing variable x_lr are distributed among the competing brands in proportion to their market shares. This means that the competing brands are equally substitutable. The models (9.26)-(9.29) constrain the competition to be symmetric. This symmetry is the result of the Independence of Irrelevant Alternatives (IIA) assumption. 34 This assumption implies that the ratio of two (market) shares does not depend on the presence or absence of other choice alternatives. That this assumption holds for the models (9.26)-(9.29) can be demonstrated easily by taking the ratio of the attractions of two brands j and r. These ratios are independent of the other brands r', r' ≠ j, r. The IIA properties also hold for individual choice models such as, for example, (9.8).
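The IIA property can be made concrete in a few lines: the ratio of two brands' shares is the same whether or not a third alternative is present. A sketch using an MNL-type attraction (the utility values are illustrative):

```python
import math

def mnl_shares(utilities):
    # MNL attraction/share: m_j = exp(u_j) / sum_r exp(u_r)
    e = [math.exp(u) for u in utilities]
    total = sum(e)
    return [v / total for v in e]

u = {"j": 1.0, "r": 0.4, "other": -0.5}

with_other = mnl_shares([u["j"], u["r"], u["other"]])
without = mnl_shares([u["j"], u["r"]])

ratio_3 = with_other[0] / with_other[1]
ratio_2 = without[0] / without[1]
print(ratio_3, ratio_2)
```

The two ratios are identical: removing (or adding) the third brand changes the share levels but not the relative standing of brands j and r, which is exactly the symmetric-substitution pattern that asymmetric market data often contradict.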
An equality of cross elasticities, i.e. symmetry between brands, does not fit what we tend to observe in the marketplace. Brands belonging to a given product category are usually not equally substitutable. For example, Blattberg and Wisniewski (1989) found that consumers who normally purchase brands with low regular prices (e.g. store brands) are sensitive to temporary price cuts for (national) brands with high regular prices. On the other hand, consumers normally purchasing these national brands tend to be insensitive to temporary price cuts for store brands. Steenkamp and Dekimpe (1997), however, show that the power of store brands depends on the product category.
We discuss alternative ways to account for asymmetric competition in Sections 14.4 and 14.5. 35 Briefly, one possibility is to expand the differential effects models one step further. Relations (9.33) and (9.34) are (the numerators of) attraction models with differential cross-competitive effects, called Fully Extended Attraction (FEA) models. We consider two versions of FEA models:

A_jt = exp(α_j) Π_{l=1}^{L} Π_{r=1}^{n} x_lrt^{β_lrj} ε_jt    (9.33)

A_jt = exp( α_j + Σ_{l=1}^{L} Σ_{r=1}^{n} β_lrj x_lrt + ε_jt ).    (9.34)
34. See Luce (1959), Debreu (1960), Ben-Akiva, Lerman (1985), Allenby, Rossi (1991b), Sethuraman, Srinivasan and Kim (1999) and the discussion in Section 12.3.3.
35. See also Foekens (1995), Bronnenberg, Wathieu (1996), Cooper, Klapper, Inoue (1996), Foekens, Leeflang, Wittink (1997).
For the own- and cross elasticities the following expressions can be derived:
36. For a more thorough discussion see Cooper, Nakanishi (1988, pp. 62-65).
37. We closely follow Foekens (1995, pp. 168-169).
with
ᾱ = (1/n) Σ_{r=1}^{n} α_r,
ε̄_t = (1/n) Σ_{r=1}^{n} ε_rt.
If we subtract (9.37) from (9.36) we obtain the following form, which is linear in the parameters:

log m_jt − (1/n) Σ_{r=1}^{n} log m_rt
  = α_j − ᾱ + Σ_{l=1}^{L} β_l x_ljt − (1/n) Σ_{r=1}^{n} Σ_{l=1}^{L} β_l x_lrt + ε_jt − ε̄_t    (9.38)
  = (α_j − ᾱ) + Σ_{l=1}^{L} β_l ( x_ljt − (1/n) Σ_{r=1}^{n} x_lrt ) + ε*_jt

where ε*_jt = ε_jt − ε̄_t.
n n L
L(d,- *)a,+ L L(d,- *)xerrf3e + sj1 (9.39)
r=2 r=lf=l
where
d, = 1 if r = j, and 0 otherwise.
log m_rt = α_0 + Σ_{r'=2}^{n} d_r' α_r' + Σ_{t'=2}^{T} D_t' Θ_t' + Σ_{l=1}^{L} β_l x_lrt + ε_rt    (9.40)

where
α_0 = an overall intercept,
D_t' = 1 if t' = t, and 0 otherwise.
38. See Nakanishi, Cooper (1982).
Equation (9.40) has the same structure as (9.38), where Θ_t' + α_0 is interpreted as an estimate of the logarithm of the denominator of (9.25) for period t'. Equation (9.40) involves T additional parameters: α_0 and Θ_t', t' = 2, ..., T. 39, 40
39. This has consequences for the degrees of freedom and the estimated standard errors. See Foekens (1995, p.
169).
40. We do not discuss the assumptions of the disturbances nor the estimation techniques required to estimate
these relations. For details, see Bultez, Naert ( 1975).
CHAPTER 10
As we indicated in Section 4.3, model detail can be considered from different angles.
In this chapter we concern ourselves with the amount of behavioral detail a model
contains. The "amount of behavioral detail" is not easily measured. Therefore, we
use qualitative descriptions for a comparison of models in terms of behavioral detail.
A higher level of behavioral detail usually implies that the model contains a larger
number of variables, especially intervening variables (see Figure 10.1 ). It also tends
to mean more equations and more parameters to estimate.
We distinguish three categories:
1. models with no behavioral detail;
2. models with some behavioral detail;
3. models with a substantial amount of behavioral detail.
The demarcation between "some" and "a substantial amount" is subjective. We pro-
vide illustrations to suggest typical examples of the three categories. Distinguishing
models by the "amount of behavioral detail" is important because the categorization
is a major determinant of ease (or difficulty) of estimation and validation (Chapters
16, 17 and 18).
In Section 10.1, we define what is meant by models without behavioral detail.
Many of the examples discussed in previous chapters fit into this category. We offer
some new-product evaluation models as examples of models with some behavioral
detail in Section 10.2.
In Section 10.3, we illustrate the case of a substantial amount of behavioral detail
with a Structural Equation Model (SEM). We conclude this chapter by indicating
differences in (ease of) estimation and validation based on the level of behavioral
detail. We also argue that cost-benefit considerations should play a role in the model-
building process with respect to the appropriate level of detail.
Figure 10.1 Input (stimuli) → Output (response measure).
By "behavioral detail" in a marketing model we mean that the model shows how marketing variables (e.g. advertising) influence variables such as brand awareness, brand knowledge, brand image, attitude toward a brand, motivation to purchase and similar psychological variables. These intervening variables in turn influence purchase behavior. In models with "no behavioral detail", the behavioral process, i.e. the process by which stimuli relate to intervening variables, and the process by which intervening and often unobservable variables relate to behavior, is not made explicit but is treated as a black box. Thus, models with no behavioral detail show only how response measures of ultimate behavior, such as sales and market share, relate to marketing decision variables (for a firm and its competitors) and environmental variables. We show this idea in Figure 10.1. 1
We note that the illustrations in Chapter 9 fit the definition of a model with no behav-
ioral detail. This is also true for many of the examples given in earlier chapters, such
as the optimal advertising budget model in Section 8.3.1. We therefore do not provide
other examples of this class of models.
1. Markov models, in which transition probabilities represent the percentages of people moving from one state (brand choice in t) to another (brand choice in t + 1). These transition frequencies are linked to the ultimate response measure (market share). We discuss these models briefly in Chapter 12.
2. Diffusion models (see below).
3. Adoption models (see below).
Many new products introduced in the market place will not achieve the required
results and will therefore be withdrawn from the market within a few years of intro-
duction. Since a large part of the costs for new product development and introduction
occurs after the research and development process is completed, managers need mod-
els to minimize costs. The models should provide accurate forecasts of new-product
sales, and show how sales depend on product characteristics and marketing variables.
Such diagnostic information allows managers to make adjustments in controllable
variables, so that new-product success rates can be improved.
1. there are only two "consumer states": consumers who have adopted (i.e. made a first purchase) and consumers who have not (yet) adopted;
2. the total number of potential adopters is fixed;
3. the purchase volume per buyer is one unit, i.e. there are no replacements or repeat purchases and no multiple adoptions.
2. See also Lilien, Rangaswamy (1998, pp. 202-203).
Figure 10.2 Adoptions over time, with the peak at t*.
Adoption models describe the stages of adoption processes. 3 Adoption processes rep-
resent the sequence of stages through which consumers progress from unawareness of
an innovation to ultimate adoption. The adoption framework can be used by managers
to determine the potential viability of a new product at pre-test market and test market
stages of the new-product development process. By contrast, diffusion models are
used to describe the early sales history of a new product and to forecast the time and
the magnitude of peak sales. The differences between adoption and diffusion models
have been characterized by Mahajan and Wind (1986, p. 15) as follows:
"Whereas the adoption models capture richness and reality, the diffusion models embrace simplicity and parsimony."
Thus, adoption models generally contain more behavioral detail than diffusion mod-
els do.
DIFFUSION MODELS

Diffusion models typically describe the rate of first purchases as:

dN(t)/dt = g(t)[N̄ − N(t)]

where
N(t) = the cumulative number of adopters at time t,
N̄ = the total number of potential adopters,
g(t) = the coefficient of diffusion, which is usually formulated as a function of N(t).

Specifying g(t) = a_0 + a_1 N(t) gives:

dN(t)/dt = [a_0 + a_1 N(t)][N̄ − N(t)].    (10.4)
For a_0, a_1 > 0, (10.4) is known as the mixed-influence model. An increase in N(t) is modeled as the sum of two terms, each having its own interpretation. For a_1 = 0, we obtain the (single) external influence model (10.5):

dN(t)/dt = a_0[N̄ − N(t)].    (10.5)

The parameter a_0 represents the influence of a "change agent" in the diffusion process, which may capture any influence other than that from previous adopters. In (10.5) it is assumed that there is no interpersonal communication between consumers in the social system. Thus, the change in N(t), dN(t)/dt, is assumed to be due to the effects of mass communications (advertising) (Mahajan, Peterson, 1985, Mahajan, Muller, Bass, 1993).
The (single) internal influence diffusion model (10.6) is based on a contagion paradigm that implies that diffusion occurs through interpersonal contacts:

dN(t)/dt = a_1 N(t)[N̄ − N(t)].    (10.6)
Figure 10.3 Cumulative probability that a customer in the target segment will adopt the product before t.
The solution of the mixed-influence model (10.4) is:

N(t) = [ N̄ − [a_0(N̄ − N_0)/(a_0 + a_1 N_0)] exp{−(a_0 + a_1 N̄)t} ] / [ 1 + [a_1(N̄ − N_0)/(a_0 + a_1 N_0)] exp{−(a_0 + a_1 N̄)t} ]    (10.7)

where
N_0 = the cumulative number of adopters at t = 0.
In the external influence model (10.5), a_1 = 0. Substituting this value in (10.7) and setting N_0 = 0 gives:

N(t) = N̄[1 − exp{−a_0 t}]    (10.8)

which is the continuous formulation of the penetration model developed by Fourt and Woodlock (1960).
In the internal influence model (10.6), a0 = 0. Substituting this value into (10.7)
gives:
N(t) = N̄ / [1 + ((N̄ − N0)/N0) exp{−a1N̄t}]. (10.9)
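The closed-form solution (10.7) can be checked against direct numerical integration of (10.4); a minimal sketch in which the parameter values a0, a1 and N̄ are illustrative, not taken from the text:

```python
# Numerical check of the mixed-influence diffusion model (10.4) against
# its closed-form solution (10.7). The parameter values a0, a1, nbar
# are illustrative only.
import math

def n_closed(t, a0, a1, nbar, n0=0.0):
    """Closed-form cumulative adopters N(t), equation (10.7)."""
    e = math.exp(-(a0 + a1 * nbar) * t)
    num = nbar - (a0 * (nbar - n0) / (a0 + a1 * n0)) * e
    den = 1.0 + (a1 * (nbar - n0) / (a0 + a1 * n0)) * e
    return num / den

def n_euler(t, a0, a1, nbar, n0=0.0, steps=100_000):
    """Euler integration of dN/dt = (a0 + a1*N)(nbar - N), equation (10.4)."""
    n, dt = n0, t / steps
    for _ in range(steps):
        n += (a0 + a1 * n) * (nbar - n) * dt
    return n

a0, a1, nbar = 0.03, 0.0004, 1000.0
for t in (1.0, 5.0, 10.0):
    assert abs(n_closed(t, a0, a1, nbar) - n_euler(t, a0, a1, nbar)) < 1.0
# With a1 = 0 (external influence only), (10.7) collapses to the
# Fourt-Woodlock penetration model (10.8): N(t) = nbar*(1 - exp(-a0*t)).
assert abs(n_closed(5.0, a0, 0.0, nbar) - nbar * (1 - math.exp(-a0 * 5.0))) < 1e-9
```

The same check with a0 = 0 recovers the internal influence solution (10.9).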
Figure 10.4 Density function f(t) = dF(t)/dt: likelihood that a consumer adopts the product at t.
The Bass model (10.11) combines both types of influence:
n(t) = dN(t)/dt = p[N̄ − N(t)] + (q/N̄)N(t)[N̄ − N(t)] (10.11)
The first term in (10.11) represents adoptions due to buyers who are not influenced
in the timing of their adoption by the number of people who have already bought the
product ("innovators"). The second term represents adoptions due to buyers who are
influenced by the number of previous buyers ("imitators").
The Bass model is usually specified in relative terms: F(t) is the (cumulative)
probability that someone in the market or in the target segment will adopt the innova-
tion by time t, where F(t) approaches 1 as t gets larger. Such a function is depicted in
Figure 10.3. The derivative of F(t) is the probability density function (Figure 10.4)
which indicates the rate at which the probability of adoption is changing over time.
Relation (10.11) can also be written as:

Cumulative number of adopters: N(t) = N̄[1 − e^(−(p+q)t)] / [1 + (q/p)e^(−(p+q)t)]
Noncumulative number of adopters: n(t) = N̄[p(p+q)² e^(−(p+q)t)] / (p + q e^(−(p+q)t))²
Time of peak adoptions: t* = [1/(p+q)] ln(q/p)
Number of adopters at the peak time: n(t*) = N̄(p+q)²/(4q)
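The expressions in this table can be verified directly; a minimal sketch in which the values of p (innovation coefficient) and q (imitation coefficient) are illustrative:

```python
# The Bass model quantities tabulated above; the p and q values below
# are illustrative, not estimates from any study.
import math

def n_cum(t, p, q, nbar):
    """Cumulative number of adopters N(t)."""
    e = math.exp(-(p + q) * t)
    return nbar * (1 - e) / (1 + (q / p) * e)

def n_rate(t, p, q, nbar):
    """Noncumulative number of adopters n(t)."""
    e = math.exp(-(p + q) * t)
    return nbar * p * (p + q) ** 2 * e / (p + q * e) ** 2

p, q, nbar = 0.03, 0.38, 1_000_000
t_star = math.log(q / p) / (p + q)        # time of peak adoptions
n_peak = nbar * (p + q) ** 2 / (4 * q)    # number of adopters at the peak

# n(t) attains the tabulated peak value at t*, and t* is a maximum.
assert abs(n_rate(t_star, p, q, nbar) - n_peak) < 1e-3
assert n_rate(t_star - 0.1, p, q, nbar) < n_peak
assert n_rate(t_star + 0.1, p, q, nbar) < n_peak
# N(t) approaches the market potential for large t.
assert abs(n_cum(60.0, p, q, nbar) - nbar) < 1.0
```
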
The Bass model and its revised and extended forms have been used for forecasting
innovation diffusion in retail service, industrial technology, agricultural, educational,
pharmaceutical, and consumer durable good markets. Companies that have used the
model include Eastman Kodak, RCA, IBM, Sears, and AT&T (Bass, 1986). Since the
original publication of the Bass model, research on the modeling of the diffusion of
innovations in marketing has resulted in an extensive literature. 5
In the preceding description it is assumed that individual consumer response and
aggregate response have identical functions. Since the decision to adopt an innova-
tion is individual-specific, all potential adopters cannot have the same probability
of adopting the product in a given time period. Thus, attempts have been made to
develop diffusion models by specifying adoption decisions at the individual level. 6
Some of the extended models incorporate price, advertising or firm size (Nooteboom,
1989).7 Bass, Krishnan and Jain (1994) developed the Generalized Bass Model
(GBM) that includes marketing decision variables. The GBM reduces to the Bass
Model (BM) as a special case, i.e., if the decision variables are irrelevant. If percent-
age changes in period-to-period values of the decision variables are approximately,
but not exactly, constant, the GBM provides approximately the same fit as the BM. On
the other hand, if the coefficients of the decision variables are statistically significantly
different from zero, the GBM provides explanations for deviations of the (actual) data
from the smooth (fitted) curve of the BM and an improved fit.
The GBM can be represented as follows:
Pt = [p + q F(t)]x(t) (10.15)
where
Pt = probability of adoption at time t, given no adoption prior to t, and
x(t) = current marketing effort.
The current marketing effort reflects the current effect of (dynamic) marketing vari-
ables on the conditional probability of adoption at time t. The dynamic part implies
5. For reviews of this literature see Mahajan, Muller, Bass (1990, 1993) and Nijkamp (1993). Putsis,
Balasubramanian, Kaplan and Sen ( 1997) model the diffusion of sales in four product categories across ten
European countries.
6. See, for example, Chatterjee, Eliashberg (1990).
7. See, for example, Kamakura, Balasubramanian (1988) and for a survey Bass, Krishnan, Jain (1994). See
also Mahajan, Muller ( 1998), Kim, Bridges, Srivastava ( 1999).
188 CHAPTER 10
[Figure: actual and predicted sales over time.]
ADOPTION MODELS
Aggregate adoption models differ from diffusion models in a number of ways. For
example, adoption models usually deal with frequently purchased items although
there are also examples of these models which forecast the adoption of durables. 8 The
aggregate models are based on adoption processes conceptualized at the individual
level. In these processes one may distinguish the following steps:9
1. awareness, the stage in which the individual is first exposed to the new product;
2. interest, the stage in which the individual is stimulated to seek more detailed
information about the new product;
3. evaluation, in which the individual considers whether the new product provides
sufficient value relative to its cost;
4. trial, the stage in which the individual tries the new product;
5. adoption, the stage in which the individual decides to make full and regular use
of the new product.
The explicit specification of the fifth stage means that adoption models are particu-
larly suited as repeat purchase models. We note that the interactions between adopters
are not explicitly considered in adoption models.
The marketing literature includes a large number of adoption models, 10 which
may be due to the high costs and high risks inherent in the introduction of a new
product. Particularly important contributions were made by Urban. First for new
industrial products 11 and later for frequently purchased consumer goods, 12 Urban
created a new-product evaluation model with modular structure called SPRINTER,
which stands for Specification of PRofit with INTERaction. 13 The interdependencies
refer to the new brand and its relation to established brands in a product line of the firm
that introduces the new brand. Although the SPRINTER model has been succeeded
by other "more implementable" models, its structure is an excellent illustration of a
macro flow adoption model.
In its simplest form 14 this new-product model contains the elements in Figure 10.6.
Three consumer states or experience classes are distinguished. The pretrial class
consists of potential triers who have no experience with the new brand. In a specific
time interval some people of the pretrial class buy the new brand and move to the
prerepeat class, which consists of potential repeaters. These "purchases" are shown
in black in Figure 10.6. When these consumers buy the new brand again, they move
to the preloyal class, the class of potentially loyal consumers of the new brand. They
stay in this class when they repurchase the brand a second time, etc.
When consumers in an experience class purchase a competitive brand instead of
a new brand, they move to the "preceding" experience class. This is denoted by the
white arrows in Figure 10.6.
The SPRINTER model was programmed in a conversational mode. In other words,
the model user interacts with the computer model via a terminal: the model poses
questions to which the user responds. An example session is given in Table
10.3.15 This table represents the main parts of the input for a SPRINTER Mod.I
10. See for some surveys Narasimhan, Sen (1983), Mahajan, Muller, Sharma (1984), Shocker, Hall (1986),
Mahajan, Wind (1988) and Urban (1993).
11. Urban (1968).
12. See Urban (1969a, 1970) and Urban, Karash (1971).
13. Urban (1968). In another publication (Urban, 1970) the acronym SPRINTER stands for Specification of
PRofits with INTERdependencies. See also Nijkamp ( 1993, p. 135).
14. Urban (1969a) labelled this SPRINTER version "Mod.I".
15. This is the simplest output option of Mod.I. The user's inputs are underlined.
[Figure 10.6: macro flow model with the experience classes pretrial, prerepeat, and preloyal.]
version. We explain some points. In line 5, the user specifies the size of the target
group. This number can either be constant over time or one can allow the size to vary,
for example, due to seasonality. To accomplish this, one types ADD PERIOD, after
which one can add a time index to let the target group size vary (in the example over
36 months).
The time period is usually taken as the smallest interpurchase time interval, for
example one month. The illustration shows in line 8 that in time, five percent of
the users buy every month, eight percent every two months, etc. For mature product
categories, such a distribution can be obtained from survey or household purchase
data.
The data provided by the model user (new product manager, marketing manager,
etc.) in the terminal session (Table 10.3) are inputs in the new-product model shown
in Figure 10.6.
The model predicts for each period the number of people in each class and the
number of buyers per class. The output also contains market share information, profit
per period, and cumulative discounted profit (not shown in Table 10.3).16
8. month: J.
9. repeat after one trial (%) 70.0
10. repeat after two trials (%) 90.0
11. advertising budget (thousands of $) add period
period 1: 352.0
period 2: 274.0
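The macro flow of Figure 10.6 can be sketched as a simple state-transition computation. In the sketch below the trial probability is hypothetical, the 70 and 90 percent repeat rates are the values shown in Table 10.3, and interpurchase timing is ignored for brevity:

```python
# Minimal sketch of the macro flow logic of a SPRINTER Mod.I-style model.
# Each period a fraction of each experience class buys the new brand and
# moves forward (black arrows in Figure 10.6); non-buyers fall back one
# class (white arrows). p_trial is hypothetical; the repeat rates are the
# 70%/90% values from Table 10.3.
def simulate(periods, target, p_trial=0.05, p_rep1=0.70, p_rep2=0.90):
    pretrial, prerepeat, preloyal = float(target), 0.0, 0.0
    buyers = []
    for _ in range(periods):
        trial = pretrial * p_trial            # pretrial  -> prerepeat
        first_repeat = prerepeat * p_rep1     # prerepeat -> preloyal
        loyal_repeat = preloyal * p_rep2      # preloyal stays preloyal
        pretrial = pretrial - trial + prerepeat * (1 - p_rep1)
        prerepeat = trial + preloyal * (1 - p_rep2)
        preloyal = first_repeat + loyal_repeat
        buyers.append(trial + first_repeat + loyal_repeat)
    return pretrial, prerepeat, preloyal, buyers

pre, prer, loyal, buyers = simulate(36, 100_000)
# The three experience classes always partition the target group.
assert abs(pre + prer + loyal - 100_000) < 1e-6
assert min(buyers) >= 0.0
```

The real model additionally conditions each period's purchases on the interpurchase-time distribution entered in the terminal session.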
Clearly models with (more) behavioral detail require more data inputs than models
with no behavioral detail.
Many implementable adoption models have been developed since the early eighties.
Examples are BBDO's NEWS model,17 LITMUS18 and ASSESSOR.19
ASSESSOR was initially marketed in the U.S. by Management Decision Sys-
tems, Inc. (MDS), then by Information Resources Inc. (IRI) and subsequently by
M/A/R/C Inc.20 In Europe ASSESSOR was introduced by NOVACTION.
As of 1997 both in Europe and in the USA, BASES accounts for the majority
of pre-test-market new-product/concept evaluations. The services offered by BASES
center around a system for consumer reactions to concepts or new products prior
to test market or market introduction. The consumer reactions, along with planned
media schedules, promotional activities and distribution, are used to predict sales
volumes for the first two years. These predicted volumes are based on the infor-
mation obtained from a sample of consumers who belong to the target market and
who agree to participate after recruitment in a shopping mall. The consumers pro-
vide responses that allow BASES to estimate trial rates, repeat sales, transaction
sizes and purchase frequencies. Importantly, due to the large number of commer-
cial studies conducted, BASES uses its database to make various adjustments. For
example, the first repeat rate, which is estimated from an after-use purchase intent, is
adjusted for product-category specific and culture specific overstatements, price/value
perceptions, intensity of liking perceptions, purchase cycle claims, etc. Thus, the
usefulness of a service such as BASES is not indicated just by the quality of the
modeling and the consumer responses. By linking actual new-product results to the
consumer responses, based on factors such as product category and target market
characteristics, one can improve the validity of market projections.
The ASSESSOR model is one of the few commercial models published in academic
journals.21 According to published results the success rate of new products that go
through an ASSESSOR evaluation is 66 percent, compared with a success rate of
17. Pringle, Wilson, Brody (1982).
18. Blackburn, Clancy ( 1980).
19. See Silk, Urban (1978).
20. See Urban (1993).
21. See Urban, Katz ( 1983).
BEHAVIORAL DETAIL 193
The reason for having two models is that if these two models provide similar forecasts
one can have more confidence in the forecasts. In case of differences in the forecasts,
there is an opportunity to identify what unusual characteristics are present in the
new product that require special attention. The trial-repeat model is estimated with
consumer trials of products in a specially created store environment, and follow-
up contacts to capture repeat in at-home use. The preference model is estimated
from survey responses about the new product and established brands familiar to the
respondents. Both models also use management judgment about variables such as
brand awareness and -availability.
Consumers are typically screened in shopping malls. Given agreement to partic-
ipate, they enter a testing facility. Participants complete a survey about awareness
of existing brands and the consideration set of brands for purchases in the product
category to which the new product might belong. 23 In addition, participants are asked
about the brands in the product category they have purchased in the immediate past
and their preferences for the brands in their consideration sets. The preference model
transforms the measured preferences of the participants into choice probabilities. 24
Participants who do not choose the new brand in the store laboratory receive a
small size sample of the new brand. In this manner, all participants have an opportu-
nity to try the new product. After a short period, sufficiently long for most consumers
to experience product usage, the participants are contacted at their homes. Many of
the questions are the same as those asked in the store laboratory. In addition, con-
sumers are asked about product usage and are given an opportunity to (re)purchase
the new product. Having collected trial and repeat data, it is then possible to predict
the new product's market share. Similarly, the preference data after product usage
are used to obtain a second prediction of market share. Both of these predictions are
22. Lilien, Rangaswamy ( 1998, p. 211 ).
23. For detailed descriptions, see Silk, Urban (1978), Urban (1993) or Lilien, Rangaswamy (1998, pp. 204-
211).
24. See, for example, equation (9.8).
mj = (N(t)/N*) · π · w (10.16)
where
mj = the predicted value of the long-run market share of the new brand j,
N(t) = the cumulative number of adopters,
N* = the size of the target segment,
π = the proportion of those trying the new product who will become long-run repeat purchasers of the new product,
w = relative usage rate, with w = 1 the average usage rate in the market.
This model was originally formulated by Parfitt and Collins (1968). In ASSESSOR
the components N(t)/N* and π are disaggregated in a number of macro flows:
N(t)/N* = F·K·D + C·U − (F·K·D)(C·U) (10.17)
where
F = long-run probability of trial assuming 100 percent awareness and distribution,
K = long-run probability of awareness,
D = long-run probability of availability in retail outlets (weighted distribution fraction),
C = probability that a consumer receives a sample,
U = probability that a consumer who receives a sample uses the sample.
The first right-hand term in (10.17), F · K · D, is quantified by the proportion of
consumers who will be aware of the new brand (management judgment), have access
to it in a store (management judgment) and try it (laboratory store results). The second
term C · U, represents the fraction of target market consumers who receive a sample
of the new brand (management judgment) and use it (home contact). The third term
adjusts for double counting those who both purchase the new product in a store for
trial and receive a sample (since in the marketplace these two events are not mutually
exclusive).
π = pon / (1 − pnn + pon) (10.18)
where
pon = the estimated probability that a consumer who purchases an other brand in period t purchases the new brand in t + 1,
pnn = the estimated probability that a consumer purchases the new brand in t and in t + 1.
We show in Section 12.3.1 how (10.18) is obtained. Pon is estimated from the pro-
portion of consumers who did not "purchase" the new product in the test facility but
say in the post-usage survey that they will buy the new product at the next purchase
occasion. Pnn is the proportion of consumers who purchased the new product in the
test facility and say in the post-usage survey that they will buy the product again.
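Combining (10.16), (10.17) and (10.18) gives the trial-repeat share prediction; a minimal sketch in which all numeric input values are hypothetical:

```python
# ASSESSOR-style trial-repeat market share prediction, combining
# equations (10.16), (10.17) and (10.18). All numeric inputs below are
# hypothetical illustrations, not values from any study.
def long_run_share(F, K, D, C, U, p_on, p_nn, w=1.0):
    trial = F * K * D + C * U - (F * K * D) * (C * U)  # (10.17): N(t)/N*
    pi = p_on / (1 - p_nn + p_on)                      # (10.18): repeat rate
    return trial * pi * w                              # (10.16): m_j

m = long_run_share(F=0.40, K=0.70, D=0.60, C=0.15, U=0.50,
                   p_on=0.12, p_nn=0.55)
# trial = 0.168 + 0.075 - 0.0126 = 0.2304; pi = 0.12/0.57
assert abs(m - 0.2304 * (0.12 / 0.57)) < 1e-9
assert 0.0 < m < 1.0
```

The third term in the trial component removes the double counting of consumers who both buy in the laboratory store and receive a sample, exactly as described for (10.17).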
1. Micro-analytic simulation models, which are models with behavioral details mod-
eled at the individual consumer level, and for which the market response implica-
tions are explored via the technique of simulation; or
2. Micro-analytic models, where the behavioral details are also modelled at the in-
dividual consumer demand level, but where the market response implications are
determined via analytic methods. Structural Equation Models (SEM) constitute
a subset of these models. Other micro-analytic models with a substantial amount
of behavioral detail are the purchase incidence models and the purchase timing
models discussed in Chapter 12.
estimates net profitability. The model includes the actions of the manufacturer, re-
tailers, and consumers. Krishna (1992, 1994) simulated the effects of consumer price
expectations and retailer dealing patterns on consumer purchase behavior.
Figure 10.7 Relations between (coupon usage) behavior, intentions, attitudes, subjective norms
and past behavior.
where
η1 = intentions,
η2 = behavior,
ξ1 = attitudes,
ξ2 = subjective norms,
ξ3 = past behavior,
ζ1, ζ2 = disturbance terms.
The β21-parameter represents the partial relation between the latent endogenous vari-
ables η1 and η2, while all γ-parameters represent partial relations between latent
exogenous and endogenous variables. In this model it is assumed that coupon in-
tentions are determined by the attitudes toward using coupons in the supermarket,
subjective norms and past coupon usage. Coupon usage (behavior) is measured with
a questionnaire which assesses people's self-reported coupon usage during a given
week (y3). Intentions are assessed by asking subjects to express their plans to use
coupons in the same week. Two items (y1 and y2) are used to measure intentions,
viz. a seven-point likely/unlikely scale and an 11-point high/low probabilistic scale.
Attitudes toward using coupons are assessed with three seven-point semantic differ-
ential scales: (x1) pleasant/unpleasant, (x2) good/bad and (x3) favorable/unfavorable.
Subjective norms are measured with two items, rated on seven-point scales, viz.
If we use the model-building criteria in Chapter 7, we find that models with a sub-
stantial amount of behavioral detail
• are not simple, and
• are not robust.
Although these types of models can provide useful insights, they are given limited
attention in this book. This is because models with a substantial amount of behav-
ioral detail are more difficult to implement, based on considerations of benefits and
costs. Relative to models with no behavioral detail, an important benefit lies in the
richness of representations of real-world phenomena. On the cost side we note that
the inclusion of more behavioral detail implies:
a. models of greater complexity are more expensive to construct and use;
b. these models require more extensive forms of data collection;
c. parameter estimation is more difficult because there are alternative ways to define
intervening variables, and measurement issues need to be addressed;
d. model validation is a more involved process.
27. Examples of other SEM's with a substantial amount of behavioral detail are Bagozzi, Silk (1983), Bearden,
Teel (1983), Yi (1989), Oliver, De Sarbo (1988). See also Roberts, Lilien (1993), Bagozzi (1994b), Sharma
(1996, Chapter 14).
We also note that richness of representation does not necessarily correlate posi-
tively with the benefits one derives from a model. This depends in part on the purpose
of the model and on the problem one is trying to address. For example, marketing
mix determination for an established brand in a mature market may be adequately
accomplished with an (aggregate) market response model (assuming an absence of
aggregation bias). Given a narrowly defined purpose it is unlikely that the addition of
behavioral detail will improve a model's usefulness. On the other hand, if the problem
is to make a GO/NO GO decision for a new product, an aggregate flow model or a
model of individual household adoption behavior should be more appropriate. Thus,
a particular level of detail might be optimal in a cost-benefit sense for one type of
problem, and not for another. The question of which level of detail is desirable will
be easier to answer after we have studied the problem of parameterization in more
depth. We, therefore, postpone further discussion until Chapter 20.
We have to keep in mind that the preceding evaluation is made from the point
of view of the possible contributions of models to decision making in marketing.
However, as we emphasized in Section 3.3, marketing models are also developed
to discover generalizations that advance marketing knowledge. The value of many
models with a substantial amount of behavioral detail may, to a large extent, be found
in the area of theory construction. Thus, we restrict ourselves here to a discussion of
the value of certain types of models from a usefulness perspective (in terms of cost-
benefit considerations), for example as decision aids, and not of their contribution to
the development of the marketing discipline.
CHAPTER 11
Modeling competition
Consider the following functions for Q, product class sales, and m, a brand's market
share:
We do not use the index j for p, a and x to restrict the number of indices. We use
(9.24) to define total brand j's sales elasticity with respect to its advertising (ηq,a)
as the total product class elasticity (ηQ,a) plus the total market share elasticity with
respect to brand j's advertising (ηm,a):
ηq,a = ηQ,a + ηm,a. (11.2)
These elasticity measures capture the effect of advertising for one brand on consumer
demand. However, to capture the actual impact, we need to take into account how
competitors react to changes in advertising for the brand and how this reaction mod-
ifies consumer demand. The competitive reactions belong to the set of competitor-
centered approaches. Specifically, we distinguish direct and indirect partial effects of
brand j's advertising on product class sales and brand j's market share. An indirect
partial effect captures the following. If brand j changes its advertising expenditure
level (Δa), competitors may react by adapting their spending level (Δac), and ac in
turn influences Q and/or m. In this explanation, as is usually assumed in oligopoly
theory, competitors react with the same marketing instrument as the one which causes
their reactions. Thus, competitors react to a change in price for j by a change in their
prices, to a change in advertising by a response in advertising, etc. This type of reac-
tion is defined as the simple competitive reactions case. It is more realistic, however,
and consistent with the spirit of the concept of the marketing mix to accommodate
multiple competitive reactions. In the latter case, a competitor may react to a change
in price not just by changing his price, but also by changing his advertising as well as
other marketing instruments.
We formalize the idea of having simple and multiple indirect reactions through
reaction elasticities. With the general case of multiple competitive reactions we can
write ∂Q/∂a and ∂m/∂a as follows:3
∂Q/∂a = ∂QT/∂a + (∂QT/∂pc)(∂pc/∂a) + (∂QT/∂ac)(∂ac/∂a) + (∂QT/∂xc)(∂xc/∂a) (11.3)
and
∂m/∂a = ∂mj/∂a + (∂mj/∂pc)(∂pc/∂a) + (∂mj/∂ac)(∂ac/∂a) + (∂mj/∂xc)(∂xc/∂a). (11.4)
Multiplying both sides of (11.3) by a/Q, we obtain the product class elasticity ηQ,a:4
ηQ,a = ηQT,a + ηQT,pc ρpc,a + ηQT,ac ρac,a + ηQT,xc ρxc,a (11.5)
where
ηQT,a = direct product class sales elasticity with respect to brand j's advertising,
ηQT,uc = product class sales elasticity with respect to competitors' marketing instrument uc (= pc, ac, or xc), and
ρuc,a = reaction elasticity of competitors' instrument uc (= pc, ac, or xc) with respect to brand j's advertising expenditures.5
Multiplying both sides of (11.4) by a/m gives the corresponding market share elasticity (11.6). With (11.2), total brand sales elasticity ηq,a is the sum of the components on the right-hand sides of (11.5) and (11.6):
This formulation allows product class demand to expand since ηQT,a is not forced
to be zero. However, no indirect effects are present. This may be an appropriate
3. In (11.3) and (11.4), ∂QT/∂a and ∂mj/∂a are the direct effects and ∂Q/∂a and ∂m/∂a are the total effects.
4. To obtain ηQT,pc ρpc,a, multiply (∂QT/∂pc)(∂pc/∂a) by a/Q.
5. As other elasticities, ρuc,a is defined as (∂uc/∂a)(a/uc).
6. The assumptions about competitive behavior are in many cases implicit.
representation for a follower, i.e. a competitor who assumes that other firms do not
react to his decisions.
Lambin, Naert and Bultez (LNB) (1975) applied the concept of multiple competi-
tive reactions to the market of a low-priced consumer durable good in West Germany.
They used a multiplicative market share function, 7
(11.9)
where u′ = u/uc, u = p, a, and x̃, and mj,−1 = mj,t−1. We showed in Section 5.3.2 that
exponents in multiplicative relations are elasticities. Since u′ = u/uc it follows that:
ηm,uc = −ηm,u′ (11.10)
where uc = pc, ac, and x̃c for the three equations respectively. The estimates of the
reaction elasticities to advertising for brand j were ρac,a = 0.273, ρpc,a = 0.008,
ρx̃c,a = 0.023.
Brand j's sales elasticity (here, the total market share elasticity, since ηQT,a = 0)
can now be assessed by substituting the estimated market share and reaction elastici-
ties in (11.11): the total brand sales elasticity ηq,a is 0.124, which compares with a direct
(market share) elasticity of 0.147. Thus the net or total effect of advertising for brand j is
smaller than the direct effect.
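For the multiplicative model (11.9) in relative variables, ln m is linear in the log relative variables, so the total advertising elasticity decomposes as β(1 − ρac,a) − αρpc,a − γρx̃c,a, where β, α and γ are the share elasticities for relative advertising, price and x̃. A sketch; the reaction elasticities and the 0.147 direct elasticity are those reported above, while the values of α and γ are hypothetical choices for illustration:

```python
# Total advertising elasticity under multiple competitive reactions for
# a multiplicative share model in relative variables (11.9)-(11.10).
# The rho values and beta = 0.147 are from the text; alpha and gamma
# (share elasticities for price and instrument x) are hypothetical.
def total_adv_elasticity(beta, alpha, gamma, rho_ac_a, rho_pc_a, rho_xc_a):
    # d ln m / d ln a = beta*(1 - rho_ac,a) - alpha*rho_pc,a - gamma*rho_xc,a
    return beta * (1 - rho_ac_a) - alpha * rho_pc_a - gamma * rho_xc_a

eta = total_adv_elasticity(beta=0.147,   # direct share elasticity (text)
                           alpha=-2.0,   # hypothetical price elasticity
                           gamma=-0.05,  # hypothetical x-elasticity
                           rho_ac_a=0.273, rho_pc_a=0.008, rho_xc_a=0.023)
# Competitive reactions shrink the net effect below the direct 0.147;
# with these illustrative alpha/gamma values eta lands near the 0.124
# reported in the text.
assert 0.0 < eta < 0.147
assert abs(eta - 0.124) < 0.001
```
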
In the LNB-model it is assumed that the market consists of a leader who uses
marketing instruments p, a and x̃, and a follower defined as the aggregate of the other
firms. For example, pc = Σ(r=2..n) pr/(n − 1), p = p1, ac = Σ(r=2..n) ar/(n − 1), a = a1, etc.,
where n = total number of brands and "1" indicates the leading brand.
In extended-LNB models, researchers made no distinction between leaders and
followers. Instead they consider all brands separately in what amounts to a decom-
position of competitive interactions. Leeflang and Reuyl (1985b) determined the in-
teractions for each set of two brands. For L marketing instruments xℓ, ℓ = 1, ..., L,
7. The time is omitted for convenience. Some of the variables in the reaction functions were specified with a
one-period lag.
for each of n brands j = 1, ..., n, the following competitive reaction equations were
estimated:
xℓjt = f([xℓ′rt]ℓ′=1,...,L), ℓ′ = 1, ..., L, j, r = 1, ..., n, j ≠ r, t = 1, ..., T (11.13)
where xℓjt is the value of the ℓ-th marketing instrument of brand j in period t. (Note
that x̃, with a tilde, is a specific marketing instrument, viz. quality.) In this extended
model, the number of equations equals Ln(n − 1), compared with L equations for the
LNB model. One difficulty with this model is that although all brands are allowed to
compete with each other, each of the equations estimated uses predictors for only one
other brand. Thus, the parameter estimates in (11.13) may be biased because relevant
predictor variables are omitted.
Another example of an extended-LNB model is Hanssens' (1980b) model:8
xℓjt = f′([xℓ′rt]ℓ′=1,...,L; r=1,...,n − [xℓjt]), t = 1, ..., T. (11.14)
Equation (11.14) is more general than (11.13), in that it allows for joint decision mak-
ing when j = r, that is, the possibility that changes in one variable are accompanied
by changes in one or more other variables for a given brand. These relations between
different variables for the same brand are known as intrafirm activities.
For (11.14) the number of equations to be estimated is Ln. While there are fewer
equations than in (11.13), the large number of predictor variables may make the esti-
mation of (11.14) difficult. For example, each equation may have (Ln − 1) predictors,
even if we do not consider time lags.
Gatignon ( 1984) and Plat and Leeflang ( 1988) extended Hanssens' ( 1980b) model
in several directions. Gatignon's model simultaneously measures the effects of mar-
keting mix variables on the reactions that may occur in different submarkets. The
models Plat and Leeflang (1988) developed account for differences in the reactions
of competitors operating in different market segments.
The data used to calibrate the reaction functions in the studies mentioned here
concern manufacturers' actions and reactions. In the past, researchers used monthly,
bimonthly, or quarterly data. With scanner data, researchers have new opportunities
to study competitive reactions. Yet calibrating competitive reaction functions with
weekly scanner data collected at the retail level also faces new problems. For ex-
ample, changes in marketing activities may reflect actions and reactions of retailers
as well as manufacturers. Price decisions for a brand, for example, are made ulti-
mately by the retailers (Kim, Staelin, 1999). Temporary price cuts, displays, refunds,
bonuses, etc. introduced at the retail level depend on the degree of acceptance (pass-
through) of promotional programs by retailers. Thus, especially with scanner data,
researchers who estimate competitive reaction functions should let the models reflect
the roles of both manufacturers and retailers.
8. In (11.14), the "subtraction" of xℓjt means that instrument ℓ for brand j is not a predictor variable.
Leeflang and Wittink ( 1992) distinguish three categories of reactions: parallel move-
ments, retailer-dominated reactions and manufacturer-dominated reactions. Competi-
tion between the seven brands in the market they consider is dominated by price and
promotional programs.
Parallel movements are price or promotion fluctuations of brand j, j = 1, ..., n,
that parallel the price fluctuations or promotional expenditures of other brands r =
1, ..., n, j ≠ r, and occur in the same time period t or with a delay of one time
period (one week). Such parallel movements may occur because of planned pro-
grams. For example, some retail chains may offer a price promotion for brand j
when other chains do so for brand r. Parallel movements consist of a positive relation
between competing brands' price variables (and between competing brands' pro-
motions) and a negative relation between one brand's price movements and another
brand's promotional expenditure changes.
Retailer-dominated reactions. If promotional activities in a product category are
frequent, a retailer may run a promotional activity for one brand in one week followed
by an activity for another brand in the next week. As a result, the price for brand
j decreases as the price for brand r, r ≠ j, increases, etc. In that case, we expect
a negative relation between price movements for competing brands (and between
promotions for competing brands) measured concurrently. Similarly a price decrease
for one brand in week t may be followed by a promotional activity for another brand
in week (t + 1), resulting in a positive relation between those variables measured with
a lag of one period in the reaction function. Retailer-dominated reactions are assumed
to occur in the short run, either simultaneously or with a maximum lag of one period
(one week).
In the longer run, within two to four weeks, retailers may react with price or
promotion changes for brand j to changes in similar variables for competing brands.
If these retailer activities are motivated by manufacturers' trade promotions, the na-
ture and frequency of such activities for competing brands may reflect competitive
reactions by manufacturers. In this case, partial relations with the same signs as those
for parallel movements are expected.
Manufacturer-dominated reactions. For the measurement of manufacturers' reac-
tions involving temporary price changes and other promotional variables, scanner
data (showing retail sales and retailers' promotional activities) can reveal these reac-
tions only if retailers cooperate with manufacturers. This cooperation often results
from adaptive annual planning procedures, which are assumed to take five to ten weeks.
We expect these partial relations to have the same signs as those for parallel move-
ments.
Leeflang and Wittink (1992) study the following marketing instruments: price (p),
sampling (sa), refunds (rf), bonus offers (bo) and featuring (ft) (retailer advertising).
Competitive reaction functions are estimated for each of these marketing instruments
for each brand. The criterion variables in the competitive reaction functions are ex-
pressed as changes. For price the logarithm of the ratio of prices in two successive
periods is used. This is based on the idea that price changes for brands with different
regular price levels are more comparable on a percentage than on an absolute basis.
MODELING COMPETITION 207
Other promotional activities are specified in terms of simple differences, because zero
values occur. To illustrate for the price of brand j (p jt) the following competitive
reaction function is specified:
$$
\ln(p_{jt}/p_{j,t-1}) = \alpha_j + \sum_{\substack{r=1 \\ r \neq j}}^{n} \sum_{t^*=1}^{T^*+1} \beta_{jrt^*} \ln(p_{r,t-t^*+1}/p_{r,t-t^*}) + \sum_{t^*=2}^{T^*+1} \beta_{jjt^*} \ln(p_{j,t-t^*+1}/p_{j,t-t^*}) + \sum_{r=1}^{n} \sum_{t^*=1}^{T^*+1} \sum_{x=1}^{4} \gamma_{xjrt^*} \left( x_{r,t-t^*+1} - x_{r,t-t^*} \right) + \varepsilon_{jt}, \tag{11.15}
$$

for $j = 1, \ldots, n$ and $t = T^*+2, \ldots, T$,
where
x = 1: sa, x = 2: rf, x = 3: bo, x = 4: ft,
T* = the maximum number of time lags (T* = 10),
T = the number of observations available,
n = the number of brands,
ε_jt = a disturbance term.
Equation (11.15) includes lagged endogenous variables. This is done to account for the phenomenon that periods with heavy promotions are frequently followed by periods with relatively low promotional efforts.
An inspection of (11.15) makes it clear that the number of predictor variables is so large that it easily exceeds the number of observations.9 Leeflang and Wittink (1992) used bivariate causality tests to select potentially relevant predictor variables.10
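To make the estimation step concrete, the following minimal sketch simulates weekly prices for one brand pair and fits a one-lag reaction coefficient by bivariate OLS. The data, the single-lag restriction and all parameter values are hypothetical; the actual specification in (11.15) contains many more predictors, selected by the causality tests:

```python
import math
import random

random.seed(1)

# Hypothetical weekly prices for brand j and one competitor r.
T = 80
p_j = [10 * math.exp(random.gauss(0, 0.05)) for _ in range(T)]
p_r = [12 * math.exp(random.gauss(0, 0.05)) for _ in range(T)]

# Criterion: ln(p_jt / p_j,t-1); predictor: the competitor's lag-1 price
# ratio ln(p_r,t-1 / p_r,t-2), a one-lag version of the reaction function.
y = [math.log(p_j[t] / p_j[t - 1]) for t in range(2, T)]
x = [math.log(p_r[t - 1] / p_r[t - 2]) for t in range(2, T)]

# Bivariate OLS for the reaction coefficient.
n = len(y)
mx, my = sum(x) / n, sum(y) / n
beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
       sum((xi - mx) ** 2 for xi in x)
alpha = my - beta * mx
print(round(alpha, 4), round(beta, 4))
```

With simulated independent prices the estimated reaction coefficient hovers near zero; with real data a significant coefficient would indicate a price reaction at lag one.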
For all variable combinations, we show in Table 11.1 the number of times pairs of
brands are "causally" related, based on bivariate tests. The maximum number for each
of the cells in this table is 42 (excluding causal relations between different variables
for the same brand). The largest number, 24, pertains to price-price relations. The
smallest number, 2, refers to sampling-refund relations.
There are 256 significant relations (221 + 35) in Table 11.1, out of which 35 pertain
to relations between variables for the same brand. These intrafirm activities are indi-
cated in parentheses. Of the remaining 221 competitive reactions, 65 (29 percent) are
simple (shown on the diagonal). This observed percentage is (significantly) different
from an expected percentage of 20 percent if simple and other competitive relations
are equally likely. Yet 156 (71 percent) of the estimated competitive reaction effects
involve a different instrument. Thus, the results in Table 11.1 show that there is ample
evidence of multiple competitive reactions.
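The significance of the observed 29 percent against the expected 20 percent can be checked with a normal-approximation test for a proportion; this is a sketch, since the text does not state which test the authors used:

```python
import math

# Simple (same-instrument) reactions out of all competitive reactions
# (Table 11.1), against the 20 percent expected if each of the five
# instruments were an equally likely reaction instrument.
simple, total, p0 = 65, 221, 0.20
p_hat = simple / total
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / total)
print(round(p_hat, 3), round(z, 2))  # z exceeds the 1.96 critical value
```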
9. For example, suppose thatn = 7 (brands) each with five instruments, T* = 10 (lagged periods), and T = 76.
10. For more detail on these tests, see Chapter 18.
The causality test results were used to specify the competitive reaction functions. Leeflang and Wittink obtained 171 significant reaction effects, some of which represent two different categories of reactions (e.g., both retailer- and manufacturer-dominated).
Table 11.2 summarizes the statistically significant effects with respect to the
categorizations, stated in terms of parallel movements, short-run retailer-dominated
reactions, long-run retailer-dominated reactions, and manufacturer-dominated reac-
tions. For each criterion variable we show the frequencies with which each of the
four types of reaction occur. The column totals show that the largest frequency of
reaction effects occurs in the manufacturer-dominated reaction category, followed by
the long-run retailer-dominated category. Similarly the row totals indicate that price
is the most frequently used reaction instrument, followed by feature advertising.
These results illustrate the potential of competitive reaction function estimation.
For a discussion of other models in which competitive reaction functions are cali-
brated, see Hanssens, Parsons and Schultz (1990, pp. 201-210) and more recently, Kadiyali, Vilcassim, Chintagunta (1999) and Vilcassim, Kadiyali, Chintagunta (1999).
In all cases, the reaction functions attempt to capture how marketing instruments are
used to react to changes in other instruments without regard to consumer response.
Table 11.2 Statistically significant competitive reaction effects by type of reaction

                          Type of Reaction
                        Short-run    Long-run
Criterion   Parallel    Retailer-    Retailer-    Manufacturer-
variable    movement    dominated    dominated    dominated       Total
Price          11           5            9            25            50
Sampling        3           1           13            20            37
Refund          4           0            7             6            17
Bonus           2           3            7            12            24
Feature        12           3           11            17            43
Total          32          12           47            80           171
Symmetric competitive behavior means that if the attraction of one brand changes by a
given amount, the market share of any other brand is affected proportionally, no mat-
ter which brand's attraction has changed. However, competitive effects observed in
the marketplace reveal that leading brands often take market share disproportionately
from other brands (Cooper and Nakanishi, 1988, p. 56, Carpenter, Cooper, Hanssens
and Midgley 1988, Foekens, Leeflang, Wittink, 1997, Cotterill, Putsis, Dhar, 1998
and the discussion in Section 9.4). Also price or quality leaders often have strong
brand loyalties that prevent demand effects from being homogeneous across brands
(see also Blattberg and Wisniewski, 1989). These observations indicate both differ-
ential effectiveness of brands and asymmetries in market structures. The implication
is that market response functions should satisfy the condition that the pairwise share
ratios are allowed to be context dependent, that is, contingent on the presence of
other choice alternatives. The market share shifts modeled in this way are not necess-
arily proportional and may represent privileged substitution effects between choice
alternatives.
Leeflang and Wittink (1996) describe customer-focused assessments by estimating asymmetric market share response functions. Their model is calibrated with the same data set discussed in Section 11.1 for the estimation of competitive reaction functions (equation (11.15)). The structure of the market response functions is similar to that used for the competitive reactions. The criterion variable is the natural logarithm of the ratio of market shares in successive periods for brand $j = 1, \ldots, n$: $\ln(m_{jt}/m_{j,t-1})$, which is a function of the natural logarithm of the ratio of prices in successive periods and the first differences of four promotional variables (refunds, bonus activities, sampling, featuring) of all brands $r = 1, \ldots, n$:
Table 11.3 Predictor variables with statistically significant effects in the market share equations

Market share^b    1    2    3    4    5    6    7
ml p,ft ft sa sa
m2 p, bo,ft ft ft p p*
m3 p ft
m4 ft ft* bo,ft sa
ms p bo* p,ft sa,ft
m6 p p,bo sa rf, sa
m7 ft p p* p
a The letters are used for predictor variables that have statistically significant effects
in the multiple regression; p = price, sa = sampling, rf = refund, bo = bonus, ft =
feature defined as in (11.15). If the sign of the coefficient for the predictor in the multiple
regression is counter to expectations, the letter has the symbol * next to it.
b Market share: $\tilde{m}_j = \ln(m_{jt}/m_{j,t-1})$.
c Own-brand effects are in the cells on the diagonal; cross-brand effects are in the off-
diagonal cells.
$$
\ln(m_{jt}/m_{j,t-1}) = \lambda_j + \sum_{r=1}^{n} \sum_{t^*=1}^{T^*+1} \gamma_{jrt^*} \ln(p_{r,t-t^*+1}/p_{r,t-t^*}) + \sum_{r=1}^{n} \sum_{t^*=1}^{T^*+1} \sum_{x=1}^{4} \delta_{xjrt^*} \left( x_{r,t-t^*+1} - x_{r,t-t^*} \right) + u_{jt} \tag{11.16}
$$

where $u_{jt}$ is a disturbance term and all other variables have been defined before.
We show in Table 11.3 the predictor variables with statistically significant effects
for each brand's market share equation. In this study 13 own-brand effects are ob-
tained, that is, in 13 cases when j = r, the marketing instrument of brand j has a
statistically significant effect with the expected sign on the brand's own market share.
Because each brand does not use all marketing instruments, the maximum possible
number of own-brand effects varies between the brands (the maxima are shown in the
last row). Across the brands the maximum number of own-brand effects is 28.
There are 18 significant cross-brand effects with the expected sign, indicated
as off-diagonal entries in Table 11.3. The maximum number of cross-brand effects
equals 168. Thus, the proportion of significant cross-brand effects (18/168 or 11
percent) is much lower than the proportion of significant own-brand effects (13/28 or
46 percent). From Table 11.3 we can draw some conclusions about competition based
on consumers' response function estimates. For example, brand 3's market share is
affected only by feature advertising for brand 5. On the other hand, brand 7 affects only brand 4's market share, through sampling.
In this section we describe a framework that can be used to enhance the congruence
between competitor-oriented and customer-focused decision making. The framework
relates consumer response and competitive reaction effects, and it provides a basis for
categorizing over- and underreactions by managers. 12 It is related to the framework
introduced in Sections 9.3 and 11.1.
We consider a market to be "in equilibrium" if the market shares $m_{jt}$ for brands $j = 1, \ldots, n$ are stable over time. That is, apart from perturbations, $m_{jt} = m_{j,t-1} \approx m_j$. Now suppose that brand $j$ changes variable $u_{hj}$, i.e. the value of its $h$-th marketing instrument, which affects its own market share:

$$
\frac{\partial m_j}{\partial u_{hj}} \neq 0. \tag{11.17}
$$
11. See Chen, Smith, Grimm (1992), Brodie, Bonfrer, Cutler (1996), Leeflang, Wittink (1996). See also the discussions in Kreps, Wilson (1982), Deshpande, Gatignon (1994), Armstrong, Collopy (1996) and Leeflang, Wittink (1996, 2000).
12. The framework is based on economic arguments. For a discussion of limitations of this frame and of other
frames, see e.g. Johnson, Russo (1994).
212 CHAPTER II
The total effect of the change in $u_{hj}$ on brand $j$'s own market share is:

$$
\frac{dm_j}{du_{hj}} = \frac{\partial m_j}{\partial u_{hj}} + \sum_{r \neq j} \sum_{\ell=1}^{L_r} \frac{\partial m_j}{\partial u_{\ell r}} \frac{\partial u_{\ell r}}{\partial u_{hj}} \tag{11.18}
$$

where $\partial m_j/\partial u_{hj}$ indicates the direct effect (compare equation (11.6)). The competitive reaction effects $\partial u_{\ell r}/\partial u_{hj}$ multiplied by the cross-brand market effects $\partial m_j/\partial u_{\ell r}$ are expected to result in the total effect being smaller than the direct effect shown in (11.17).
Competitive reaction effects should occur in the presence of nonzero cross-brand effects. That is, if a change in brand $j$'s instrument $h$ affects $i$'s market share,

$$
\frac{\partial m_i}{\partial u_{hj}} \neq 0 \tag{11.19}
$$

then brand $i$ must react to compensate for the market share loss. The total effect of brand $j$'s change in marketing instrument $h$ on $i$'s market share is:13, 14

$$
\frac{dm_i}{du_{hj}} = \frac{\partial m_i}{\partial u_{hj}} + \sum_{\ell=1}^{L_i} \frac{\partial m_i}{\partial u_{\ell i}} \frac{\partial u_{\ell i}}{\partial u_{hj}}. \tag{11.20}
$$
Thus, in the presence of a change in instrument $h$ for brand $j$ for which (11.20) holds, the market will return to equilibrium if:

$$
\frac{dm_i}{du_{hj}} = 0 \tag{11.21}
$$

or

$$
\frac{\partial m_i}{\partial u_{hj}} = -\sum_{\ell=1}^{L_i} \frac{\partial m_i}{\partial u_{\ell i}} \frac{\partial u_{\ell i}}{\partial u_{hj}} \tag{11.22}
$$

or, in elasticity form,

$$
\eta_{m_i, u_{hj}} = -\sum_{\ell=1}^{L_i} \frac{\partial m_i}{\partial u_{\ell i}} \frac{\partial u_{\ell i}}{\partial u_{hj}} \frac{u_{hj}}{m_i} \tag{11.23}
$$
13. See also Metwally (1992) who uses a similar line of reasoning.
14. Note that (11.20) defines the total effect of a change in Uhj on the market share of brand i, while (11.18)
defines the total effect of a change in Uhj on brand j's share.
or

$$
\eta_{m_i, u_{hj}} = -\sum_{\ell=1}^{L_i} \eta_{m_i, u_{\ell i}} \, \rho_{u_{\ell i}, u_{hj}} \tag{11.24}
$$

which, if brand $i$ reacts with a single instrument $\ell$, reduces to:

$$
\rho_{u_{\ell i}, u_{hj}} = -\frac{\eta_{m_i, u_{hj}}}{\eta_{m_i, u_{\ell i}}}. \tag{11.25}
$$

It follows from (11.25) that the reaction elasticity is linearly related to the cross-brand market share elasticity and is inversely related to the own-brand market share elasticity. Relation (11.25) is a relation between three kinds of elasticities. We use the partial derivatives underlying these elasticities in a framework to distinguish competitive situations. For simplification, the framework is restricted to the absence/presence of effects, i.e. the derivatives are either zero or not, which results in eight possible combinations, shown in Figure 11.1.
In cell A of Figure 11.1 all three effects are non-zero. This case is identified as "intense competition", under which brand $i$ uses marketing instrument $\ell$ to restore its market share, which is affected by brand $j$'s use of variable $h$.
In the presence of a cross-brand market share effect ($\eta_{m_i, u_{hj}} \neq 0$), brand $i$'s loss of market share will not be recovered if:
• the own-brand market share effect is zero ($\eta_{m_i, u_{\ell i}} = 0$) as in cell B;
• there is no competitive reaction effect ($\rho_{u_{\ell i}, u_{hj}} = 0$) as in cell C;
• there is neither a competitive reaction effect nor an own-brand market share effect as in cell D.
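A small numerical sketch of relation (11.25), with hypothetical elasticity values, shows how the required reaction strength follows from the two market share elasticities:

```python
# Hypothetical elasticities illustrating relation (11.25).
eta_cross = -0.30  # cross-brand share elasticity of j's instrument on m_i
eta_own = 0.60     # own-brand share elasticity of i's reaction instrument

# Reaction elasticity that exactly restores brand i's market share:
# proportional to the cross effect, inversely related to the own effect.
rho = -eta_cross / eta_own
print(rho)  # 0.5
```

A stronger cross-brand effect, or a weaker own instrument, would require a proportionally stronger reaction; a reaction elasticity above this value corresponds to overreaction in the framework.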
[Figure 11.1 Classification of competitive situations.a The eight cells (A-H) combine the presence or absence of the cross-brand market share effect, the competitive reaction effect, and the own-brand market share effect; cell labels include "intense competition" (cell A), "overreaction", "spoiled arms for defender",b "ineffective arms" and "no competition".]
a i = Defender, j = Attacker
b Defender may lack information on own market share effects
In their application, Leeflang and Wittink (1996) find that managers in a Dutch non-food product category tend to overreact. In a replication study, Brodie, Bonfrer and Cutler (1996) find that managers in one product category in New Zealand also tend to overreact.
In the previous sections we have essentially treated the marketing initiatives for indi-
vidual brands as exogenous variables. Only the reactions were treated in Section 11.1
as endogenous variables. Yet in the marketplace managers consider not only what
they believe to be the consumer response but also what they expect competitors will
do when a marketing initiative is considered. These complexities make the choice
of an action in a competitive situation seem intractable, because what is optimal for
one brand depends on what other brands do. And what other brands do depends on
what the first brand does, etc. 15 In the normative models discussed in Sections 8.3
and 8.4 we determine the optimal marketing mix for one brand assuming particular
reaction patterns of competition. This implies that one does not derive a simultaneous
optimum for all brands in the product class. Instead each brand manager treats the
competitors' strategies as given and computes her own best response. Simultane-
ous solutions call for game-theoretic approaches, and some researchers have applied
game theory in a marketing context. Most of the early game-theoretic models 16 are
theoretical and lack empirical applications. Since the early eighties powerful advances
in game theory have taken place, particularly in the area of dynamic games. As a
result, the theory is now more applicable to the modeling of real-world competitive
strategies. 17
In this section we discuss a few game-theoretic models after we introduce some
relevant concepts. 18 We restrict ourselves to game-theoretic models with a mathemat-
ical structure that corresponds to the structure of other models in this book. 19
A distinction is made between cooperative and non-cooperative game theory.
Cooperative game theory examines the behavior of colluding firms through the maximization of a weighted average of all firms' profits. If we have two firms with profits $\pi_1, \pi_2$ this means:

$$
\max_{x_{\ell j}} \pi = \lambda \pi_1 + (1 - \lambda)\pi_2 \tag{11.26}
$$

where
λ = the weight for firm 1,
$x_{\ell j}$ = marketing instrument ℓ of firm j, j = 1, 2, ℓ = 1, ..., L,
15. See Moorthy (1985).
16. Examples are Friedman (1958), Mills (1961), Shakun (1966), Baligh, Richartz (1967), Gupta, Krishnan
(1967a, 1967b), Krishnan, Gupta (1967).
17. Moorthy (1993).
18. More complete treatments can be found in Hanssens et al. (1990, Chapter 8), Erickson (1991), Moorthy (1985, 1993).
19. Useful books on game theory include Luce, Raiffa (1957), van Damme (1987) and Friedman (1991).
where
$\pi_j = f_j(x_{\ell j})$.
Non-cooperative game theory dates back to Cournot (1838) and Bertrand (1883). Cournot argued for quantity (q) as the choice
variable, whereas Bertrand argued for price (p) as the choice variable. In the Cournot
model, competitors conjecture the quantities supplied by other firms, and they assume
that other firms will do whatever necessary to sell the conjectured quantities. The
resulting equilibrium is called a "Cournot-equilibrium". In the Bertrand model price
is the decision variable. The resulting equilibrium is called a "Bertrand-equilibrium".
In single-firm decision-making under certainty, the choice of either price or quantity as a decision variable is moot. However, if one solves for an equilibrium, it does matter whether each firm's conjecture about the other firm's strategy variable is the correct one. For example, with two firms, and quantity and prices as decision variables, four kinds of equilibria can emerge: (price, price), (price, quantity), (quantity, price), (quantity, quantity). Specifically, the (price, quantity) equilibrium is the equilibrium that results if firm 1 chooses price while conjecturing that firm 2 sets quantity, and firm 2 sets quantity while conjecturing that firm 1 sets price.
The shift from theoretical, static game-theoretic models to empirical, dynamic
models also shifts the attention from normative models to descriptive game theory.
The latter involves the application of game-theoretic models to test whether market-
place data are consistent with one or another model specification. So, for example,
Roy, Hanssens and Raju (1994) examined whether a leader-follower system or a
mutually independent pricing rule (Nash) is most consistent with the pricing behavior
in the mid-size sedan segment of the U.S. automobile market. Their results suggest
that the leader-follower system is most consistent with the data.
As an example we discuss here a study by Gasmi, Laffont and Vuong (1992).
They study the behavior of Coca Cola and Pepsi Cola, with quarterly data for the
United States on quantity sold, price and advertising. They estimated various model
specifications to allow for the possible existence of cooperative as well as non-coop-
erative strategic behavior in this industry. Their work proceeds with the specification
of an objective function for each firm (a profit function) as well as demand and cost
functions. Given these specifications, it is possible to obtain a system of simultaneous
equations based on assumptions about the firm's behavior. Throughout it is assumed
that there is a one-to-one relation between firm j and brand j, and the terms are
used interchangeably. Gasmi et al. (1992) propose the following demand function for brand $j$ (with $r \neq j$; $j, r = 1, 2$):

$$
q_{jt} = \gamma_{j0} + \gamma_{j1} p_{jt} + \gamma_{j2} p_{rt} + \gamma_{j3}\sqrt{a_{jt}} + \gamma_{j4}\sqrt{a_{rt}} \tag{11.28}
$$

where
$q_{jt}$ = the quantity sold of brand j in t,
$p_{jt}, a_{jt}$ = the price and advertising expenditures of brand j in t,
$c_j$ = the constant variable cost per unit of brand j.
The profit function can be written as:

$$
\pi_j = (p_{jt} - c_j) q_{jt} - a_{jt}.
$$
If there is total collusion, as in the fourth game, a specific form of (11.26) is maximized:

$$
\max_{p_1, p_2, a_1, a_2} \pi = \lambda \pi_1(p_1, p_2, a_1, a_2) + (1 - \lambda)\pi_2(p_1, p_2, a_1, a_2). \tag{11.32}
$$

The first-order conditions form a system of four linear equations that uniquely defines the four endogenous variables $p_1$, $p_2$, $a_1$ and $a_2$. The Hessian matrix of second-order conditions has to be negative semi-definite, which imposes restrictions on the parameters.
Gasmi et al. (1992) include additional exogenous variables and specify functions for the demand intercepts ($\gamma_{j0}$) and for marginal costs ($c_j$), which makes the system identifiable. These functions together with the demand functions (11.28) can be
estimated as a system of simultaneous equations. Gasmi et al. (1992) in fact derive
a general model specification, and they use it to test the six games. The empirical
results suggest that, for the period covered by the sample (1968-1986), some tacit
collusive behavior in advertising between The Coca Cola Company and Pepsico, Inc.
prevailed in the market for the cola drinks. Collusion on prices did not seem to be as
well supported by the data. Thus the results favor the specification for game 5.
The study by Gasmi et al. (1992) deals with horizontal competition and collu-
sion. Their model can be extended to consider vertical competition/collusion between
competitors/partners in the marketing system as well. 22
The demand equations (11.28) have a structure that is also used in other game-
theoretic models, for example in studies by Kadiyali (1996) and Putsis, Dhar ( 1999).23
Other demand equations that have been used were developed by Vidale and Wolfe
(1957) and Lanchester. 24 More recently Chintagunta and Rao (1996) specify a logit
model at the individual household level:

$$
\pi_{jt} = \frac{\exp(\alpha_j + \beta_j \theta_{jt} + \gamma_j p_{jt})}{\sum_{r=1}^{2} \exp(\alpha_r + \beta_r \theta_{rt} + \gamma_r p_{rt})}, \quad j = 1, 2, \tag{11.34}
$$

where
$\pi_{jt}$ = the probability of purchase of brand j by an individual consumer at t,
$\alpha_j$ = the intrinsic preference for brand j,
$\theta_{jt}$ = the component of brand j's preference that evolves over time,
$p_{jt}$ = the price of brand j at t.
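The probabilities in (11.34) can be computed directly; the parameter values below are hypothetical, not Chintagunta and Rao's estimates:

```python
import math

# Hypothetical duopoly parameters for the logit model (11.34).
alpha = [0.5, 0.2]    # intrinsic preferences
beta = [1.0, 1.0]     # weights on the evolving preference component
theta = [0.3, 0.1]    # theta_jt in some period t
gamma = [-2.0, -2.0]  # price sensitivities
price = [1.20, 1.10]

util = [alpha[j] + beta[j] * theta[j] + gamma[j] * price[j] for j in range(2)]
denom = sum(math.exp(u) for u in util)
pi = [math.exp(u) / denom for u in util]
print([round(p, 3) for p in pi])  # the two probabilities sum to one
```

With these values the higher-preference brand retains the larger purchase probability despite its higher price, which is the mechanism behind the steady-state pricing result reported below.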
22. Other game-theoretic models that deal with vertical competition/collusion can be found in Moorthy (1988),
Chu, Desai (1995), Lal, Narasimhan (1996), Krishnan, Soni (1997), Gupta, Loulou (1998), Vilcassim, Kadiyali,
Chintagunta ( 1999), Kim, Staelin ( 1999).
23. See also Putsis, Dhar (1998).
24. See Kimball (1957). For examples see Chintagunta, Vilcassim ( 1994), Erickson ( 1997).
This model accounts for preference accumulation to infer implications for dynamic
pricing policies by firms in duopoly markets. Chintagunta and Rao make assumptions
about the development of $\theta_{jt}$ over time. Preferences across consumers are aggregated
to the segment level. Their results indicate that, given equal marginal costs for brands,
in steady-state equilibrium the brand with the higher preference level charges a higher
price.
CHAPTER 12

STOCHASTIC CONSUMER BEHAVIOR MODELS

'A stochastic model is a model in which the probability components are built in at the outset rather than being added ex post facto to accommodate discrepancies between predicted and actual results'
(Massy, Montgomery and Morrison, 1970, p. 4).
Thus, stochastic models take the probabilistic nature of consumers' behavior into
account. By contrast, econometric models include an error component that may cap-
ture uncertainty due to model misspecification (omitted variables, functional form) or
measurement error.
Much of the initial research on stochastic models focused on the specification of
the distributional form. More recently attention has shifted toward including market-
ing decision variables in the models, and integrating the components of consumer
behavior into a single framework. Stochastic models of consumer behavior are often classified according to the type of behavior they attempt to describe. The major categories are:
• purchase incidence models;
• purchase timing models;
• brand choice models.
Purchase incidence, purchase timing and brand choice are described by stochastic processes, i.e., families of random variables $\{X_t\}$ indexed by $t$ varying in an index set $T$ ($t \in T$) (Parzen, 1962, p. 22). A random or stochastic variable is a variable which can assume different values (or alternative value intervals) with some probability other than one. The values the random variable can take are defined in the state space, which is the collection of possible outcomes of the stochastic process under study. For instance, if the process is the purchasing of a specific brand, the state space for one trial is Y(es), N(o), and for two trials it is (YY, YN, NY, NN). An event is defined as a single outcome of the stochastic process. For example, Y and YY, respectively, are events of the process.
12.1.1 INTRODUCTION
Following Massy et al. (1970), purchase incidence models are models that specify how many purchases will occur in a specified interval of time. In purchase incidence models the probabilities that a purchase of a brand or product will occur in the time interval $(t, t+h)$ are considered: $P(Pu \in (t, t+h))$, where $Pu$ denotes the purchase. Purchase incidence models are used to calculate other important quantities characterizing consumer behavior (Massy et al., 1970, Ehrenberg, 1988):
• the probability that the waiting time between one purchase and the next will be
less or equal to some specified value;
• the probability that the number of purchases in a certain time interval (longer than
h) will be equal to a given value;
• the penetration: the proportion of individuals who have bought the product/category
at least once in a certain time interval;
• lost buyers, the proportion of individuals who purchased in period t, but not in
period t + 1;
• new buyers, the proportion of individuals who did not purchase in period t, but
did in period t + 1.
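These quantities can be computed directly from panel data. A toy sketch with a hypothetical 0/1 purchase record for five households over two periods:

```python
# Hypothetical panel: household id -> (bought in t, bought in t+1).
bought = {
    1: (1, 1), 2: (1, 0), 3: (0, 1), 4: (0, 0), 5: (1, 1),
}
n = len(bought)

penetration_t = sum(b[0] for b in bought.values()) / n  # bought in t
lost = sum(1 for b in bought.values() if b[0] == 1 and b[1] == 0) / n
new = sum(1 for b in bought.values() if b[0] == 0 and b[1] == 1) / n
print(penetration_t, lost, new)  # 0.6 0.2 0.2
```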
Models that deal with the first-mentioned probabilities are purchase timing models,
which are intrinsically connected to purchase incidence of a specific form. A distribu-
tion for the interpurchase times defines the distribution of products bought in a given
time period, and vice versa. Purchase incidence models describe the total number of
units bought of a particular brand (brand sales) or product category (industry sales).
Often they contain less behavioral detail than, for example, brand choice models.
Purchase incidence models were developed by Ehrenberg (1959, 1972, 1988) and
Chatfield, Ehrenberg and Goodhardt (1966). These models are based on the Poisson
process, which has the property that the distribution of the number of units purchased
in any interval depends only on the length of that interval. The random variable $Y_{it}$, denoting the number of units purchased by consumer i in a certain time period t, then follows a Poisson distribution with parameter λ:

$$
P(Y_{it} = y_{it}) = f(y_{it} \mid \lambda, t) = \frac{e^{-\lambda t}(\lambda t)^{y_{it}}}{y_{it}!}, \tag{12.1}
$$

$i = 1, \ldots, I$, $t = 1, \ldots, T$.
The Poisson process has expectation $E[Y_{it}] = \lambda t$, which shows that λ can be interpreted as the rate of the process. Its variance is equal to the mean. The probability of
at least one purchase in the interval t, the penetration, which is of primary interest in purchase incidence models (Ehrenberg, 1988), is:

$$
P(Y_{it} > 0) = 1 - e^{-\lambda t}. \tag{12.2}
$$

An estimator for λ in the Poisson process1 is simply the mean of the observed purchase frequencies divided by the length of the interval:

$$
\hat{\lambda} = \bar{y}/t. \tag{12.3}
$$
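A brief simulation sketch, with a hypothetical rate and sample size, illustrates this estimator and the penetration formula (12.2):

```python
import math
import random

random.seed(7)

def rpois(mu):
    # Knuth's method for a Poisson draw with mean mu.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Simulated purchase counts for I consumers over an interval of length t,
# with a common rate lambda (the homogeneity assumption relaxed below).
lam, t, I = 0.8, 2.0, 5000
y = [rpois(lam * t) for _ in range(I)]

lam_hat = (sum(y) / I) / t               # estimator: mean count over t
pen_obs = sum(1 for v in y if v > 0) / I  # observed penetration
pen_model = 1 - math.exp(-lam_hat * t)    # penetration via (12.2)
print(round(lam_hat, 3), round(pen_obs, 3), round(pen_model, 3))
```

The observed and model-based penetrations agree closely here because the simulated consumers really are homogeneous Poisson buyers; with heterogeneous rates the plain Poisson model would misstate the penetration.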
The assumptions underlying the Poisson process are quite restrictive in many marketing applications. For example, the assumption that all consumers have the same value of λ is unrealistic. Heterogeneity has been accommodated in several ways, most frequently by assuming that λ is a random variable that follows a gamma distribution across individuals:

$$
g(\lambda) = \frac{\alpha^{\beta}\lambda^{\beta-1}e^{-\alpha\lambda}}{\Gamma(\beta)}, \quad \lambda > 0, \tag{12.4}
$$

with α and β being parameters of the gamma distribution and Γ(·) the gamma function. The gamma distribution is a very flexible distribution that can take on a variety of shapes. We note that if β takes on integer values (β = 1, 2, 3, ...), then an Erlang distribution arises. From (12.1) and (12.4) the number of purchases for a (randomly selected) individual can be shown to follow a Negative Binomial Distribution (NBD) (Ehrenberg, 1959, Morrison and Schmittlein, 1988, East, Hammond, 1996):

$$
P(Y_{it} = y_{it}) = \frac{\Gamma(\beta + y_{it})}{\Gamma(\beta)\, y_{it}!}\left(\frac{\alpha}{\alpha + t}\right)^{\beta}\left(\frac{t}{\alpha + t}\right)^{y_{it}}. \tag{12.5}
$$
The NBD has expectation $E[Y_{it}] = \beta t/\alpha$, and its variance is $Var[Y_{it}] = \beta t/\alpha + \beta t^2/\alpha^2$. Thus, the variance of the NBD exceeds that of the Poisson distribution, which equals the mean $\beta t/\alpha$. The probability of at least one purchase in the interval t, the penetration, is:

$$
P(Y_{it} > 0) = 1 - \left(\frac{\alpha}{\alpha + t}\right)^{\beta}. \tag{12.6}
$$
Estimators for α and β can be derived, for example, from the estimated mean $\bar{y}$ and variance $\hat{\sigma}^2$ of the NBD: $\hat{\alpha} = \bar{y}t/(\hat{\sigma}^2 - \bar{y})$ and $\hat{\beta} = \hat{\alpha}\bar{y}/t$. Morrison and Schmittlein (1988) derive conditions under which the NBD at the brand level leads to a NBD
I. An interesting procedure for estimating purchase frequency distributions based on incomplete data was
provided by Golany, Kress and Philips ( 1986).
at the product class level. In empirical applications, the NBD seems to fit either well at both levels or at neither level.
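The gamma-Poisson mixture and the method-of-moments estimators can be sketched as follows; all parameter values are illustrative only:

```python
import math
import random

random.seed(11)

def rpois(mu):
    # Knuth's method for a Poisson draw with mean mu.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Gamma heterogeneity in lambda (shape beta, rate alpha), then Poisson
# counts over an interval of length t: the mixture is the NBD.
alpha_true, beta_true, t, I = 2.0, 1.5, 1.0, 20000
y = [rpois(t * random.gammavariate(beta_true, 1.0 / alpha_true))
     for _ in range(I)]

# Method-of-moments estimators from the sample mean and variance.
ybar = sum(y) / I
s2 = sum((v - ybar) ** 2 for v in y) / I
alpha_hat = ybar * t / (s2 - ybar)
beta_hat = alpha_hat * ybar / t
print(round(alpha_hat, 2), round(beta_hat, 2))
```

The estimators rely on the overdispersion $\hat{\sigma}^2 > \bar{y}$ that the gamma mixing induces; if a sample showed no overdispersion, the plain Poisson model would suffice.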
Sichel (1982) has proposed the family of generalized inverse Gaussian distributions to describe heterogeneity in the purchase rate. A different category of models arises when the heterogeneity distribution is assumed to be discrete: the so-called finite mixture models. The finite mixture of Poissons has been applied to describe purchase incidence by a.o. Wedel, DeSarbo, Bult and Ramaswamy (1993). We describe that class of models in more detail in Section 17.2.
Apart from the pitfall that heterogeneity is not accounted for, another criticism of the Poisson process is that the exponential distribution of interpurchase times implies highly irregular purchase behavior. Chatfield and Goodhardt (1973) proposed to use the Erlang distribution, which leads to the Condensed NBD for purchase incidence. It describes a more regular pattern of purchase behavior. Such regular patterns of purchases have been confirmed empirically (Morrison, Schmittlein, 1988). Wheat and Morrison (1990) show how regularity can be estimated if we observe two interpurchase times.2
Yet another critique of the Poisson and NBD purchase incidence models is that they
do not accommodate individuals who never buy. Both models predict that every
individual will eventually buy the product, as t increases. Since for most products
and categories there is a group of individuals who never buy, 3 these purchase inci-
dence models tend to underestimate the percentage of zero purchases. One solution
(Morrison and Schmittlein, 1988) is to add a component to the model that allows for an additional spike at zero, due to the class of non-buyers, with proportion $\pi_0$. The Zero-Inflated Poisson (ZIP) model is:

$$
P(Y_{it} = 0) = \pi_0 + (1 - \pi_0)e^{-\lambda t}, \qquad P(Y_{it} = y_{it}) = (1 - \pi_0)\frac{e^{-\lambda t}(\lambda t)^{y_{it}}}{y_{it}!}, \quad y_{it} > 0. \tag{12.7}
$$
The ZIP model has mean $E[Y_{it}] = (1 - \pi_0)\lambda t$, and variance $Var[Y_{it}] = (1 - \pi_0)\lambda t (1 + \pi_0 \lambda t)$. The penetration is:

$$
P(Y_{it} > 0) = (1 - \pi_0)\left(1 - e^{-\lambda t}\right). \tag{12.8}
$$

Estimates of $\pi_0$ and λ can be obtained from the equations for the mean, $\bar{y} = (1 - \hat{\pi}_0)\hat{\lambda}t$, and the proportion of zeros, $p_0 = \hat{\pi}_0 + (1 - \hat{\pi}_0)e^{-\hat{\lambda}t}$. These equations need to be solved by iteration.
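The iterative solution can be sketched as a simple fixed-point scheme; the sample moments below are hypothetical:

```python
import math

# Moment equations of the ZIP model:
#   ybar = (1 - pi0) * lam * t
#   p0   = pi0 + (1 - pi0) * exp(-lam * t)
ybar, p0, t = 0.9, 0.55, 1.0  # hypothetical sample mean and share of zeros

pi0 = 0.1  # starting value
for _ in range(200):
    lam = ybar / ((1 - pi0) * t)                    # from the mean equation
    pi0 = (p0 - math.exp(-lam * t)) / (1 - math.exp(-lam * t))  # from p0
print(round(pi0, 3), round(lam, 3))
```

Alternating between the two equations converges quickly here; other root-finding schemes (e.g. Newton's method) would serve equally well.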
A straightforward extension accounts for both added zeros and heterogeneity, giv-
ing rise to the Zero-Inflated Negative Binomial Distribution (ZINBD). See Schmitt-
lein, Cooper and Morrison (1993) for an application.
2. Other distributions that have been considered for interpurchase times are the inverse Gaussian, the log-normal and the Weibull distributions (Sikkel and Hoogendoorn, 1995).
3. The Poisson distribution is not defined for those subjects.
An extensive comparison of the Poisson, NBD, ZIP and ZINBD models, amongst others, to estimate the penetration of products was completed by Sikkel and Hoogendoorn (1995). They used panel data for 275 different food products, for 750 households. They found that ZIP and ZINBD have (small) biases in predicting penetration, although ZIP performed very well for small time intervals. They also encountered some problems with the above-mentioned estimators of the parameters, which did not always converge or were outside the possible domains.
Much research in purchase incidence modeling has focused on finding the appropriate
distribution for purchase frequencies, and estimating important quantities from the
data, including penetration, lost buyer percentages and so on. More recently, success-
ful attempts have been made to include marketing decision variables as predictors
into these models. The primary extension is that one parameterizes the mean of the
distribution as a function of predictors. In such response types of purchase incidence
models, 4 the effects of decision variables are measured by changes in the shapes
and/or the parameters of the probability distributions. 5 Early attempts are Magee
(1953) and Massy (1968). Magee's work, for example, is based on a "Poisson-type
purchase model". The effects of promotional effort on sales are traced in this model,
comparing the sales probability distributions of consumers exposed to promotional
activity with the sales probability distributions of consumers who did not receive
promotional attention.
We consider the simplest case, a Poisson distribution of purchase incidence of a particular product. Assume there are marketing decision variables $x_\ell$, $\ell = 1, \ldots, L$, including for example the product's price, frequency of promotions, etc., but also consumer characteristics $z_{i\ell'}$, $\ell' = 1, \ldots, L'$. The idea is that the expected number of purchases by individual i in the period under consideration ($\mu_i$) is related to those explanatory variables:

$$
\mu_i = \exp\left(\beta_0 + \sum_{\ell=1}^{L} \beta_\ell x_\ell + \sum_{\ell'=1}^{L'} \gamma_{\ell'} z_{i\ell'}\right). \tag{12.9}
$$

Model (12.9) accounts for variation in the purchase rate across the sample as a function of the explanatory variables. The mean of the purchase incidences is $\mu_i$. The exponent in (12.9) guarantees that the predicted purchase rate is positive. With this
formulation we can assess the effects of marketing variables on purchase incidence.
The inclusion of consumer characteristics, such as demographic and socio-economic
variables, is an alternative way to account for heterogeneity in the purchase rate across
the sample.
Estimation of (12.9) in conjunction with (12.1) cannot be accomplished by solving closed-form expressions; a likelihood function needs to be maximized numerically.
4. For the terminology adopted here, see Leeflang (1974).
5. The developments here are based on the theory of generalized linear models; see McCullagh and Nelder (1989).
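A minimal sketch of such numerical maximization, assuming an exponential mean function with a single hypothetical predictor; gradient ascent on the Poisson log-likelihood is used here for transparency, not because the literature prescribes it:

```python
import math
import random

random.seed(3)

def rpois(mu):
    # Knuth's method for a Poisson draw with mean mu.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Hypothetical data: one predictor x_i, counts y_i ~ Poisson(exp(b0 + b1*x_i)).
b0_true, b1_true, I = 0.2, 0.8, 2000
x = [random.uniform(-1, 1) for _ in range(I)]
y = [rpois(math.exp(b0_true + b1_true * xi)) for xi in x]

# Gradient ascent on the Poisson log-likelihood; the score equations are
# dl/db0 = sum(y - mu) and dl/db1 = sum((y - mu) * x), mu = exp(b0 + b1*x).
b0 = b1 = 0.0
for _ in range(600):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    b0 += 0.3 * sum(yi - mi for yi, mi in zip(y, mu)) / I
    b1 += 0.3 * sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x)) / I
print(round(b0, 2), round(b1, 2))
```

The recovered coefficients fall close to the values used to generate the data; standard software would instead use iteratively reweighted least squares for this generalized linear model.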
226 CHAPTER 12
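To make the estimation step concrete, here is a minimal Python sketch of numerical maximum likelihood for a loglinear Poisson purchase rate as in (12.9). The data are simulated, and the predictor interpretations and all parameter values are our own illustrative assumptions, not estimates from any study.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated data; the two predictors (a promotion variable and a
# household-size variable) are hypothetical.
n = 500
X = np.column_stack([np.ones(n),                 # intercept
                     rng.uniform(0, 1, n),       # e.g. promotion frequency
                     rng.integers(1, 6, n)])     # e.g. household size
true_beta = np.array([0.2, 0.8, 0.1])
y = rng.poisson(np.exp(X @ true_beta))           # purchase counts per individual

def neg_loglik(beta):
    # Poisson log-likelihood with log link, mu_i = exp(x_i' beta);
    # the log(y!) term does not depend on beta and is dropped.
    eta = X @ beta
    return -(y * eta - np.exp(eta)).sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x)   # should be close to true_beta
```

The log link keeps every fitted purchase rate positive, which is exactly the role of the exponent in (12.9).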
Purchase timing models are intrinsically related to purchase incidence models in that
the choice of a distribution for interpurchase times at the same time defines the distri-
bution of purchase incidence. Some authors have preferred to work with purchase
times rather than purchase incidence. A variety of statistical distributions can be
assumed for the interpurchase times, such as the exponential, Erlang-2, gamma, Gompertz,
inverse Gaussian, log-normal and Weibull distributions (Sikkel and Hoogendoorn,
1995, Frenk, Zhang, 1997). The exponential and Erlang-2 distributions have
been used most frequently (Chatfield, Goodhardt, 1973, Schmittlein, Morrison, Colombo,
1987, Morrison, Schmittlein, 1988). The models based on these distributions
are parsimonious and usually fit the data well.
More recently, market researchers have used hazard models to describe interpurchase
times (Gupta 1991, Jain and Vilcassim, 1991, Gönül and Srinivasan, 1993b, Helsen
and Schmittlein, 1993, Wedel, Kamakura, DeSarbo, Ter Hofstede, 1995). The major
advantage of this approach is that it accounts for so-called right-censoring. Right
censoring occurs if a sample of consumers or households is observed for a time period
of fixed length only, causing longer interpurchase times to have a larger probability
of falling (partially) outside the observation period. If one does not account for right-
censoring, we obtain biased estimates. The standard distributions that have been used
for interpurchase times, such as the exponential and Erlang distributions, can also be
formulated in a hazard framework.
In hazard models the probability of a purchase during a certain time interval, say
t to t + Δt, given that it has not occurred before t, is formulated as:

P[t ≤ T ≤ t + Δt | T ≥ t]    (12.10)
where T is the random interpurchase time variable.7 Parametric methods for inter-
purchase times involve assumptions about their distribution. Two distinct classes of
methods can be distinguished: discrete-time and continuous-time formulations.
6. See Section 16.6.
7. An extension of the formulation to multiple purchases is straightforward.
STOCHASTIC CONSUMER BEHAVIOR MODELS 227
where y is the number of events that have occurred during the interval [t, t + Δt].
An example of the expression for such a discrete time probability is provided by
the Poisson distribution in (12.1). This illustrates the relationship between the hazard
model and a purchase incidence model. The discrete-time formulation has the advan-
tage that it leads to simple model formulations, which enables one to accommodate
censoring and multiple events in a straightforward manner. Binomial models have
also been used to model discrete durations in (12.11), but the Poisson distribution
has the advantage of allowing for more than one event to occur in the discrete time
interval.
In the continuous-time approach, Δt approaches zero in (12.10), to yield a con-
tinuous hazard rate λ(t):

λ(t) = lim_{Δt↓0} P[t ≤ T ≤ t + Δt | T ≥ t] / Δt.    (12.12)
The hazard rate can be interpreted as the instantaneous rate of purchasing at time t.
The distribution function of interpurchase times can then be derived:
f(t) = lim_{Δt↓0} P[t < T ≤ t + Δt] / Δt = λ(t) exp[−∫_0^t λ(s) ds].    (12.13)
Alternatively, (12.13) can be written as f(t) = λ(t)S(t). Here, S(t) = P(T > t) is
the probability that a purchase has not yet occurred at t, where S(t) is the so-called
survivor function. The advantage of this formulation becomes apparent in case of
censoring, because f(t) represents the density of any uncensored observation or
completely observed interpurchase time. If, due to censoring, an interpurchase time is
not completely observed, S(t) provides the probability that the purchase has not yet
occurred, since we only know that the interpurchase time is larger than t, the end of
the observation period. The continuous-time approach appears to be the most
commonly used approach in the marketing literature.8 If the interpurchase times follow
an exponential distribution (and the purchase incidence is Poisson), then the hazard
rate and survivor functions are:

λ(t) = λ    (12.14)

and

S(t) = exp(−λt).    (12.15)
Thus, for the exponential distribution the hazard of a purchase is constant and in-
dependent of time. Also, purchases occur at random time periods, independent of
past purchases. For other distributions the hazard rate and survivor functions can be
formulated as well.
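A short simulation makes the exponential case and the censoring correction concrete. The purchase rate matches the scale of the example quoted later in the text, but the sample size and observation window are assumed values; for the exponential, the censored maximum likelihood estimator has the simple closed form used below.

```python
import numpy as np

rng = np.random.default_rng(1)

lam_true = 0.38                        # purchases per week (illustrative scale)
t_full = rng.exponential(1.0 / lam_true, 2000)

# Right-censoring: spells are only observed up to a window of length C
# (an assumed value); longer interpurchase times are cut off at C.
C = 4.0
censored = t_full > C
t_obs = np.minimum(t_full, C)

# Censored exponential likelihood: uncensored spells contribute
# log f(t) = log(lam) - lam*t, censored spells contribute log S(C) = -lam*C.
# Maximizing gives lam_hat = (#uncensored) / sum(all observed times).
lam_hat = (~censored).sum() / t_obs.sum()

# Treating every cut-off spell as a completed one ignores censoring and
# biases the estimated rate upward, as the text warns.
lam_naive = len(t_obs) / t_obs.sum()

print(lam_hat, lam_naive)
```

The gap between the two estimates is exactly the bias from ignoring right-censoring.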
Flinn and Heckman (1983) proposed a very flexible (Box-Cox) formulation9 for
hazard functions. This is applied to brand switching problems by Vilcassim and Jain
(1991). It includes many of the frequently used distribution functions as special cases,
and the hazard rate is formulated as:

λ(t) = λr for tr < t ≤ tr+1, r = 0, 1, ..., R.    (12.17)
Note that the hazard rate is constant within each period, but varies from period to
period, thereby accounting for nonstationarity. This approach is quite flexible, because
across time periods the hazard may take an arbitrary form. Since the hazard between
tr and tr+1 is constant for all r, the model assumes random purchase timing within
each period.
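A piecewise-constant hazard of this kind is straightforward to evaluate numerically: the cumulative hazard is a sum of rate-times-exposure terms, and the survivor function follows from (12.13). The period boundaries and rates below are illustrative values, not estimates.

```python
import numpy as np

# Piecewise-constant hazard: lambda(t) = lam_r on (t_r, t_{r+1}].
knots = np.array([0.0, 1.0, 2.0, 4.0])   # period boundaries t_0 < t_1 < ...
lam = np.array([0.2, 0.5, 0.3])          # hazard rate within each period

def cumulative_hazard(t):
    """Integral of lambda(s) ds over [0, t], accumulated piece by piece."""
    starts = knots[:-1]
    ends = np.append(knots[1:-1], np.inf)  # let the last rate extend onward
    dur = np.clip(np.minimum(t, ends) - starts, 0.0, None)
    return float((lam * dur).sum())

def survivor(t):
    # S(t) = exp(-cumulative hazard), cf. the relation behind (12.13)
    return np.exp(-cumulative_hazard(t))

print(survivor(3.0))   # exp(-(0.2*1 + 0.5*1 + 0.3*1)) = exp(-1)
```

Within each period the hazard is flat, so purchase timing is random (exponential) there, while the level is free to shift from period to period.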
Several hazard models have been compared empirically. Bayus and Mehta (1995)
estimated gamma and Weibull distributions from replacement data on consumer dur-
ables, including color TVs, refrigerators, washing machines, vacuum cleaners and
coffee makers. Many applications have dealt with household scanner panel data.
Gupta (1991) applied exponential and Erlang-2 distributions to scanner data on ground
coffee. Jain and Vilcassim (1991) estimated the exponential, Weibull, Gompertz, and
9. Box, Cox (1964). This specification is discussed in Sections 16.1.4 and 18.4.3.
10. For other special cases see Jain, Vilcassim (1991).
Erlang-2 distributions with scanner data on coffee, Vilcassim and Jain (1991) used the
Weibull and Gompertz distributions on saltine crackers, Wedel et al. (1995) applied
the piecewise exponential model to scanner data on ketchup, and Helsen and Schmitt-
lein (1993) fitted a Weibull and a quadratic baseline hazard to saltine crackers scanner
data. We note that Vilcassim and Jain (1991) and Wedel et al. (1995) explicitly deal
with the timing of brand switching and repeat buying.
It is difficult to state general conclusions, except that the simple exponential
model does not seem to describe interpurchase timing processes well. Each of the
models used in the studies referred to above, however, is more complex than the ones
described here in that they accommodate heterogeneity of the hazard rate across
subjects, and/or include marketing decision variables. We discuss these two topics
in the next two sections.
12.2.2 HETEROGENEITY
The hazard modeling approach discussed here assumes that consumers are homo-
geneous. Gupta (1991) handled heterogeneity in a way that is comparable to the
approach described before in purchase incidence models. He assumed a parametric
distribution of interpurchase times, specifically exponential or Erlang-2, and let
heterogeneity be captured by a gamma distribution for the scale parameter β. The
advantages of this approach are that it provides a natural extension of the NBD models
discussed in Section 12.1.3, and that β can be directly interpreted as a measure of
heterogeneity. The approach provides simple closed-form expressions that lead to
analytically tractable estimation equations. Gupta (1991) applied the exponential-gamma
and Erlang-gamma models to scanner panel data on coffee. The exponential distribu-
tion yielded λ̂ = 0.38 with a highly significant t-value, indicating a rate of purchase of
slightly more than one pack in three weeks. The estimates for the exponential-gamma
model were: α̂ = 13.26, β̂ = 5.00 (both strongly significant), with an expectation of
β̂/α̂ = 0.38. This result is indicative of heterogeneity in the sample with respect to
the (exponential) hazard rate, but a modest amount as evidenced by the high value of
α̂.
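The exponential-gamma idea can be illustrated by simulation. We plug in the reported estimates above, but the gamma parameterization (shape β, scale 1/α, so that E[λ] = β/α) is our assumption for this sketch; the point is that rate heterogeneity produces a heavier tail of interpurchase times than a single exponential with the same mean rate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each household has its own exponential purchase rate lambda_i, drawn
# from a gamma distribution (assumed shape beta, scale 1/alpha).
alpha, beta = 13.26, 5.00
lam_i = rng.gamma(shape=beta, scale=1.0 / alpha, size=100_000)
t = rng.exponential(1.0 / lam_i)        # one interpurchase time per household

print(lam_i.mean())                      # ~ beta/alpha = 0.377
# Heterogeneity signature: more very long interpurchase times than a
# homogeneous exponential with the same mean rate would produce.
print((t > 10).mean(), np.exp(-lam_i.mean() * 10))
```

The comparison in the last line is the practical diagnostic: the mixture's tail probability clearly exceeds the homogeneous benchmark.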
Various other approaches to accommodating heterogeneity were provided by Jain
and Vilcassim (1991), Bayus and Mehta (1995) and Wedel et al. (1995). These authors
used discrete mixing heterogeneity distributions, whereby a number of unobserved
classes or segments are estimated along with the parameters of the hazard model.
A discussion of those approaches is beyond the scope of this chapter. We refer to
Section 17.2 for a general treatment of the mixture model approach. Gönül and
Srinivasan (1993b) provide a comparison of continuous and discrete approaches to
heterogeneity in hazard models for interpurchase times. Some heterogeneity can also
be accommodated through the inclusion of consumer characteristics.
The inclusion of marketing decision variables in hazard models has received some
attention. The basic idea for models that allow marketing decisions to influence
purchase timing is to reformulate the hazard function. An important case is presented
by so-called proportional hazard models (Cox, 1975), for which:

λ(t | x) = λ0(t) exp(x′β)    (12.18)

where λ0(t) is a baseline hazard and x is a vector of explanatory variables.
switchers for whom price is slightly more effective in inducing repeat buying (than
switching).
Sinha and Chandrashekaran (1992) model the diffusion of an innovation, or rather
the timing of the adoption of an innovation, using a hazard model. They explicitly
account for the fact that a proportion of the subjects will never adopt the product.
If the proportion of adopters is denoted by π0, then the probability of observing a
certain adoption time, given that the adoption occurs in the time period under study
(y = 1), is:
f(t | y = 1) = π0 λ(t) S(t)    (12.19)
which is the probability of an adoption times the rate of purchase at time t multiplied
by the probability of no purchase at time t. The probability of not adopting during the
sample period (y = 0) is:
f(t | y = 0) = (1 − π0) + π0 S(t)    (12.20)
which is the probability of never adopting plus the probability of eventual adoption
after the sample period (i.e. censored in the particular sample). If the distribution of
the adoption time is assumed to be exponential, so that the hazard rate and survivor
function are provided by (12.14) and (12.15) respectively, this model is equivalent to
the ZIP model for purchase incidence described in Section 12.1.4. Next to the exponential,
Sinha and Chandrashekaran consider the Weibull and log-normal distributions for the
adoption time.
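A maximum likelihood sketch for this split-hazard model with an exponential adoption time follows; the adopter share, the hazard rate and the observation window are all assumed values, and the likelihood combines the two contributions in the spirit of (12.19) and (12.20).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Assumed values: pi0 = share of eventual adopters, lam = adoption rate,
# C = end of the sample period.
pi0_true, lam_true, C = 0.6, 0.5, 3.0
n = 3000

is_adopter = rng.random(n) < pi0_true
t = rng.exponential(1.0 / lam_true, n)
observed = is_adopter & (t <= C)          # adoption seen within the window
t_obs = np.where(observed, t, C)

def neg_loglik(params):
    pi0, lam = params
    # Observed adopters: log(pi0 * lam * exp(-lam t))
    # Everyone else:     log((1 - pi0) + pi0 * exp(-lam C))
    ll_adopt = np.log(pi0) + np.log(lam) - lam * t_obs[observed]
    ll_rest = np.log((1.0 - pi0) + pi0 * np.exp(-lam * C))
    return -(ll_adopt.sum() + (~observed).sum() * ll_rest)

res = minimize(neg_loglik, x0=[0.5, 1.0],
               bounds=[(1e-4, 1 - 1e-4), (1e-4, None)])
print(res.x)   # close to (pi0_true, lam_true)
```

Note how the "never adopt" mass keeps non-adoption at the end of the window from being misread as an extremely slow adoption rate.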
We use these distinctions for the classification of brand choice models below.
Markov processes can be classified according to the nature of the index set T (denot-
ing time). When T = {0, 1, 2, ...} the stochastic process is said to be a discrete time
process. When T = {t : −∞ < t < ∞} or T = {t : t ≥ 0} the stochastic process is
said to be a continuous time process. In addition to processes that evolve in discrete
time, in modeling brand choice we consider processes with a discrete state space.
Such discrete states will be denoted by i, j, k, ....
Discrete time Markov brand choice models are based on Markov chains, in which
one considers probabilities such as:

P(Xt = j | Xt−1 = i, Xt−2 = k, ...)    (12.21)

i.e. the probability that brand j is purchased at time t, given that brand i was pur-
chased at t−1, brand k at t−2, .... These probabilities are called transition proba-
bilities. A simple specification of the transition probability in (12.21) is obtained by
assuming that the conditional probability at t depends only on the purchase at t−1.
This was first studied by the Russian mathematician Markov, hence the name of the
process. The Markov assumption implies that:

P(Xt = j | Xt−1 = i, Xt−2 = k, ...) = P(Xt = j | Xt−1 = i).    (12.22)
ZERO-ORDER MODELS
In zero-order models the choice probabilities do not depend on past purchase history
(Massy, Montgomery and Morrison, 1970, Chapter 3). Thus (12.22) reduces to:

P(Xt = j | Xt−1 = i, Xt−2 = k, ...) = P(Xt = j).
If the random variable X takes one of only two values (representing two brands, or a
purchase and non-purchase situation), X = x = 1 with probability π, and X = x =
0 with probability 1 − π, we obtain the so-called Bernoulli model:12

P(X = x) = π^x (1 − π)^{1−x}, x = 0, 1.    (12.25a)

The expectation of the Bernoulli distribution is E[X] = π, and its variance is Var[X] =
π(1 − π). This is a distribution for an individual consumer. Let T be the number of
purchase occasions for each consumer in the population. All have the same probabil-
ity π of purchasing a brand and 1 − π is the non-purchase probability. Then π · T = η
is the expected number of purchases by a consumer. The probability that a purchase
takes place x times is represented by a binomial distribution:

f(x | π) = (T choose x) π^x (1 − π)^{T−x}, x = 0, 1, ..., T.    (12.25b)
The multinomial model arises as the sum of T Bernoulli distributions (12.25a). The
expectation of the multinomial random variable is: E[Xj] = πjT, its variance is:
Var[Xj] = πj(1 − πj)T, and the covariance is Cov[Xj, Xk] = −πjπkT. The bino-
mial distribution results if n = 2. Thus, in the Bernoulli, binomial and multinomial
models the market is described by a set of (stationary) probabilities, describing the
purchase probabilities (or alternatively the market shares) of the brands.
The above models assume consumer homogeneity, i.e. all subjects have the same
probability of purchasing a particular brand. Heterogeneity has been taken into ac-
count in the binomial model by allowing π to follow a beta distribution across the
population of consumers (Massy, Montgomery and Morrison, 1970, Chapter 3). The
beta distribution is a flexible distribution that can take a variety of shapes:

f(π | α1, α2) = [Γ(α1 + α2) / (Γ(α1)Γ(α2))] π^{α1−1} (1 − π)^{α2−1}    (12.27)

12. See Massy, Montgomery, Morrison (1970, Chapter 3), Wierenga (1974, p. 18).
with α1 and α2 as parameters. From (12.25b) and (12.27) the number of purchases
(x) can be shown to follow a Beta-Binomial (BB) distribution:

f(x | α1, α2) = (T choose x) [Γ(α1 + α2)Γ(x + α1)Γ(T − x + α2)] / [Γ(T + α1 + α2)Γ(α1)Γ(α2)].    (12.28)

For a market with n brands, heterogeneity in the purchase probabilities π1, ..., πn
can analogously be captured by a Dirichlet distribution:

f(π1, ..., πn | α1, ..., αn) = [Γ(Σ_{j=1}^n αj) / Π_{j=1}^n Γ(αj)] Π_{j=1}^n πj^{αj−1}.    (12.29)

By compounding the multinomial and the Dirichlet one obtains the Dirichlet-Multi-
nomial (DM):

f(x1, ..., xn | α1, ..., αn) = [T! / (x1! ··· xn!)] [Γ(Σ_{j=1}^n αj) / Γ(T + Σ_{j=1}^n αj)] Π_{j=1}^n [Γ(xj + αj) / Γ(αj)]    (12.30)

with the mean of Xj: E[Xj] = αjT / (Σ_{r=1}^n αr). The Beta-Binomial in equation
(12.28) arises as a special case for a two-brand market (n = 2).
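The Beta-Binomial probabilities are easy to compute with log-gamma functions, which avoids overflow in the gamma ratios; the parameter values below are illustrative.

```python
from math import comb, exp, lgamma

def beta_binomial_pmf(x, T, a1, a2):
    # Binomial purchases compounded with a beta distribution for pi,
    # evaluated on the log scale for numerical stability.
    log_term = (lgamma(a1 + a2) + lgamma(x + a1) + lgamma(T - x + a2)
                - lgamma(T + a1 + a2) - lgamma(a1) - lgamma(a2))
    return comb(T, x) * exp(log_term)

T, a1, a2 = 10, 1.5, 3.0    # illustrative parameter values
probs = [beta_binomial_pmf(x, T, a1, a2) for x in range(T + 1)]
mean = sum(x * p for x, p in enumerate(probs))

print(round(sum(probs), 6))   # 1.0: a proper probability distribution
print(round(mean, 4))         # T*a1/(a1+a2) = 3.3333
```

The mean check uses the known Beta-Binomial expectation T·α1/(α1 + α2), the two-brand special case of the DM mean given above.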
Strong empirical support for the DM is provided by Ehrenberg and coauthors
in their work over the past 30 years (see for example Ehrenberg, 1988, Uncles,
Ehrenberg and Hammond, 1995). They discuss many applications where the DM is
a useful description of brand purchase behavior. Regularities based on the DM have
been found in product markets for food and drink products, personal care products,
gasoline, aviation fuel, motor cars, and OTC medicines, as well as in TV program and
channel choice and shopping behavior (Uncles, Ehrenberg and Hammond, 1995). In
many markets, the DM model does a good job of explaining observed regularities in
purchase behavior, such as the percentage of consumers buying in a certain period,
the number of purchases per buyer, the repeat purchases, the percentage of loyals,
etc. In Table 12.2 we show some of the findings in Uncles, Ehrenberg and Hammond
(1995). The results apply to the instant coffee item belonging to the Folgers
brand. Table 12.2 shows that the Dirichlet-Multinomial (DM) model does quite well
in predicting market characteristics in stationary markets (i.e. the second row contains
predicted percentages that are close to the actual ones). The authors conclude that this
constitutes one of the best-known empirical generalizations in marketing. However,
criticisms have been raised, pertaining to the absence of marketing variables and the
stationarity assumption.
Finally, we mention an alternative approach for the accommodation of hetero-
geneity in which a discrete instead of a continuous mixing distribution is used. Such
Table 12.2 Performance of the DM model: Observed (O) and Predicted (P) values for
Folgers instant coffee.

O  18  53  49  13  48  35  40  39
P  17  47  47  15  41  37  37  36

a. %Wk denotes the percentage of people who bought in a week; %Yr in one year; and
% once the percentage who bought only once.
b. MH = Maxwell House, TC = Tasters Choice, HP = High Point, SA = Sanka.
models are the finite mixture models which offer the advantage that segmentation
strategies can be developed. We discuss these in Section 17.2.
A first-order Markov model applies if only the last purchase has an influence on the
present one, i.e.,
P(Xt = j | Xt−1 = i, Xt−2 = k, ...) = P(Xt = j | Xt−1 = i) = pijt.    (12.31)
Because the Pijt are (conditional) probabilities, they must have the following proper-
ties:
0 ≤ pijt ≤ 1, for all i, j = 1, ..., n, t = 1, ..., T    (12.32)

Σ_{j=1}^n pijt = 1, for all i = 1, ..., n, t = 1, ..., T,    (12.33)
where n is the number of brands. We assume consumer homogeneity, i.e. consumers
have the same pijt (compare Section 9.1). If one makes the additional, simplifying
assumption that pijt = pij, for all i and j, i.e. the transition probabilities are inde-
pendent of time, the resulting Markov chain is said to be stationary. The transition
probabilities pij can be represented in a matrix. This transition probability matrix TP
is represented as:13
TP = | p11  p12  ...  p1n |
     | p21  p22  ...  p2n |
     | ...               |
     | pn1  pn2  ...  pnn |    (12.34)

Table 12.3 Predicted market shares in the two-brand example.

t    m1t     m2t
0    0.5     0.5
1    0.55    0.45
2    0.575   0.425
3    0.5975  0.4025
∞    0.600   0.400
The diagonal elements (p11, p22, ..., pnn) are the repeat purchase probabilities. The
off-diagonal elements are the brand-switching probabilities.
If the market shares in period t are in the (row) vector mt = [m1t, m2t, ..., mnt],
one can use the transition probability matrix, TP, to predict market shares in future
periods through the relation between the market shares at t + 1,
mt+1 = [m1,t+1, ..., mn,t+1], and market shares in period t, mt, written in matrix
formulation as:

mt+1 = mt · TP.    (12.35)
We illustrate this with an example of two brands. Consider the following transition
probability matrix,

TP = | 0.8  0.2 |
     | 0.3  0.7 |

and assume that the current (period 0) market shares are m10 = 0.50, m20 = 0.50. In
period t = 1, the predicted market shares are computed as the matrix product m0 · TP:

m1 = [0.50 · 0.8 + 0.50 · 0.3, 0.50 · 0.2 + 0.50 · 0.7] = [0.55, 0.45]

and, more generally,

mt+k = mt · TP^k    (12.36)
where TP^2 = TP · TP, and TP^3 = TP · TP · TP, and so on. Using this relation, one
obtains the market shares shown in Table 12.3. In equilibrium (steady state), the
Table 12.4 Estimated transition probabilities for four U.S. premium beer brands.

                      Brand choice in t + 1
Brand choice in t     1     2     3     4
1                   0.81  0.01  0.18  0
2                   0.14  0.75  0     0.11
3                   0     0.25  0.74  0.01
4                   0.20  0.12  0     0.68
predicted market shares are respectively 0.60 and 0.40. These steady state market
shares, collected in the vector m, are independent of time, and satisfy:

m = m · TP.    (12.37)

For a two-brand market this yields:

m1 = p21 / (p12 + p21), m2 = p12 / (p12 + p21).    (12.38)

The stationary market shares in Table 12.3 can be directly computed from equation
(12.38) as: m1 = 0.3/0.5 = 0.60 and m2 = 0.2/0.5 = 0.40. It can be shown that under suitable
conditions the Markov chain reaches an equilibrium situation, in which mt → m as
t → ∞, regardless of the initial market shares m0.
Horowitz (1970) estimated transition probabilities for a market consisting of four
U.S. premium brewers. Twenty years of yearly market share data were available.
The results are shown in Table 12.4.14 The estimation of transition probabilities from
market share data can provide useful descriptive information. For example, in Table
12.4 we see that brand 1 enjoys the highest repeat purchase probability. Brand 4 has
the lowest market share (about 11 percent), and the results in Table 12.4 suggest that
this is due to its relative inability to keep its current customers, p44 = 0.68, but also
due to its failure to attract many new customers.
It may be instructive to examine how market shares change over time if the
transition probabilities are stationary. Current market shares are: m1 = 0.350, m2 =
0.283, m3 = 0.255, m4 = 0.113. With the transition probabilities of Table 12.4, the
steady state market shares are: m1 = 0.340, m2 = 0.306, m3 = 0.241, m4 = 0.113.
Thus, brand 2 would gain more than two share points at the cost of brands 1 and 3.
Nevertheless, this market is relatively stable.
We emphasize that several assumptions underlie these results:
a. the transition probabilities are stationary;
b. each consumer makes exactly one purchase in every (discrete) time period;
c. consumers all have the same conditional and unconditional brand choice proba-
bilities.
Assumption a, that the transition probabilities are stationary, may be unrealistic, for
example if a firm takes corrective action for a brand that is losing market share.
The literature contains several models that allow the transition probabilities to vary,
such as Maffei (1960), Harary and Lipstein (1962), Howard (1963), Styan and Smith
(1964), Ehrenberg (1965), Montgomery (1969) and Leeflang (1974).
Assumption b is related to the fact that time is divided into discrete intervals. This
assumption is less restrictive the longer the time unit. However assumption a becomes
more problematic as the time unit becomes longer. One way to alleviate assumption
b is to assume a two-brand market (j = 1, 2) where j = 1 denotes a particular brand
and j = 2 all other brands, including a no-purchase option. Another suggestion is
to add a no-purchase option to the n existing brands (Harary, Lipstein, 1962, Telser,
1962a).
The first two assumptions can also be addressed by a so-called "semi-Markov
model". In such a model, the time between purchases of brands i and j is considered
to be a random variable Tij, with corresponding density function fij(t), rather than
a constant. The sequence of transitions in these models is Markovian, but it is not
Markovian from one time period to another. For that reason it is called a semi-Markov
process. This process is described not only by a transition probability matrix TP,
but also by a "holding-time matrix" with elements fij(t) (see Howard, 1963). Some
early applications are given by Herniter (1971) and Wierenga (1974, pp. 215-216).
More recently, such semi-Markov models have been formulated as hazard models of
brand switching (Vilcassim and Jain, 1991, Wedel et al., 1995). Here the hazard of a
switch from brand i to brand j is modeled as: fij(t) = λij Sij(t) (compare equation
(12.18)), where Sij(t) denotes the survivor function for a switch from i to j, i.e. the
probability that a switch from i to j has not yet occurred at t. The hazard can be parameterized
in various ways as shown in Section 12.2. A detailed discussion of these semi-Markov
and hazard models for brand switching is beyond the scope of this book.
With respect to assumption c, we note that in many situations the conditional
choice probabilities differ across individuals, i.e. the population is heterogeneous. We
can accommodate heterogeneity by making assumptions about the distribution of the
pij over the population of consumers. Morrison (1966) developed a heterogeneous
aggregate Markov model where a particular transition probability was assumed to be
beta distributed. More generally a vector of transition probabilities can be assumed
to follow a multivariate beta distribution, see Jones (1973) and Leeflang (1974, p.
183). These developments are quite similar to those for the DM model in equations
(12.29) and (12.30). However, this approach has received relatively little attention in
the literature.
Yet another way to accommodate heterogeneity is through a discrete mixture
distribution, giving rise to a finite mixture of Markov models. Such a model has
been developed by Poulsen (1990). He modeled the purchases by new triers of a
certain brand, and subsequent purchases of that and other brands by those triers in
STOCHASTIC CONSUMER BEHAVIOR MODELS 239
five subsequent waves. We give a brief description of that model here. It is assumed
that there are s = 1, ..., S unobserved segments, where each individual has proba-
bility θs of belonging to segment s. Given a segment s, a sequence of buy/not-
buy decisions is described by a first-order Markov process. There are two possible
outcomes, to buy (1) and not to buy (0) the product, at two consecutive occasions
indexed by i and j. The probability of buying on the two occasions is denoted by
pij|s, conditional on segment s. If we know to which segment an individual belongs,
that probability can be written according to a first-order Markov model, as:

pij|s = P(Xt−1 = i | s) · P(Xt = j | Xt−1 = i, s).    (12.39)
We show in Table 12.5 results for the two segments Poulsen recovered. Segment 1
(19% of the sample) has a high initial probability of buying the brand (0.68) and
a probability of 0.50 of switching into the brand at the next purchase. Segment 2
(81% of the sample) has a much lower probability of buying the brand (0.13) and a
very low probability (0.06) of switching into it, if no purchase was made. Also, the
probability of a repeat is much lower in segment 2 (0.21) than in segment 1 (0.71).
From Poulsen's analyses it appears that the mixture of first-order Markov models
provides a much better description of purchase behavior than the homogeneous first-
order Markov model.
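The segment-level figures quoted above can be combined into aggregate one-step transition probabilities. The matrices below are filled in from the probabilities reported in the text; note that over multiple steps a mixture of Markov chains is no longer Markov, which is exactly why it can fit better than a single homogeneous chain.

```python
import numpy as np

# Two-segment first-order Markov model for buy (1) / not buy (0).
theta = np.array([0.19, 0.81])          # segment sizes
P1 = np.array([[0.50, 0.50],            # segment 1: rows = state at t-1 (0, 1)
               [0.29, 0.71]])
P2 = np.array([[0.94, 0.06],            # segment 2
               [0.79, 0.21]])

# For a randomly drawn individual, one-step transition probabilities
# mix over segments with weights theta.
P_agg = theta[0] * P1 + theta[1] * P2
print(P_agg)
```

The aggregate repeat-buying probability, for instance, is 0.19·0.71 + 0.81·0.21, well below the loyal segment's own repeat rate.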
The results in Table 12.5 suggest that the Markovian assumptions (specifically
point c above) may not hold in models of purchase behavior.15 Researchers who
applied the Markov model may not have been overly concerned with the restrictive
nature of the underlying assumptions. They have used the model as a tool that captures
some of the dynamics of markets. Generally speaking, the Markov model has often
been applied because of its performance as a predictor of dynamic aggregate market
behavior, not because it describes how individual consumers behave.
In learning models the entire purchase history is taken into account in each subse-
quent purchase. Well-known applications in marketing include those by Kuehn (1961,
15. See also Ehrenberg's (1965) pessimistic view about the applicability of Markov chains.
Table 12.5 Mixture of first-order Markov model estimates from purchase data.

                                  Segment
                               s = 1   s = 2
Segment size                    0.19    0.81
Initial probability of buying   0.68    0.13
P(buy | previous no buy)        0.50    0.06
P(buy | previous buy)           0.71    0.21
1962). Learning may have a positive effect in the sense that increased familiarity with
the product increases the chance that it will be re-purchased in the future. The effect
of learning may also be negative, for example due to satiation.
Applications of the linear learning model are provided by Carman (1966), Mc-
Connell (1968), Wierenga (1974, 1978), Lilien (1974a, 1974b) and Lawrence (1975).
Wierenga found that the linear learning model produced better descriptions of brand
choice behavior than homogeneous and heterogeneous zero-order and first-order Mar-
kov models did.
We do not provide details about linear learning models because these have re-
ceived limited attention in recent years. These models have the following restrictions.
First, they are only suited for two-brand markets, and are not easily extended. Sec-
ond, only positive purchase feedback is accommodated. Third, there are technical
problems associated with model estimation on micro- and macro-level data. These
problems may have limited the applicability of the model, leading to a reduction in
its popularity (Leeflang and Boonstra, 1982).
Choice modeling has grown to be a very substantial area in marketing research over
the past few decades.16 In this section we show how marketing instruments can be
linked to the parameters in stochastic choice models. We start by adding variables to
the zero-order binomial and multinomial models. Specifically, we discuss the most
common of the zero-order models, the logit model, and then describe two important
generalizations, the nested logit and the probit model.
16. We do not provide a complete overview of the literature in this quickly growing area. Instead, we describe
the main principles. See for more information about this topic, for example, the Special Issues on Choice Models
of Marketing Letters, vol. 8, number 3, 1997 and vol. 10, number 3, 1999.
In the multinomial logit model this property arises directly from the independence
assumption of the error terms. It may not be realistic in many marketing applications,
especially if some of the alternatives are close substitutes. On the positive side, if the
IIA-assumption holds, future demand can simply be predicted with the closed-form
expression (12.43) and the estimated values of the parameters (Urban, Hauser, 1980,
Chapter 11). However, if similarities across alternatives are incorrectly assumed away,
the estimated effects of marketing variables are incorrect.
McFadden (1986) shows how one can deal with problems that arise from the IIA-
assumption, including statistical tests of IIA. If IIA does not hold, other models can
be used, often at the cost of computational complexity. We discuss two such models,
the Nested MultiNomial Logit model (NMNL) and the MultiNomial Probit (MNP)
model.
Ujr = Ur + Uj|r = Vr + εr + Vj|r + εj|r    (12.45)

where Vr is the deterministic part of the utility associated with the highest level of
the hierarchy, etc. Using similar arguments as in (12.43), the choice probability at the
lowest level of the hierarchy (brand choice given product form r) can be derived:

πj|r = exp(Vj|r) / Σ_{k=1}^n exp(Vk|r).    (12.46)
The choice at the highest level of the hierarchy (r) can be derived from utility maxi-
mization:

πr = P[maxj Ujr > maxj Ujr′ for all r′, r′ ≠ r]    (12.47)
   = P[Ur + maxj Uj|r > Ur′ + maxj Uj|r′ for all r′ = 1, ..., n′, r′ ≠ r].
An expression for maxj Uj|r can be obtained from the properties of the double expo-
nential distribution, since the maximum of a set of double exponentially distributed
22. See also Roberts, Lilien (1993).
variables (with unit variance) also follows a double exponential distribution with
expectation:

E[maxj Uj|r] = ln(Σ_{j′=1}^n exp(Vj′|r)).    (12.48)
Expression (12.48) is called the "inclusive value" of the utility for the brand name
which is included in the utility for the form as shown in (12.49). From (12.47) and
(12.48) the choice probabilities at the highest level of the hierarchy can be shown to
be:
πr = exp(Vr + IVr) / Σ_{r′=1}^{n′} exp(Vr′ + IVr′), where IVr = ln(Σ_{j′=1}^n exp(Vj′|r)).    (12.49)
The unconditional choice probability of any alternative jr is simply πjr = πj|r · πr.
In this model the brand utilities at the lowest (brand name) level of the hierarchy
affect the utilities at the highest (form) level through the inclusive values. For a
comprehensive treatment of the NMNL model see Ben-Akiva and Lerman (1985,
Chapter 10), who include extensions to higher-order nestings and implications for the
elasticity structure. The NMNL model has been used to model choices in product cat-
egories such as soft drinks (Moore and Lehmann, 1989), coffee (Kannan and Wright,
1991) and peanut butter (Kamakura, Kim and Lee, 1996). The latter authors apply
the NMNL model as well as an extension to a mixture formulation which allows for
a partitioning of the market into different segments. For a specification of the NMNL
model at the aggregate level, see Section 14.5.
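The two-level NMNL computation, (12.46) through (12.49), can be sketched in a few lines; the utilities below are illustrative numbers, not estimates from any application.

```python
import numpy as np

# Two product forms (r), three brands (j) within each form.
V_form = np.array([0.5, 0.2])                      # V_r
V_brand = np.array([[1.0, 0.3, 0.0],               # V_{j|r}, one row per form
                    [0.4, 0.8, 0.1]])

# Lower level: brand choice within a form (logit over brands).
p_brand = np.exp(V_brand) / np.exp(V_brand).sum(axis=1, keepdims=True)

# Inclusive value carries the lower level up to the form-level choice.
IV = np.log(np.exp(V_brand).sum(axis=1))
p_form = np.exp(V_form + IV) / np.exp(V_form + IV).sum()

# Unconditional probabilities pi_jr = pi_j|r * pi_r sum to one.
p_joint = p_brand * p_form[:, None]
print(p_joint.sum())
```

The inclusive value is the only channel through which brand-level attractiveness influences the form-level choice, which is the defining feature of the nested structure.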
In the MultiNomial Probit (MNP) model the disturbances of the random utility in
(12.41) are assumed to follow a multivariate normal distribution (εj ∼ N(0, Ω)).
This distribution allows the utilities of alternatives to be correlated, so that the IIA-
assumption can be relaxed. However, a closed-form expression for the probability that
individual i chooses alternative j cannot be derived, because it involves a multidimensional
integral. Probabilities can be obtained by numerical methods if the number
of choice alternatives is limited to 3 or 4. Early applications include those by Currim
(1982) and Kamakura and Srivastava (1984, 1986). A comprehensive treatment of
the MNP model is given by Daganzo (1979). McFadden (1989) developed a simu-
lation method for calculating the probabilities that an individual chooses a particular
alternative. Briefly, it is based on drawing repeated samples from the multivariate
normal distribution for the error terms, and approximating the integral by summations
over the repeated draws. This simulation method has stimulated interest in the MNP
model, leading to applications in marketing by, amongst others, Chintagunta (1992a),
Elrod and Keane (1995) and Haaijer, Wedel, Vriens and Wansbeek (1998).
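The simulation idea can be sketched directly: draw the normal errors many times and count how often each alternative has the maximum utility. The utilities and covariance matrix below are illustrative; with two correlated alternatives and one independent one, the independent alternative captures more than its symmetric 1/3 share, which is the kind of substitution pattern the logit's IIA-assumption rules out.

```python
import numpy as np

rng = np.random.default_rng(4)

# Three alternatives with equal deterministic utilities; errors of
# alternatives 1 and 2 are correlated (close substitutes), alternative 3
# is independent.
V = np.zeros(3)
Omega = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

# Frequency simulator: repeated draws from the error distribution,
# choice probabilities approximated by choice frequencies.
eps = rng.multivariate_normal(np.zeros(3), Omega, size=200_000)
choice = (V + eps).argmax(axis=1)
probs = np.bincount(choice, minlength=3) / len(choice)

print(probs)   # alternatives 1 and 2 split support; 3 exceeds 1/3
```

Smoothed variants of this simulator are typically used inside estimation routines, since raw frequencies are not differentiable in the parameters.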
In the Markov response models the transition probabilities, pijt with i, j = 1, ..., n,
t = 1, ..., T, are related to decision variables. Early examples are found in Telser
(1962a, 1962b), Hartung and Fisher (1965), Lee, Judge and Zellner (1970), Leef-
lang and Koerts (1974), and Horsky (1977b). To illustrate, we examine Hartung and
Fisher's work in some detail. In their study the market is treated as a quasi-duopoly,
i.e. n = 2. They assume that the transition probability matrix is non-stationary, and
(12.35) can be written as:

mt+1 = mt · TPt    (12.50)

where TPt contains the time-dependent transition probabilities pijt.
Hartung and Fisher relate transition probabilities to decision variables in the following
way. Brand loyalty for brand 1 is

p11t = γ1 d1t / (d1t + d2t)                                     (12.51)

where

d1t = number of outlets where brand 1 is available,
d2t = number of outlets where brand 2 is available,
γ1 = a constant.

Similarly, the probability of switching from brand 2 to brand 1 is

p21t = γ2 d1t / (d1t + d2t).                                    (12.52)
In this case, the values of the transition probabilities are not restricted to the range
zero to one.
Hartung and Fisher obtained the following estimates: γ1 = 4.44, γ2 = 0.64. With
these values, it follows that p11t will become larger than one if the outlet share of
brand 1 exceeds about 22.5 percent. Nevertheless, Hartung and Fisher found that,
for their problem, equations (12.51) and (12.52) were sufficiently accurate. However,
Naert and Bultez (1975) applied the same model to a brand of gasoline in Italy and
obtained meaningless results. This led them to propose alternative formulations for
the transition probability functions. One of these defines the transition probabilities as
exponential functions of the relative number of outlets rather than as linear functions
of outlet share. This modification produces more acceptable estimates.
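The difference between the two formulations is easy to see numerically. The sketch below contrasts the linear form (12.51), using Hartung and Fisher's estimate γ1 = 4.44, with an exponential form in the spirit of Naert and Bultez's proposal; the exact exponential specification used here is an illustrative assumption, not necessarily theirs.

```python
import math

gamma1 = 4.44  # Hartung and Fisher's estimate for brand loyalty

def p11_linear(share, gamma=gamma1):
    """Eq. (12.51): loyalty as a linear function of outlet share."""
    return gamma * share               # unbounded: exceeds 1 once share > 1/gamma

def p11_exponential(share, gamma=gamma1):
    """An exponential alternative (illustrative functional form)."""
    return 1.0 - math.exp(-gamma * share)   # always stays in (0, 1)

for s in (0.10, 0.23, 0.50):
    print(f"share={s:.2f}  linear={p11_linear(s):.3f}  exp={p11_exponential(s):.3f}")
```

The linear form crosses 1 at a share of 1/4.44 ≈ 0.225, exactly the 22.5 percent mentioned above, while the exponential form remains a valid probability for any outlet share.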
A more recent approach to accommodate explanatory variables is provided by
Givon and Horsky (1990). Their formulation is somewhat similar to the Hartung
and Fisher approach, but they include price and lagged advertising effects into the
conditional Markov probabilities. A quite general framework for including marketing
variables into Markov models of brand switching is provided by Zufryden (1981,
1982, 1986) and Jones and Landwehr (1988). They demonstrate that the multinomial
logit model framework described above can be used to estimate Markov response
STOCHASTIC CONSUMER BEHAVIOR MODELS 245
models with explanatory variables. Their work thus rigorously extends the earlier
work by Hartung and Fisher and others. Zufryden introduces a last-state specification
vector zi for individual i, where zi = (z1, z2, ..., zn) is a vector of zeros and ones in-
dicating the state an individual was last in. For example, z1 = ... = zn-1 = 0, zn = 1
indicates that brand n was purchased last by individual i. This last-state specification
vector is included among the explanatory variables in a logit specification for πj|r,
analogous to (12.43), resulting in model (12.53).
Thus, this model includes past purchases as a predictor in a logit model with a para-
meter vector γr indicating the effect of a previous brand purchase on the probability
to choose j, πj|r. Including z'γr in the logit model (12.43) results in a specification
of the conditional probability of choosing j, given a previous purchase of r, indicated
by z.
The zero-order multinomial model (12.43) is a restricted version of the model (12.53).
The zero-order hypothesis is maintained if we do not reject the null hypothesis γr = 0
for all r. If the evidence favors (12.53), the implication is that if the values of the
explanatory variables xj change, the first-order Markov transition probabilities also
change. We note that a more general formulation can be obtained if the impact of
the explanatory variables is allowed to depend on the last brand purchased, which
amounts to replacing β by βr in equation (12.53).
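The role of the last-state specification vector can be illustrated with a minimal logit computation. The predictor values and coefficients below are invented; the point is only that adding γ·zj to the deterministic utilities raises the choice probability of the brand bought last.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Deterministic utilities x_j' beta for three brands; predictor values and
# coefficients are invented for illustration.
x = np.array([[1.0, 0.2],
              [0.8, 0.0],
              [0.6, 0.5]])
beta = np.array([1.0, 2.0])
gamma = 0.9          # effect of having bought a brand on the last occasion

base = x @ beta
for r in range(3):                    # condition on each possible last brand r
    z = np.zeros(3); z[r] = 1.0       # last-state specification vector
    print(r, softmax(base + gamma * z))
```

Each printed row is a row of the implied first-order transition matrix: conditioning on last brand r shifts probability toward brand r, exactly the state dependence that the test of γr = 0 evaluates.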
Zufryden (1981) estimated his model on data covering one year for a two-brand
market (brand A versus all others, denoted 0). The explanatory variables were income and
price (both at two levels, high and low). We show his estimation results in Table 12.6
which indicate that both low income and low price result in a significantly increased
probability of choice. The significance of the coefficient for Last brand indicates that
a zero-order model is strongly rejected for this market. Zufryden obtained first-order
Markov probabilities from this model. These are shown in Table 12.7. Table 12.7
reveals that there are some marked differences in the switching probabilities espe-
cially between the income classes. The lower-income segment has higher loyalty and
Table 12.7

                 Low income          High income
                  A      0            A      0
Low price   A    0.29   0.71         0.19   0.81
            0    0.08   0.92         0.05   0.95
High price  A    0.25   0.75         0.16   0.84
            0    0.06   0.94         0.04   0.96
switching to brand A, compared with the higher-income segment. For each income
segment, a low price leads to greater loyalty than a high price does. We note that the
model did not include interactive effects between income and price. Thus, the main
effects are: an increase in the repeat purchase probability from high- to low-income
of about 50 percent, and an increase in the same from high- to low-price of almost 20
percent.
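One use of such estimated transition matrices is to compute the implied long-run (steady-state) market shares. The sketch below does this for the low-income, low-price panel of Table 12.7.

```python
import numpy as np

def steady_state(P):
    """Long-run shares m solving m = m P with sum(m) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # stationarity + adding-up
    b = np.zeros(n + 1); b[-1] = 1.0
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m

# Low-income, low-price panel of Table 12.7: rows = brand bought last (A, 0)
P = np.array([[0.29, 0.71],
              [0.08, 0.92]])
print(steady_state(P))    # long-run shares of A and 0
```

For this panel the long-run share of brand A is 0.08/0.79 ≈ 0.10, well below its repeat-purchase probability of 0.29.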
In the previous sections we described models for the purchase incidence, purchase
timing and brand choice (or brand switching) separately. Several authors have in-
tegrated such descriptions of separate consumer behavior processes into one single
framework. Some of these models have also included components of market segmen-
tation. The purpose of those models is to provide managers with insights about the
possible sources of gains and losses of sales, about the consumer characteristics and
marketing variables that affect sales, and about the causes of brand loyalty and brand
switching. Many of the approaches compound the (Poisson- or NBD-) distributions
for purchase frequency or distributions of purchase timing (exponential or Erlang)
with (multinomial, Dirichlet-Multinomial, or Markov) models of brand choice. Early
examples of this approach are Jeuland, Bass, and Wright (1980) and Goodhardt,
Ehrenberg and Chatfield (1984). However, the importance of this stream of research
was recognized through the seminal work of Gupta (1988). The basic setup of these
approaches is as follows:
Table 12.9 The Bockenholt (1993a, 1993b) purchase incidence and brand
choice model family.
Heterogeneity in:
Model Purchase incidence rate Brand selection probabilities
2. Through two types of explanatory variables, those related to the consumers and
those related to brands. The first set of explanatory variables is included in the
expectation of the negative binomial distributions. The second set is included in
the linear predictor of the Dirichlet-Multinomial distributions;
3. Through segments, operationalized via a mixture model (see Section 17.2).
Several models arise as special cases: by assuming homogeneity in the brand se-
lection probabilities and/or in the purchase incidence, or by assuming the absence of
segments or of effects of predictor variables. We show in Table 12.9 special cases of
Bockenholt's model (1993a, 1993b).
Dillon and Gupta (1996) applied this framework to a survey of household purchases
of paper towels. The model showed that less than 10% of the households are price
sensitive. Also price sensitivity was higher for households with more children and
for heavy purchasers. Five segments were used to characterize purchase behavior.
Two of the largest switching segments were very price sensitive in both brand choice
and category purchase. Dillon and Gupta (1996) showed that the modeling framework
outperforms a variety of competing models, among which models without brand vari-
ables, without loyal or choice segments, or without any segments. In addition, the model outperformed the
approaches by Colombo and Morrison (1989), Krishnamurthi and Raj (1991), Grover
and Srinivasan (1992) as well as the Goodhardt, Ehrenberg and Chatfield (1984) type
of models. Earlier Monte Carlo work by Bockenholt (1993a, 1993b) had also provided
strong support for the validity of this general approach.
CHAPTER 13
Multiproduct models
13.1 Interdependencies

Three types of interdependencies between products can be distinguished:

• demand interdependencies;
• cost interdependencies;
• risk interdependencies.
qjt = exp(β0j) · Π(r=1..n) prt^(βrj) · exp(ujt)                  (13.1)

where

qjt = the quantity demanded for brand j in period t,
prt = the price of brand r in period t,
ujt = a disturbance term for brand j in period t,
n = the number of brands.
In this model, the own-brand price elasticity is captured by βrj when j = r. This
effect is expected to be negative, based on economic theory. For j ≠ r, we consider
the following possibilities:
1. βrj = 0: products j and r are independent in the sense that the cross-price effect
of brand r on the quantity demanded of brand j is zero;
2. βrj > 0: products j and r are substitutes in the sense that a decrease in the price
of brand r decreases the quantity demanded of brand j;
3. βrj < 0: products j and r are complements in the sense that a decrease in the
price of brand r increases the quantity demanded of brand j.
We note that if βrj = 0 there is no need to include the price of brand r in the demand
model for brand j. However, in practice we may include such brands to test a null
hypothesis of independence. The structure of (13.1) is similar to that of demand
models, such as the SCAN*PRO model in equation (9.19).
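Because (13.1) is linear in the logarithms, the elasticities βrj can be estimated by ordinary least squares after taking logs. The sketch below simulates data from such a model for one brand and recovers its own- and cross-price elasticities; all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 200, 3
log_p = rng.normal(0.0, 0.1, size=(T, n))        # log prices of three brands
beta = np.array([-2.0, 0.6, -0.3])               # own, substitute, complement
log_q = 1.5 + log_p @ beta + rng.normal(0.0, 0.05, T)  # log sales of brand j

# OLS on the logs recovers the intercept and the elasticities beta_rj of (13.1)
X = np.column_stack([np.ones(T), log_p])
coef, *_ = np.linalg.lstsq(X, log_q, rcond=None)
print(coef)   # should be close to [1.5, -2.0, 0.6, -0.3]
```

The signs of the recovered coefficients identify brand 2 as a substitute (positive cross elasticity) and brand 3 as a complement (negative cross elasticity), exactly the cases distinguished above.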
The formalization of actual models that can be used in practice is potentially very
complex. If we assume the availability of weekly, store-level data for all items, a
manufacturer may want to take the effects of tens or hundreds of items into account.
Many of these items have separate marketing activities, and aggregation across items
is justified only for items with identical marketing support over time and between
stores (see Christen, Gupta, Porter, Staelin, Wittink, 1997). It is very easy for an
analyst to encounter a problem with 1,000 or so predictor variables that may be
relevant. There are, however, theoretical and conceptual arguments we can use to
simplify the structure. For example, if a comparative advertising campaign is initiated
for one brand, it is likely that the cross effects involving other brands mentioned will
become stronger. In general, consumers evaluate the substitutability between different
brands by comparing the product characteristics, the packages, the prices, etc. As the
perceived similarity increases, the degree of substitution should increase. Thus, we
can also use our own judgments about product similarities to categorize brands or
items in terms of the expected magnitudes of cross effects, and use this categorization
to place restrictions on the multitude of possible effects. 1
I. See also the discussion about market boundaries in Section 14.3.
Walters used the following model to estimate intra- and inter-store effects of tempo-
rary price cuts:
[Figure (not reproduced): intra- and inter-store effects of promotions. Sales of
product j and sales of its substitutes in store 1 are related to promotions of
product j, of its substitutes, and of its complements, in store 1 and in store 2.]
qkjt = α0kj + Σ(k'=1..K) Σ(r=1..n) αk'r pk'rt + ukjt             (13.2)

where

qkjt = unit sales of product j in store k in period t,
pk'rt = the (promotional) price of product r in store k' in period t,
K = the number of stores, and n = the number of products.
To estimate the parameters in (13.2), Walters used store-level scanner data, covering
26 weeks, for four product categories. In one product category, cake mix, three brands
were used as potential substitutes. The complementary product category was defined
as cake frosting. His empirical results show that:
• own-brand price promotions significantly affect sales of the brand in the same
store;
• virtually all of these sales increases emanate from other brands in the same cate-
gory offered in the same store (i.e. intrastore, interbrand substitution effects);
• price promotions for one brand are often effective in stimulating sales of comple-
mentary products within the same store (i.e. intrastore, interbrand complementary
effects);
• price promotions rarely create interstore effects.
We note that these results are consistent with the idea that the majority of product
categories at the store level face inelastic category demand (Hoch, Kim, Montgomery,
Rossi, 1995).
Gijsbrechts and Naert (1984) specify a general form of a marketing resource allo-
cation model. Their model is intended to help management with the allocation of a
given marketing budget B over n product groups. The model accounts for: 4
• the response of market share to marketing activities relevant to the different prod-
ucts;
• the relation between market share, sales and profit;
• a set of strategic constraints imposed by top management.
The general model is represented in equations (13.3)-(13.9) below:

max π = Σ(j=1..n) ((pj − cj) qj − bj − FCj) − FC                 (13.3)

subject to

qj = mj Qj                                                       (13.4)

Qj = Qj(b1, θ1, ..., bn, θn, b1c, θ1c, ..., bnc, θnc, ev)        (13.5)

mj = mj(b1, θ1, ..., bn, θn, b1c, θ1c, ..., bnc, θnc)            (13.6)

for j = 1, ..., n,

loj < mj ≤ upj, for all j = 1, ..., n                            (13.7)

Σ(j=1..n) bj ≤ B                                                 (13.8)
4. We closely follow Gijsbrechts, Naert (1984).
MULTIPRODUCT MODELS 257
and where time subscripts and disturbance terms are omitted for convenience.
Empirically, Gijsbrechts and Naert (1984) use the objective function to cover one
period of one year. Equation (13.3) shows that profit is maximized with respect to
"marketing resources", bj. These marketing resources are aggregates of variables
such as advertising, promotion, personal selling and distribution expenses. The sim-
plifying assumption is that these marketing variables have homogeneous effects and
that aggregation across these different activities does not distort the results.
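The structure of (13.3)-(13.8) can be sketched with a small numerical example. All functional forms and numbers below are invented assumptions, not Gijsbrechts and Naert's specifications; for two product groups the constrained allocation can be found by simple enumeration.

```python
import numpy as np

# Invented illustration of the allocation problem (13.3)-(13.8) for two
# product groups.
B = 10.0                                   # total marketing budget (13.8)
p = np.array([5.0, 8.0])                   # prices
c = np.array([3.0, 5.0])                   # unit variable costs
Q = lambda b: 100.0 + 20.0 * np.sqrt(b)    # product-group demand response (13.5)
m = lambda b: 0.20 + 0.05 * np.sqrt(b)     # market share response (13.6)
lo, up = 0.15, 0.35                        # strategic share bounds (13.7)

def profit(b):
    q = m(b) * Q(b)                        # (13.4)
    return np.sum((p - c) * q - b)         # (13.3), fixed costs omitted

best, best_b = -np.inf, None
for b1 in np.linspace(0.0, B, 1001):       # enumerate splits with b1 + b2 = B
    b = np.array([b1, B - b1])
    if np.all((m(b) > lo) & (m(b) <= up)): # share bounds (13.7)
        val = profit(b)
        if val > best:
            best, best_b = val, b
print(best_b, best)
```

The share bounds (13.7) rule out extreme allocations here; within the feasible set the grid search locates an interior optimum that a marginal-profit condition would also identify.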
The lower and upper limits on the market shares of the different groups (equation
(13.7)) are imposed by corporate management. These limits may be based on strate-
gic and other considerations that are outside the demand model. For example, the
lower bounds on market share, loj, may be imposed for product groups in which top
management wants to invest and maintain a minimum presence in the market with
the expectation of future potential. Conversely, cash flow considerations may force
the placement of upper limits, upj, on market shares. One may view these limits as
reflecting some risk interdependencies between the product groups. Cash flow and
Due to demand- and cost interrelationships within a product line, and because there
are usually several price-market targets, the product line pricing problem has been
mentioned as one of the major challenges facing a marketing manager. Monroe and
Della Bitta (1978, p. 417) argue that complementarity between items may show up
through the prices of items in a product line even if the items are inherently substi-
tutable. Thus, through the addition of new items or by changing some prices, a firm
may increase the demand for the product line. Also, products that have functional
complementarity present a challenging pricing problem. Examples are razors and
razor blades, copiers and paper, computers and multiple software models, etc. In
the razor example, consumers have no demand for the blades unless they have the
razor with which the blades can be used. Given also the availability of other razors,
the manufacturer has an interest in distributing the razor widely. If the manufacturer
faces no competition on the blades, it is desirable to offer the razor at a very low
price, and to derive profits from the blades. However, the common introduction of
private-label blades reduces the price the manufacturer can charge for its blades. The problem is
to consider the demand- and cost functions jointly so that the dependencies are used
in the determination of prices.
Other dependencies between items are considered in the bundling literature. In
this literature three options are distinguished:
• pure bundling, a strategy in which only the bundle of products (or services) is
offered;
• pure components, a strategy in which only the components are offered;
• mixed bundling, a strategy in which both the bundle and the components are
offered.
Much of the literature focuses on the optimal strategy and the optimal prices, 6 under
a variety of scenarios.
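The logic of comparing these three strategies can be sketched by enumeration: given each consumer's reservation prices, every consumer picks the purchase option with the highest positive surplus. Reservation prices and candidate prices below are invented for illustration; this is a sketch of the logic, not any specific model from the literature.

```python
# Toy comparison of the three bundling strategies for two products.
consumers = [(9, 1), (6, 5), (1, 9), (4, 4)]   # (WTP product 1, WTP product 2)

def revenue(offers):
    """offers: list of (price, (h1, h2)) purchase options; each consumer picks
    the option with the highest positive surplus, or buys nothing."""
    total = 0.0
    for r1, r2 in consumers:
        best_surplus, paid = 0.0, 0.0
        for price, (h1, h2) in offers:
            surplus = h1 * r1 + h2 * r2 - price
            if surplus > best_surplus:
                best_surplus, paid = surplus, price
        total += paid
    return total

pure_components = [(5, (1, 0)), (5, (0, 1)), (10, (1, 1))]
pure_bundle     = [(8, (1, 1))]
mixed_bundling  = [(6, (1, 0)), (6, (0, 1)), (8, (1, 1))]

for name, offers in [("pure components", pure_components),
                     ("pure bundle", pure_bundle),
                     ("mixed bundling", mixed_bundling)]:
    print(name, revenue(offers))
```

With these numbers the pure bundle extracts the most revenue, because reservation prices are negatively correlated across consumers, a classic condition favoring bundling; with other prices or valuations the ranking can reverse.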
The issue of product line pricing has been addressed by many researchers. 7 Game-
theoretic analysis has focused on the pricing of related or substitute goods. This work
often ignores channel structure issues, and is typically restricted to a small set of
brands. An exception to the tendency by empirical researchers to ignore the role of
channel pricing, is Zenor (1994), whose model we discuss at the end of this section.
One of the early multiproduct models, in which interdependencies between brands
belonging to a firm's product line are considered, was developed by Urban (1969b).
Urban specifies demand- and cost functions for a firm offering brands in several prod-
uct groups h = 1, ... , H. We modify Urban's model slightly for convenience. Sales
of brand j, one of the brands of product group h, is modeled through the demand for
product group h, with interdependencies between the product groups, and the market
share of brand j in the product group, as follows:
qjht = mjht · ah · pht^(β1h) · aht^(β2h) · dht^(β3h) ·
       Π(h'=1..H, h'≠h) [ ph't^(−β1h') · (Σ(r) arh't)^(β2h') · (Σ(r) drh't)^(β3h') ]   (13.10)

where mjht is the market share of brand j in product group h, and p, a and d denote
price, advertising and distribution, respectively.
Total revenue for the firm is obtained by summing the products of the brands' unit
sales and prices. The cost function captures
interdependencies between the different brands as follows:
TVCjht = AVCjht · qjht · Π(r'=1..H', r'≠j) qr'h't^(cjr')          (13.11)

where

TVCjht = total variable cost of producing brand j (of product group h) in t,
AVCjht = the average variable cost for brand j in h if produced
independently of other products (brands) of the firm,
qr'h't = the quantity of brand r' (of product group h') produced by the firm,
cjr' = cross variable cost elasticity of brand j with respect
to r' = 1, ..., H', r' ≠ j.
It is assumed that the firm produces one brand r', r' = 1, ..., H' in each of the
H' product groups, where H' ≤ H. Subtracting TVCjht and fixed production and
variable advertising and distribution costs from the total revenue yields total profit.
Optimization of the profit function is performed by an iterative search routine. The
application of the model identified significant product (brand) interdependencies.
Based on an empirical application, Urban recommended changes in the marketing
mix of the products in the line so that the interdependencies could be exploited for
additional profit.
q = a + Ap                                                       (13.13)
8. Similar models were used by Bultez, Gijsbrechts, Naert, Vanden Abeele (1989) and Juhl, Kristensen (1989).
where
q = n x 1 vector of sales for the n brands,
a = n x 1 vector of constant terms,
A = an n x n matrix of own- and cross-demand sensitivities
for all pairs of brands,
p = an n x 1 vector of retail brand prices.
Gross brand profits can be expressed as:
π = (diag q)(p − c)                                              (13.14)
where
c = an n x 1 vector of variable production costs associated
with each brand.
Zenor (1994) introduces the coalition-brand matrix Z, with elements zij = 1 if brand
j belongs to "coalition" i, and zij = 0 if not. If a manager pursues category management,
this creates a formal "coalition" between the brands in the product line. Thus if brands
j = 1, ..., 4 constitute a formal coalition under category manager i = 1, and brands
j = 5, 6 are excluded, then z11 = z12 = z13 = z14 = 1, and z15 = z16 = 0. Equation
(13.14) specifies the gross profits resulting from treating the brands independently.
The "coalition" profits are obtained by pre-multiplying (13.14) by the coalition matrix
Z:

πc = Zπ.                                                         (13.15)
Zenor's empirical analysis shows the benefits of coordinated decision making: the
implied profit increase is as high as 30 percent.
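Equations (13.13)-(13.15) translate directly into matrix code. The demand intercepts, price sensitivities, costs and prices below are invented; the coalition matrix Z places brands 1 and 2 under one category manager and brand 3 on its own.

```python
import numpy as np

# Matrix sketch of (13.13)-(13.15) with invented numbers.
a = np.array([100.0, 80.0, 60.0])          # demand intercepts
A = np.array([[-8.0,  2.0,  1.0],          # own- and cross-price sensitivities
              [ 2.0, -6.0,  1.0],
              [ 1.0,  1.0, -5.0]])
p = np.array([6.0, 6.0, 6.0])              # retail prices
c = np.array([2.0, 2.0, 2.0])              # variable costs

q = a + A @ p                              # (13.13) linear demand
brand_profits = np.diag(q) @ (p - c)       # (13.14) gross profit per brand

# Brands 1 and 2 under one category manager ("coalition" 1), brand 3 alone
Z = np.array([[1, 1, 0],
              [0, 0, 1]])
coalition_profits = Z @ brand_profits      # pre-multiplication by Z
print(brand_profits, coalition_profits)
```

Pre-multiplying by Z merely aggregates brand profits to the coalition level; Zenor's contribution is to optimize prices jointly at that level, which is where the coordination gains arise.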
The marketing problem for a retailer is enormously complex. For example, many
supermarkets offer tens of thousands of items. The retailer faces not only the question
of which items to offer and in what manner (content and layout of the assortment) 9,
but also what prices to charge, how to advertise the outlet and the items, and how
to conduct promotions. If we treat items and product categories independently, then
these problems can be analyzed separately for each of many components. However,
the most intriguing question centers around dependencies. Thus, supermarket retail-
ers use certain items (e.g. fruits such as bananas, soft drinks such as Coca Cola) to
attract consumers to the store by offering these items at promoted prices. If this is a
productive strategy, then the promotion of these items at low prices must have positive
effects on the sales of other items. Relatively little is known about the effectiveness
9. See Rao and McLaughlin (1989) for an analysis of the retailer's new-product acceptance/rejection decision.
of alternative approaches used by retailers to affect total store sales. Thus, there is
a great deal of uncertainty about the optimality of marketing decisions at the retail
level.
According to Bultez and Naert (1988b), consultants to retailers essentially rec-
ommend shelf space allocations proportional to revenues or profits, potentially aug-
mented by cost considerations such as handling costs and inventory control costs.
By contrast, academic researchers have considered the sensitivity of demand to shelf
space. It is easy to demonstrate that the optimal amount of space for a given item is a
function of own- and cross-item demand sensitivities.
Bultez and Naert (1988b) have developed SH.A.R.P. (SHelf Allocation for Re-
tailer's Profit) to capture demand interdependencies. In this sense shelf space alloca-
tion models constitute an important part of multiproduct models. 10 The objective is
to maximize profit across n items:
max over sh1, ..., shn of

Σ(r=1..n) gr qr(sh1, ..., shn) − Σ(r=1..n) Cr                    (13.16)

subject to

Σ(r=1..n) shr ≤ SH                                               (13.17)

and

shr ≥ 0 for r = 1, ..., n.                                       (13.18)
In the model each item's sales volume qr is a function of the space allocation,11 i.e. the
shr's. Since an increase in shelf space should increase sales, the expectation is that
∂qr/∂shr ≥ 0, and hence ηrr = (∂qr/∂shr)(shr/qr) ≥ 0 (i.e. the shelf space elas-
ticity is expected to be positive).12 The cross elasticities ηrj = (∂qr/∂shj)(shj/qr)
are either negative (substitutes), positive (complementary items) or zero. Importantly,
the interdependencies are captured through these cross effects.
10. For literature reviews see Corstjens, Doyle (1981), Bultez, Naert (1988b) and Bultez (1995).
11. We closely follow Bultez, Naert (1988b).
12. See also Curhan (1972).
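These elasticities can be computed numerically for any assumed sales response function. The two-item functional forms below (diminishing returns to own space, with item 1 cannibalizing item 2) are illustrative assumptions, not SH.A.R.P.'s estimated response functions.

```python
# Numerical own- and cross-shelf-space elasticities for an assumed two-item
# sales response function.
def q(r, sh):
    own = [40.0 * sh[0] ** 0.3, 30.0 * sh[1] ** 0.25]   # diminishing returns
    cross = [0.0, -5.0 * sh[0] ** 0.5]                  # item 1 cannibalizes item 2
    return own[r] + cross[r]

def elasticity(r, j, sh, h=1e-6):
    """eta_rj = (dq_r / dsh_j) * (sh_j / q_r), by finite differences."""
    base = q(r, sh)
    bumped = list(sh)
    bumped[j] += h
    return (q(r, bumped) - base) / h * sh[j] / base

sh = [2.0, 3.0]                 # current space allocation
print(elasticity(0, 0, sh))     # own elasticity eta_11: positive
print(elasticity(1, 0, sh))     # cross elasticity eta_21: negative (substitute)
```

With a power-law own-space term the own elasticity equals the exponent (0.3 here), which makes the finite-difference computation easy to verify.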
On the cost side, the expectations are that: ∂Cr/∂qr ≥ 0, i.e. higher sales volume
implies extra replenishment operations, holding other aspects such as shelf space
constant, and (∂Cr/∂shr | dqr = 0) ≤ 0, i.e., more shelf (storage) space reduces
replenishment operations, when demand stays constant.
The Lagrangian function for the nonlinear programming problem defined in
(13.16)-(13.18) is:

L = Σ(r=1..n) gr qr(sh1, ..., shn) − Σ(r=1..n) Cr + λ (SH − Σ(r=1..n) shr)

with λ the multiplier associated with the shelf space constraint (13.17).
The optimal share of the total available space to be allocated to item r follows from
the first-order conditions of this Lagrangian.
Bultez and Naert ( 1988b) suggest a pragmatic hierarchical approach which first dis-
tributes the total shelf space SH across the various product classes (allocation along
the width-axis of the assortment) and then allocates space to the items within the lot
assigned to each product class (allocation along the depth-dimension). The model of
Corstjens and Doyle (1981) focuses on interactions between product classes, while
SH.A.R.P. includes interactions between product classes and interactions between
items within a product class. Bultez and Naert (1988b) use attraction models to esti-
mate shelf space elasticities at both levels. In the empirical section they report that
SH.A.R.P. was applied to multiple assortments in Belgian supermarket chains. Re-
allocations of shelf space resulted in increases in profit varying from about 7 percent
to 34 percent.
The attractiveness of a product class depends on the attractiveness of the items
within the product class and the shelf space allocated to these items. Hierarchical
models offer good opportunities to relate these two levels of attractiveness. For a
discussion of these models see Section 14.5. Bultez, Gijsbrechts, Naert and Van-
den Abeele (1989) integrate another, asymmetric, attraction model in SH.A.R.P. We
discuss this model, which relates the interactions at both levels, in Section 14.4.
In this section we present and discuss a model designed to optimize the allocation of
advertising expenditures to individual product categories and media for a retailer.
In the model, the sales in a given product category are assumed to be a function
of advertising for that category, advertising for other categories, and general store
campaigns.
Doyle and Saunders (1990) propose a multiproduct advertising budgeting model
that can be used to allocate expenditures to different product categories and media. We
first discuss the basic elements of their model and the attractive aspects with respect
to specific synergies captured in their approach. We then suggest opportunities for
additional research.
Doyle and Saunders introduce the problem of determining the optimal amount of
advertising by arguing that for a single item the relevant factors are the item's margin
and the sensitivity of demand with respect to advertising (holding other marketing ac-
tivities constant). With multiple items, the retailer should also consider the cross-item
advertising effects (and the effects of advertising individual items) on store traffic.
Also, the support offered by the manufacturers plays a role.
In practice, advertising decisions are often made through what Doyle and Saun-
ders call a bottom-up approach. Essentially, the optimal advertising decisions made in
this approach ignore cross-item (cross-category) effects. Thus, possible complemen-
tary and substitution effects across items are ignored. To illustrate, we consider the
budgeting problem at the category level, consistent with Doyle and Saunders. In the
bottom-up approach, we have:

qjt = fj(ajt)                                                    (13.21)

where qjt is the sales of category j in period t and ajt is the retailer's advertising
for category j.
By summing (13.21) across the categories, j = 1, ..., n, we obtain total retail sales.
For an empirical analysis Doyle and Saunders propose a semilog13 specification
and allow for lagged effects (which we ignore for convenience). The (simplified)
model specification at the category level in the bottom-up approach is then:

qjt = β0j + β1j ln ajt + ujt                                     (13.22)

and the share of the total advertising budget A allocated to category j is:

aj* = (cmj β1j / Σ(r=1..n) cmr β1r) · A                          (13.23)

where

cmj = the contribution margin of category j.
Doyle and Saunders improve on this simple scheme by allowing for complemen-
tarity (and substitution) effects in advertising for specific product categories (top-
down approach):
qjt = β0j + Σ(r=1..n) β1jr ln art + ujt.                         (13.24)

aj* = (Σ(r=1..n) cmr β1rj / Σ(j=1..n) Σ(r=1..n) cmr β1rj) · A.   (13.25)
In this more general, top-down, approach the allocation rule reflects the own- and
cross-category effects. Thus, if the advertising for one category affects that category's
sales as well as the sales of other categories, then all those effects multiplied by the
corresponding contribution margins are taken into account. A practical difficulty is
that if we want to allow all categories to have cross effects, and we want to allow for
multi-period lagged effects, (13.24) is quickly saturated. And of course, the model
excludes price and other variables which may bias the estimated parameters. To over-
come these problems, Doyle and Saunders cross correlated sales and advertising (with
leads and lags up to four periods) after filtering the data through transfer function
analysis. 14 These cross correlations showed between four and ten predictor variables
(including current- and lagged own-category, and cross-category variables) for the
twelve categories of a European retailer with almost 1,000 stores, based on three
years of weekly data.
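The allocation rule (13.25) itself is a one-line computation once the coefficients are estimated. In the sketch below, B1[j, r] stands for β1jr, the effect of category r's advertising on category j's sales as in (13.24); margins and coefficients are invented for illustration.

```python
import numpy as np

# Top-down allocation rule (13.25) with invented margins and coefficients.
cm = np.array([3.0, 2.0, 1.5])            # contribution margins cm_j
B1 = np.array([[0.50, 0.10, 0.00],        # B1[j, r] = beta_1jr
               [0.05, 0.40, 0.00],
               [0.20, 0.00, 0.30]])
A = 100.0                                 # total advertising budget

# weight of category j = sum over r of cm_r * beta_1rj:
# all margin-weighted sales effects of category j's advertising
w = B1.T @ cm
shares = w / w.sum()
print(shares * A)                         # budget per category
```

Here category 1 gets the largest budget because its advertising has both a strong own effect and a positive cross effect on category 3, illustrating how cross-category effects reshape the bottom-up allocation (13.23).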
A variation of equation ( 13.24) was estimated, with a restricted set of advertising
variables (based on the significance of correlations between the filtered variables),
plus some seasonal and cyclical terms. In all cases, advertising expenditures were
defined separately for each of three media: catalogs, press and television. Seven of
13. To avoid the problem of taking logarithms of zero values, the value 1 is added to all advertising data (which
we do not show).
14. See Section 17.3.
the twelve product categories had been heavily promoted, three in catalogs, six in
the press, and three on television (one category was advertised in each of the media).
Because of this, only the combination of media and product categories with nonzero
budgets could be considered in the optimization process. However, the remaining
five product categories were occasionally featured in general store campaigns or in
general sales events. These five product categories' sales were potentially affected by
these general advertising variables as well as by advertising for other categories.
The existing allocation had almost half the advertising budget assigned to the
press, about 30 percent to the catalog and about 20 percent to television. The empiri-
cal results, however, showed that of 37 statistically significant advertising effects, 24
were for expenditures in the TV medium, 10 for the catalog medium, and three for the
press. Roughly consistent with this, the optimal allocation suggested that 65 percent
of the budget be allocated to TV, 30 percent to the catalog and about five percent to the
press. Among the categories, the optimal allocation was to have 40 percent for candy
(on TV), 20 percent for children's wear (in the catalog) and 10 percent for toys (on
TV). Interestingly, candy also had the largest actual sales amount, and it accounted for
more than 20 percent of total store sales. And the 40 percent recommended for candy
was heavily influenced by the large frequency with which it achieved a significantly
positive cross effect. Also, as expected, the TV medium had proportionally more
current (fewer lagged) effects than the other two media.
The optimal amount of advertising for the store can be derived by summing the
own- and cross-category effects multiplied by the corresponding profit contributions.
This calculation suggested that the budget be quadrupled ($32 million recommended
versus $8 million actual). Top management declined to accept this recommendation,
but the existing budget was reallocated largely consistent with the suggested percent-
ages. Doyle and Saunders report that both sales and profits improved, more than the
industry sector averages, and close to model predictions.
This application shows quite nicely how the allocation of advertising to product
categories and media can be improved, based on an analysis of available data. Import-
antly, the recommended relative allocations (but not the absolute ones) were adopted
by management and the predicted results achieved.
Nevertheless, it is instructive to consider possible model enhancements. For ex-
ample, although the profit equations included manufacturer support for advertising,
this support should be allowed to influence the effectiveness of the retailer's adver-
tising. Also, the model does not accommodate own- and cross effects of (category)
prices. Both the content of advertising and the categories advertised may influence
the price sensitivity (and the resulting margins used in the optimization procedure).
We also note that the allocation rule is based on the marginal effects of adverti-
sing on sales. If a given category has no significantly positive own- or cross effect,
the implication is that it be allocated zero advertising expenditures. However, a zero
marginal effect does not imply that advertising should be avoided altogether. In ad-
dition, given the dramatic change in allocation from previous practice, both to the
media and to the categories that had received advertising support, it is desirable to
allocate some funds to the five categories previously excluded.
CHAPTER 14
In this chapter we discuss specification issues that transcend the intended use of a
model (Chapter 8), the specific level of demand (Chapter 9), the amount of behavioral
detail (Chapter 10) and the specific models discussed in Chapters 11-13.
We first discuss issues with regard to the aggregation level at which a model is
specified, in terms of units and time periods (Section 14.1). The aggregation over time
follows the material on modeling marketing dynamics in Chapter 6.
Aggregation differs from pooling observations. In pooling we maintain the dis-
aggregate nature of the data, but use data on multiple units and multiple periods
to estimate, often, a common set of parameters. The differences are introduced in
Section 14.2. We consider pooling methods in more detail in Section 16.2.
Importantly, valid model specification depends critically on a proper definition of
the relevant market or the market boundaries. This issue is discussed in Section 14.3.
Related to questions about market definition is the topic of asymmetries in compe-
tition (see also Chapters 9 and 12). We discuss non-hierarchical models that accom-
modate asymmetries in Section 14.4, and introduce hierarchical models in Section
14.5. We compare hierarchical and non-hierarchical models in Section 14.6.
14.1.1 INTRODUCTION
If a manager wants to use model results for pricing or advertising decisions that
apply to all households and all stores in a given market, the implication of this rule
would be that the model should be estimated with market-level data. Interestingly,
this seemingly sensible idea is not consistent with analytical and empirical research
results. Even if differences between households and differences between stores are of
no interest to the manager, it is helpful to use data that contain such differences.
To explore the pros and cons of using data at alternative levels of aggregation,
we specify a household-level model and derive a store-level model. We also consider
the relation between store- and market-level models.
Data aggregation can occur over units and over time periods. Often, models are
specified for arbitrarily chosen periods of time. For example, market research firms
may decide to collect sales and associated data on a weekly basis. Here too the ques-
tion is whether, or under what conditions, the substantive conclusions are invariant to
aggregation over time.
We start with a simple example of entity aggregation along the model scope dimen-
sion. Consider the following market share model:
The differences between the estimated parameter values of the four brands are
substantial. We apply tests to see if these differences are statistically significant. To this
end we pool the data of the four brands and test the hypothesis of equality of intercepts
and slopes. 1 This hypothesis is rejected. Interestingly, the intercept and slope of the
firm share equation appear to be close to the averages of the coefficients for the brand
share equations. Nevertheless, the test result indicates that the brands differ in the
effects of advertising share and lagged market share.
A large part of recent empirical research in marketing science has focused on
models of brand choice specified at the household level (individual demand). 2 In
1. This so-called Chow test is discussed in Section 16.2.
2. See, e.g. Chapter 12.
a number of such models, the effects of marketing variables on brand choice are
determined conditional upon a purchase in the product category. One of the advan-
tages of household-level models is that heterogeneity between households, in brand
preferences and in sensitivities to marketing variables, can be explored. Understand-
ing this household heterogeneity is critical to marketing managers for the successful
development and implementation of market segmentation strategies. Mixture models
have been developed that account for this heterogeneity (Section 17.2). Even if the
model user does not want to exploit household heterogeneities in the use of marketing
activities, it may still be necessary to accommodate such heterogeneities in a model
intended only to produce the best average marketing effects.
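The need to accommodate heterogeneity even when only average effects are wanted can be illustrated with a two-segment sketch (illustrative parameter values, not an empirical result): the average of segment-level logit responses is not reproduced by a logit evaluated at the average parameter.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

# Two hypothetical household segments with different response parameters
beta_1, beta_2 = 0.5, 3.5       # illustrative sensitivities
x = 1.0                         # a marketing stimulus

# True average response: mix the segment-level choice probabilities
p_avg = 0.5 * logistic(beta_1 * x) + 0.5 * logistic(beta_2 * x)

# "Homogeneous" model evaluated at the average parameter
p_pooled = logistic(0.5 * (beta_1 + beta_2) * x)

# The two differ: a nonlinear model at the mean parameter does not
# reproduce the mean of the segment-level responses.
gap = abs(p_avg - p_pooled)
```

The gap here is about 0.08 in choice probability, which is why even an "average effects" user may need a heterogeneity-adjusted specification.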
If heterogeneity in the household-level model is ignored, it is possible to derive,
in a fairly straightforward manner, a specification that produces identical parameter
estimates at an aggregate level. Gupta, Chintagunta, Kaul and Wittink (1996) show
how a store-level model can be obtained that necessarily gives the same parameter
estimates as the household-level model does from which it is derived, as long as
the aggregation is restricted to households' purchases occurring under homogeneous
conditions. For example, actual brand choices occurring in a given store and a given
time period can be aggregated to the store level as long as the marketing variables
stay constant during the time period. Thus, if homogeneity in parameters across
households can be assumed, and if homogeneity in marketing variables across the
observations over which aggregation takes place is maintained, then an aggregate
demand model that is analytically consistent with an individual demand model will
produce identical results (Gupta et al., 1996).
(14.5)
where
Parameter estimation occurs with maximum likelihood methods (Section 16.6). The
likelihood function (L), for an observed sample of purchases made by a household
shopping in store k in week t, can be written as (Gupta et al., 1996, p. 387):

L = ∏_k ∏_t ∏_o ∏_j (π_jkt)^δ_ojkt    (14.7)

where

δ_ojkt = 1 if brand j is purchased on occasion o_jkt,
       = 0 otherwise.
If aggregation across household purchases occurs within store and within week (such
that the variables defined in xj are constant), then the same model structure applied
to store data that express the number of units sold for each brand (per store, per
week) produces identical parameter estimates. Thus, if the store data truly represent
all household purchases, and parameter heterogeneity across households is ignored,
we should be indifferent between the individual- and aggregate models.
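This equivalence is easy to verify numerically. The following sketch (our illustration, not code from Gupta et al.; all numbers hypothetical) shows that the log-likelihood computed from individual purchase records equals the log-likelihood computed from brand-level counts, so maximizing either yields the same estimates:

```python
import numpy as np

# Hypothetical data: in one store-week, 3 brands with fixed marketing
# variables, so every purchase occasion faces the same choice probabilities.
beta = np.array([0.2, -0.5, 0.3])          # illustrative parameters
x = np.array([[1.0, 2.0, 0.0],             # brand 1 attributes
              [0.5, 1.0, 1.0],             # brand 2
              [0.0, 3.0, 0.5]])            # brand 3

def choice_probs(beta):
    v = x @ beta
    e = np.exp(v - v.max())                # numerically stable softmax
    return e / e.sum()

# 10 individual purchases (brand indices), e.g. from panel households
purchases = [0, 1, 1, 2, 0, 1, 2, 2, 1, 0]
p = choice_probs(beta)

# Individual-level log-likelihood: sum over purchase occasions
ll_individual = sum(np.log(p[j]) for j in purchases)

# Store-level aggregation: only the counts per brand survive
counts = np.bincount(purchases, minlength=3)   # brand purchase counts
ll_aggregate = (counts * np.log(p)).sum()

# The two log-likelihoods are identical for every beta, so the ML estimates
# from individual and aggregated data coincide (given homogeneity).
```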
In practice, only a sample of households participate in a purchase panel. Market
research firms such as IRI and ACNielsen invite a probability sample of households
to use plastic cards or wands so that the research firms obtain electronic records
of (selected) purchases. In practice, some households refuse to participate, and the
households who agree to participate may have omissions and errors in their purchase
records (see also Chapter 15). Gupta et al. (1996) find statistical support for the state-
ment that the panel household data are not representative of the store data. However,
a comparison of average estimated price elasticities for data collected in small towns
shows that the substantive difference between inferences from household- and store
data is small if the household data are selected in a manner that is consistent with
the model specification at the store level. Gupta et al. (1996) distinguish between
household- and purchase selection, and show that "purchase selection", whereby only
the purchases of brands selected for analysis are used for estimation, provides results
that are much closer to store data results than "household selection". In the latter case
only households that restrict their purchases to the subset of brands analyzed would
be included for parameter estimation of the brand choice model.
In this comparison the reduction in elasticities from unit sales to market share is
apparently much greater than the increase in elasticities expected when household
heterogeneity is accommodated.
Gupta et al. (1996) derived a multinomial store-level model that is based on the prin-
ciple of utility maximization. They show how results can be obtained from store-level
data consistent with multinomial brand choice model results. However, the commer-
cial practice favors traditional and relatively straightforward demand models of store
data. A popular market response function is the SCAN*PRO model (see equation
(9.19)). A simplified version is:
(14.8)
3. Their specification accommodates heterogeneity through a Taylor series approximation. If this approxima-
tion is close enough, and product category unit sales is constant over time, then their market share model will
produce similar results as the corresponding random-effects logit model. In such a model the parameters are
assumed to follow a continuous distribution for the sample of households. For a different model that incorporates
household heterogeneity in the analysis of store-level aggregate data, see Kim (1995).
q_k = α_k p_k^β γ^F_k    (14.9)

where

F_k = the featuring of a brand in store k,
      measured by a dummy variable, k = 1, ..., K,
and the brand and week subscripts and the disturbance term have been dropped for
convenience. The linear aggregation of data across the stores produces the following
4. IRI and ACNielsen offer model results based on proprietary household- and/or store-data to clients. The same
clients typically have to use data aggregated across stores if they want to do response modeling themselves.
averages:
q̄ = (1/K) Σ_{k=1}^{K} q_k,   p̄ = (1/K) Σ_{k=1}^{K} p_k,   F̄ = (1/K) Σ_{k=1}^{K} F_k    (14.10)
where
F = the fraction of stores engaging in a promotion
activity, here featuring.
A similar, nonlinear, model structure for the resulting market-level data is:
(1/K) Σ_{k=1}^{K} q_k = (1/K) Σ_{k=1}^{K} α_k p_k^β γ^F_k ≠ α ( (1/K) Σ_{k=1}^{K} p_k )^β (γ')^{(1/K) Σ_{k=1}^{K} F_k} = α p̄^β (γ')^F̄.    (14.12)
On the other hand, if the market-level aggregates were to represent geometric means
of unit sales and price, then parameter equivalence obtains since:
( ∏_{k=1}^{K} q_k )^{1/K} = ( ∏_{k=1}^{K} α_k )^{1/K} ( ( ∏_{k=1}^{K} p_k )^{1/K} )^β γ^{(1/K) Σ_{k=1}^{K} F_k}    (14.13)
Thus, if the multiplicative store-level model is the desired specification, then the
equivalent market-level model should be applied to data aggregated in a manner
consistent with the (nonlinear) model specification. Unfortunately, the market-level
data are produced primarily for tracking the market performance and the intensity of
marketing activities for the brands at the retail level. In other words only arithmetic
means are provided by the data suppliers.
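The contrast between consistent (geometric) and inconsistent (arithmetic) aggregation can be demonstrated with simulated store data; all parameter values below are hypothetical, and the sketch is ours, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 50, 40                        # hypothetical stores and weeks
alpha, beta, gamma = 5.0, -2.0, 1.5  # assumed store-level parameters

price = rng.uniform(0.8, 1.2, size=(K, T))
feature = (rng.random((K, T)) < 0.3).astype(float)
q = alpha * price**beta * gamma**feature     # noiseless multiplicative model

# Market-level series from GEOMETRIC means (aggregation consistent with the model)
ln_q_geo = np.log(q).mean(axis=0)
ln_p_geo = np.log(price).mean(axis=0)
f_bar = feature.mean(axis=0)

X = np.column_stack([np.ones(T), ln_p_geo, f_bar])
coef_geo, *_ = np.linalg.lstsq(X, ln_q_geo, rcond=None)
# coef_geo recovers [ln alpha, beta, ln gamma] exactly (data are noiseless)

# Market-level series from ARITHMETIC means (as data suppliers provide)
ln_q_ari = np.log(q.mean(axis=0))
ln_p_ari = np.log(price.mean(axis=0))
X2 = np.column_stack([np.ones(T), ln_p_ari, f_bar])
coef_ari, *_ = np.linalg.lstsq(X2, ln_q_ari, rcond=None)
# coef_ari is generally biased: the multiplicative model does not hold for
# arithmetic averages of stores that are heterogeneous in featuring.
```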
To obtain analytical insight into the nature of a systematic difference in the pa-
rameters γ and γ' that is due to the linear aggregation of data belonging to a nonlinear
model, Christen et al. (1997) use a simplified store model:
q_k = γ^F_k    (14.14)
From (14.14) it is easy to examine the bias that occurs in the market model due to
store heterogeneity in feature advertising.
The market model that corresponds to (14.14) is as follows (see also (14.11)):
q̄ = (γ')^F̄    (14.15)
γ' = exp( ln q̄ / F̄ ).    (14.16)
From (14.14), given that F_k is either 0 or 1, the correct expression for q̄ is:

q̄ = F̄γ + (1 − F̄).    (14.17)
Substituting this correct expression into the formula for γ', we obtain:

γ' = exp( ln(F̄γ + 1 − F̄) / F̄ ) = (F̄γ + 1 − F̄)^{1/F̄},   0 < F̄ ≤ 1.    (14.18)
The resulting bias in estimated effects from market-level data can be very large. For example, for data
on peanut butter brands, the market model produced a feature advertising multiplier
(γ') of 4.26, while the store model showed γ = 1.36 (in both cases averaged across
three brands). Thus, the market model can generate a highly biased estimate of the
increase in sales due to feature advertising of a brand, holding other things constant
(more than 4 times the base amount of sales, if all stores feature the brand, in the
market model versus substantially less than 2 times in the store model). For display,
the market model had an average multiplier of 14.15, while the store model showed
only 1.80!
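Using the simplified expression γ' = (F̄γ + 1 − F̄)^{1/F̄} with illustrative values (our own, not from the peanut butter data) shows how the market-level multiplier is inflated whenever only a fraction of stores feature:

```python
# Feature-advertising multiplier implied by the market model when stores
# are heterogeneous in featuring: gamma' = (F*gamma + 1 - F)**(1/F),
# with F the fraction of stores featuring the brand.
def market_multiplier(gamma, f_bar):
    return (f_bar * gamma + 1.0 - f_bar) ** (1.0 / f_bar)

gamma_store = 2.0                              # illustrative store-level multiplier
g_25 = market_multiplier(gamma_store, 0.25)    # 1.25**4 = 2.44140625
g_100 = market_multiplier(gamma_store, 1.0)    # all stores feature: no bias
# With F = 1 aggregation is harmless and gamma' = gamma; the smaller F,
# the larger the upward bias in the market-level multiplier.
```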
We conclude that for nonlinear market response models, a critical determinant of
aggregation bias is heterogeneity in marketing activities across the units over which
the data are summed or averaged. Thus, in case of linear aggregation of the data (the
individual values are added or averaged, as is done for market status reports), the use
of a nonlinear model will create biased parameter estimates if the marketing activities
differ between the units across which the aggregation takes place. Linear models are
not subject to this bias, because linear aggregation is consistent with the application
of a linear model.
There is, however, another source of aggregation bias which is also applicable
if the model is linear. 5 Krishnamurthi, Raj and Selvam (1990) show that if the units
over which aggregation occurs are heterogeneous in both marketing activities and in
the parameters representing the marketing effects, the requirement for no aggregation
bias in a linear model is zero covariance between the marketing activities and the
corresponding parameters. Is zero covariance a likely condition? We know that stores,
especially those belonging to different chains, exhibit a high degree of heterogeneity
in marketing activities. This heterogeneity may be based on positioning strategies.
For example, chains differ in the kinds of households they target, and this positioning
should result in different advertising, pricing and promotion campaigns for a variety
of brands. 6 Differences in marketing campaigns between the chains lead to differ-
ences in chain patronage (while at the same time differences in chain positioning will
be based on perceived differences between households). These arguments suggest
that zero covariance across stores between the marketing activities for brands and the
associated effects is a very unlikely condition. We summarize the various arguments
in Table 14.2.
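The Krishnamurthi, Raj and Selvam condition rests on the identity mean(β_k x_k) = mean(β_k)·mean(x_k) + cov(β_k, x_k), which a few hypothetical numbers make concrete:

```python
import numpy as np

# Hypothetical store-level data: sales respond linearly, q_k = beta_k * x_k,
# but both the activity x_k and the response beta_k vary across stores
# (and are positively correlated, as positioning arguments suggest).
x = np.array([1.0, 2.0, 3.0, 4.0])       # e.g. feature intensity per store
beta = np.array([0.5, 1.0, 1.5, 2.0])    # store-specific effects

q_bar = (beta * x).mean()                # aggregate (average) sales response
naive = beta.mean() * x.mean()           # what a market-level model attributes

# Exact decomposition: mean(beta*x) = mean(beta)*mean(x) + cov(beta, x),
# so the market-level estimate is unbiased only if the covariance is zero.
cov = ((beta - beta.mean()) * (x - x.mean())).mean()
```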
These results indicate that if the primary purpose of the use of a marketing model
is to obtain valid estimates of the effects of marketing activities, it is important to
use data at the most disaggregate level. However, this does not necessarily imply that
disaggregate models will outperform aggregate models in forecasting aggregate-level
results. We briefly consider this issue.
Sales forecasts at (say) the market level can be obtained from models estimated at
different aggregation levels. One obvious possibility is a model based on market-level
data. Alternatively one can use chain-level data to estimate a model with separate (or
5. We emphasize that it is usually impossible to justify the applicability of a linear model to real-world data.
6. The resulting heterogeneity in the execution of strategies across chains for a given brand may not be desirable
to the brand manager. Thus, some coordination between manufacturer and retailer is desirable.
Table 14.2 Aggregation bias in parameter estimates for linear and nonlinear models.

                              Marketing activities
Parameters       Homogeneous      Heterogeneous
Homogeneous      No bias          Bias, if nonlinear model
                                  (Christen et al., 1997)
Heterogeneous    No bias          Bias, unless zero covariance between
                                  activities and effects
                                  (Krishnamurthi, Raj, Selvam, 1990)
equal) parameters per chain, and calculate market-level sales forecasts by aggregating
the chain-level sales predictions. A third alternative would be to derive the model sales
forecasts by aggregating individual store forecasts obtained from a store-level model,
such as, for example, relation (14.8). Differences in forecasting accuracy of models
defined at different levels of aggregation have been studied for a long time. 7 These
studies suggest that disaggregate models tend to outperform an aggregate model.
Foekens, Leeflang and Wittink (1994) compared the forecasting accuracy at the
aggregate chain and market levels for models defined at the store, retail chain and
metropolitan market level. They used the loglinear SCAN*PRO model 8 to study the
effects of temporary price cuts, displays and feature advertising on a brand's unit sales
at the indicated levels. They found that substantive conclusions about the nature or
magnitude of these effects tend to be more valid at the disaggregate levels. For ex-
ample, store-level models have the highest relative number of statistically significant
parameter estimates in the expected range of values, followed by chain models which
outperform the market-level model. The store models with homogeneous (identical)
parameters across chains, provide the highest proportion of statistically significant
parameter estimates in the expected range of values. Heterogeneous store models
provide the best fit, followed by heterogeneous chain models. The heterogeneous
store models also provide the best forecasting accuracy results.
Aggregate models are more likely to be misspecified than comparable disaggre-
gate models, for at least two reasons:
7. See Grunfeld, Griliches (1960), Edwards, Orcutt (1969), Aigner, Goldfeld (1973, 1974), Blinkley, Nelson
(1990).
8. See (14.8) and, for a more detailed description, Section 9.3.
Some models have been developed that combine disaggregate and aggregate data.
For example, Russell and Kamakura (1994) use the substitution patterns observed
in a micro-level (household) analysis to assist in the estimation of a market share
model based on retail tracking data. The result is a market share model with a flexible
pattern of brand competition that is linked to the pattern of preferences observed in
the micro-level data.
Russell and Kamakura use the micro-level data to generate the total number of
market segments (S), the estimated volume-weighted choice probabilities Θ_s, the
estimated relative segment size RS_s for each segment s and the probability π_hs that
household h belongs to segment s. This is done by latent class analysis (Section 17.2).
The Θ_s and RS_s for each segment s are used in the macro-analysis. For this purpose
Russell and Kamakura (1994) define the intrinsic attraction of brand j within segment
s (IA_js) as:

IA_js = ln( Θ_js / Θ̃_s )    (14.20)

where

Θ̃_s = the geometric mean of the purchase probabilities
       of brand j, Θ_js, j = 1, ..., n, in segment s.
The macro-model specification is:

m_jt = Σ_{s=1}^{S} RS_s m_jst    (14.21)
The price and promotion elasticities are segment-specific because of the differences in
the intrinsic brand attractions γ_j + IA_js across segments.
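The intrinsic-attraction construction can be sketched in a few lines (hypothetical probabilities; this mirrors the definition of IA_js but is not the authors' code):

```python
import numpy as np

# Hypothetical segment-level purchase probabilities for 4 brands
theta = np.array([0.10, 0.25, 0.40, 0.25])

# Intrinsic attraction: IA_j = ln(theta_j / geometric_mean(theta))
geo_mean = np.exp(np.log(theta).mean())
ia = np.log(theta / geo_mean)

# The attractions sum to zero by construction (an identification
# constraint), and a logit transform of IA returns the original shares.
shares = np.exp(ia) / np.exp(ia).sum()
```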
The arguments presented in Section 14.1.2 about entity aggregation also apply to aggregation
over time. For example, for a nonlinear model, if only monthly or bimonthly
observations are available, and the marketing activities differ between weeks, it is
unavoidable that an aggregation bias occurs. However, in addition to such problems
there is an issue that stems from the inclusion of lagged variables to estimate the
duration of (manufacturer) advertising effects. Advertising effects on sales are not
just immediate for reasons given in Section 6.1.
Clarke (1976) observed a systematic relation between the estimated duration of
advertising effects and the time period of the data: the longer the time interval, the
longer the estimated duration of the effect of advertising (in a linear model): see Table
6.1.
To show how this aggregation problem arises, we follow Leone (1995). A simpli-
fied linear model, constructed to estimate the current and lagged effects of advertising
for one brand on its sales, is:

q_t = α(1 − λ) + β a_t + λ q_{t−1} + u_t − λ u_{t−1}    (14.25)
What if only quarterly data are available? Then we could estimate parameters from
the following relation:

q_T = α* + β* a_T + λ* q_{T−1} + u_T    (14.26)
However, if ( 14.25) is specified on weekly data, and we aggregate this equation to the
quarterly time period, we obtain:
Σ_{t=1}^{13} q_t = Σ_{t=1}^{13} α(1 − λ) + β Σ_{t=1}^{13} a_t + λ Σ_{t=1}^{13} q_{t−1} + w_t    (14.27)

where

w_t = disturbance term = Σ_{t=1}^{13} (u_t − λ u_{t−1}).
The lagged term in (14.27) represents the sales over 13 weeks, but it is just one week
removed from Σ_{t=1}^{13} q_t. This term would be the proper lagged variable. However, if
only quarterly data are available, it is not observable. In (14.26) q_{T−1} is lagged sales
for the previous quarter!
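A small deterministic simulation (illustrative Koyck-type parameters of our own choosing) makes the mismatch concrete: the proper lagged term for a quarter is the 13-week sum shifted by one week, not the previous quarter's sales.

```python
import numpy as np

# Weekly sales from a Koyck-type recursion over two quarters (26 weeks)
alpha, beta, lam = 1.0, 0.5, 0.8     # illustrative parameters
rng = np.random.default_rng(1)
a = rng.uniform(0.0, 2.0, size=26)   # weekly advertising
q = np.zeros(26)
prev = 20.0                          # starting sales level, above steady state
for t in range(26):
    q[t] = alpha * (1 - lam) + beta * a[t] + lam * prev
    prev = q[t]

# The "proper" lagged variable for quarter 2 (weeks 13..25) is the 13-week
# sum shifted by one week, i.e. q_{t-1} for t = 13..25:
proper_lag = q[12:25].sum()

# What quarterly data actually offer is last quarter's total sales:
previous_quarter = q[0:13].sum()

# The two differ, so regressing quarterly sales on last quarter's sales
# misstates the dynamics and inflates the implied advertising duration.
```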
The proper specification at the quarterly time interval is:
Given that households differ in interpurchase time, having the shortest possible
intervals makes it more likely that the actual lagged effects can be correctly captured.
For additional treatments of this aggregation problem, see Kanetkar, Weinberg, Weiss
(1986), Russell (1988), Srinivasan and Weir (1988), Vanhonacker (1988).
14.2 Pooling
The difference between pooling and aggregation can be illustrated with the models
introduced in Section 14.1.2. There we specified a firm share model (14.2), estimated
from 27 observations:
The equivalent pooled relation was estimated across the four brands with 4 x 27 = 108
observations, and resulted in:
The parameter estimates of the lagged endogenous variable, 0.64 in (14.29) and 0.99
in (14.30), differ significantly. One possible reason for this is the contribution of cross-
sectional variation in (14.30). For example, a brand with high market share in t − 1
tends to have high market share in t. Note also that the estimated standard errors
differ quite considerably. For example, the estimated effect of the lagged endogenous
variable has a standard error of 0.16 in (14.29) but only 0.01 in (14.30). Thus, pooling
data offers the possibility for greatly increasing the reliability of the estimates, pro-
vided that pooling is appropriate. Before a decision is made to pool, it is important
to test the homogeneity hypothesis (see Section 16.2). For these four brands, the test
result indicated that the homogeneity assumption is invalid (see Section 14.1.2).
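The pooling logic and a Chow-type homogeneity statistic can be sketched on synthetic data (the brand parameters and noise level below are assumptions for illustration, not the book's estimates):

```python
import numpy as np

rng = np.random.default_rng(2)
n_brands, T, k = 4, 27, 2          # mirrors the four-brand, 27-period setup

# Hypothetical brand-level data with brand-specific intercepts and slopes
X_list, y_list, ssr_sep = [], [], 0.0
for b in range(n_brands):
    x = rng.uniform(0, 1, T)
    X = np.column_stack([np.ones(T), x])
    beta_b = np.array([0.1 * b, 0.5 + 0.2 * b])   # heterogeneous parameters
    y = X @ beta_b + rng.normal(0, 0.05, T)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr_sep += ((y - X @ coef) ** 2).sum()        # separate (per-brand) fits
    X_list.append(X)
    y_list.append(y)

# Pooled fit: one common parameter vector across all brands
Xp, yp = np.vstack(X_list), np.concatenate(y_list)
coef_p, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
ssr_pool = ((yp - Xp @ coef_p) ** 2).sum()

# Chow-type F statistic for parameter homogeneity across the brands
df1 = (n_brands - 1) * k
df2 = n_brands * T - n_brands * k
F = ((ssr_pool - ssr_sep) / df1) / (ssr_sep / df2)
# Pooling can only raise the residual sum of squares, so F >= 0; a large F
# rejects homogeneity, as the text reports for the four brands.
```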
14.3 Market boundaries

To define a brand's market share, it is necessary to know which other brands be-
long to the same market. Thus, while it is operationally simple to create a market
or industry sales figure, the problem is to define the relevant market or the market
boundaries. In some instances, the primary competitors are quite visible and easily
identified. However, in differentiated product markets brands compete potentially in
non-obvious ways. The definition of relevant market is also complex if the number of
competing brands is large, or if there are product substitutes that formally belong to
a different industry. And new entrants may not be easily characterized with respect
to the market in which they primarily compete. The complexity of the problem increases
further if more disaggregate units of analysis, such as stock-keeping units (SKU's)
are used as the basic unit of analysis (Fader, Hardie, 1996).
We briefly consider an example to illustrate the problem. In a study of the cereal
market, Ghosh, Neslin and Schoemaker (1984) mention that over 140 different
brands of cereals were recorded by panel members. 9 To minimize the occurrence
of zero values for advertising in an attraction model, data were aggregated to 29
brands. Consumers' consideration sets are likely to contain a very modest number of
alternatives. Thus assuming that 29 brands directly compete with each other may not
be realistic. It will also be useful to narrow the set of brands for managerial use. Thus,
managerial use considerations and consumer aspects have to be weighted against
what is desirable from a modeling perspective (especially with regard to biases that
can result from improper aggregation, see Section 14.1).
Other relevant aspects include regional differences and distribution characteristics
which can be used to obtain a reasoned way of aggregating brands. To keep the num-
ber of alternatives modest, researchers often do something like the following. Create a
five-brand market: the brand, three major competitors, and a fifth item constituting the
remaining items or the "competitive fringe" combined. Although it is inappropriate
to assume that the brands combined into a competitive fringe are homogeneous, it is
often better to include those brands in this manner than not to include them at all. A
model based on such a five-brand market does retain the possibility of heterogeneous
parameters for the three major competitors. 10
In most industries, competitors are portrayed in terms of how intensively they
compete. Aaker (1995, p. 66) distinguishes (1) very direct competitors, (2) others that
compete less intensively, and (3) still others that compete indirectly but are still rele-
vant. 11 Aaker provides an unequivocal market definition which depends on a few key
variables. With respect to cola drinks, for example, key variables are cola/noncola,
diet/nondiet and caffeine/noncaffeine. 12 The attributes may be either concrete, as in
this example, or abstract. 13
Day, Shocker and Srivastava (1979) propose the following market definition:
"the set ofproducts judged to be substitutes within those usage segments in which
similar patterns of benefits are sought and the customers for whom such usages
are relevant"
(Day eta!., 1979, p. 10).
where

m_jt = market share of brand j in t,
A_jt = attraction of brand j in t,
CS_j = the consideration set relevant to purchases of brand j,
{1, ..., n} = the set of alternative products (brands).

For example, for j = 6: CS_6 = {1, 2, 3, 6}, where {1, ..., n} = {1, ..., 7}.
Cross elasticities defined at the store or chain level can also be used to define compe-
tition between stores. Using weekly scanner data representing 18 product categories,
Hoch, Kim, Montgomery and Rossi (1995) estimated store-specific price elasticities.
They related these elasticities to a comprehensive set of demographic and competitor
variables that describe the trading areas of each of the stores. Hoch et al. find that cus-
tomer demographic variables such as age, household size, income and the percentage
of consumers who are ethnic, are much more influential than competitive variables
on price sensitivity. This result is consistent with the idea that stores attract different
types of customers who on average tend to be loyal to a primary store.
The (measured or estimated) transition probabilities (see Section 12.3) constitute a sec-
ond method. This method is used to define the degree of brand switching. In a strict
sense, transition probabilities that do not differ significantly from zero indicate that
the relevant pairs of brands do not belong to the same market or market segment.
Novak (1993) provides a survey of approaches for the analysis of market structure
based on brand switching data.
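Estimating transition probabilities from panel data is straightforward; the sketch below (hypothetical purchase sequence, not data from any study cited here) row-normalizes transition counts:

```python
import numpy as np

# A hypothetical purchase sequence over three brands (0, 1, 2)
seq = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2, 2, 2, 0, 0, 1]

n = 3
counts = np.zeros((n, n))
for prev, nxt in zip(seq[:-1], seq[1:]):
    counts[prev, nxt] += 1

# Row-normalize: P[i, j] = Pr(brand j at t | brand i at t-1)
P = counts / counts.sum(axis=1, keepdims=True)
# Off-diagonal entries near zero would suggest the brand pair does not
# belong to the same market or market segment.
```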
14. See also Hauser, Wernerfelt (1990), Roberts, Lattin (1991).
15. For details, see Batsell and Polking (1985). See also Chintagunta (1998), Park, Hahn (1998).
The analyses based upon customer judgments are complementary to the approaches
based on purchase behavior.
Decision sequence analysis considers protocols of consumer decision making,
which indicate the sequence in which various criteria are employed to reach a final
choice. Households often use stage-wise processing of information (phased decision
strategies) prior to purchase. For example, some alternatives may be excluded for
failing to meet minimum requirements on specific attributes, and the remaining set of
alternatives may be compared on the same or other attributes. The implication is that
at least two stages are involved: one in which options are eliminated and another in
which a choice is made from the remaining options.
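A phased (conjunctive screening, then compensatory choice) decision rule can be sketched as follows; the brands, attribute values and weights are all hypothetical:

```python
# Two-stage (phased) decision sketch. Attributes per brand: (price, quality).
brands = {"A": (2.0, 6.0), "B": (3.5, 9.0), "C": (1.5, 4.0), "D": (2.8, 8.0)}

# Stage 1: conjunctive screening -- eliminate brands whose quality
# fails a minimum requirement (here, quality >= 5).
considered = {b: v for b, v in brands.items() if v[1] >= 5.0}

# Stage 2: compensatory choice among survivors
# (utility = quality minus a price weight of 2).
choice = max(considered, key=lambda b: considered[b][1] - 2.0 * considered[b][0])
```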
In the consumer behavior literature a distinction is often made between processing
by brand and processing by attribute (Engel, Blackwell, Miniard, 1995, pp. 191-192).
This distinction is relevant, for example, to a determination of the order in which
product-attribute information is acquired or processed by consumers. 17 The usual
From a practical perspective, the existence of many items in a product category often
requires that researchers place restrictions on the nature of competition in consumer
response models. Thus, in a market with many items, the question is how, not whether,
asymmetry in competition should be accommodated.
In our discussion of alternative methods for the accommodation of asymmetry in
the market structure of a product category, we make a distinction between hierarchical
and non-hierarchical models. The non-hierarchical models are discussed in this sec-
tion, the hierarchical models are discussed in Section 14.5. The relation between the
models discussed in this section and Section 14.5 is shown in Figure 14.2 in Section
14.6.
ATTRACTION MODELS 19
Market share attraction models are examples of non-hierarchical models that can be
used to accommodate asymmetric competition. An example, a non-robust model, is
discussed in Section 11.2; see equation (11.16) and Table 11.3. In Section 9.4 (equa-
tions (9.33) and (9.34)) we discuss the Fully Extended Attraction model (FEA model)
which can account for asymmetry. We modify equation (9.34) to focus on the item
instead of the brand as the unit of analysis.
Suppose that a market consists of B brands, and that each brand is available in
S package sizes. Let each item be indexed by its brand (b) and its size (s). Also,
there are L marketing instruments. Then the FEA model has the following structure,
with the item as the unit of analysis (ignoring the error term), for s = 1, ... , S, b =
1, ... , B:
where
e_sb,ij = (∂m_sb / ∂x_ℓij) (x_ℓij / m_sb) = ( β_ℓsbij − Σ_{c=1}^{B} Σ_{r=1}^{S} m_rc β_ℓrcij ) x_ℓij,   (i, j) ≠ (s, b).    (14.33)
Equation (14.33) indicates that the effect of marketing variable x_ℓij differs between
the items (i, j) and (s, b). Thus, equation (14.32) allows for asymmetric competition
between the items.
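The elasticity expression can be checked numerically for a small attraction system with linear-in-x exponents (illustrative coefficients of our own; a finite-difference derivative confirms the analytical formula):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3                            # 3 items, 1 marketing instrument (illustrative)

x = rng.uniform(0.5, 1.5, n)     # instrument level per item
B = rng.normal(0, 0.3, (n, n))   # B[s, i]: cross effect of item i's x on item s

def shares(x):
    v = B @ x                    # attraction exponents with full cross effects
    e = np.exp(v - v.max())      # stable softmax -> market shares
    return e / e.sum()

m = shares(x)

# Analytical cross elasticity of item s's share w.r.t. item i's instrument:
# e_{s,i} = (B[s, i] - sum_r m_r * B[r, i]) * x[i]
s_idx, i_idx = 0, 2
elas_formula = (B[s_idx, i_idx] - m @ B[:, i_idx]) * x[i_idx]

# Numerical check by a finite difference on the log-share
h = 1e-6
x2 = x.copy()
x2[i_idx] += h
elas_numeric = (np.log(shares(x2)[s_idx]) - np.log(m[s_idx])) / h * x[i_idx]
```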
Even with a modest number of items the FEA model can easily pose estimation
problems. For example, assume the number of items (n = B × S) is 20 and the number
of explanatory variables (L) four. Then the number of parameters in the FEA model
equals n + Ln² = 1620. For an estimation sample of one year of weekly observations
for all 20 items there are 52 × (20 − 1) = 988 linearly independent observations. Since
988 < 1620, restrictions on the nature of the competitive structure would be required.
Among the possibilities are:
1. specification of a "Cluster-Asymmetry Attraction model" (henceforth CAA
model);
2. the model developed by Carpenter, Cooper, Hanssens and Midgley (1988) (hence-
forth the CCHM model);
3. the specification of a hierarchical model (Section 14.5).
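The counting argument that motivates these restrictions is a two-line computation:

```python
# Parameter count for the fully extended attraction (FEA) model:
# n intercepts plus L * n**2 (cross-)effect parameters.
B, S, L = 5, 4, 4
n = B * S                          # 20 items
n_params = n + L * n**2            # 20 + 4 * 400 = 1620

# Linearly independent observations from one year of weekly data:
# market shares sum to one, so only n - 1 shares are free each week.
n_obs = 52 * (n - 1)               # 988
# n_obs < n_params: the unrestricted FEA model cannot be estimated.
```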
In the CAA model (Vanden Abeele, Gijsbrechts, Vanhuele, 1990) it is assumed that
the market can be structured as clusters of items. Criteria (e.g., brand name, pack-
age size, product form) are specified a priori such that the use of each criterion
results in the identification of one or more clusters. If clustering is effective, com-
petition between items within clusters is stronger than between items of different
clusters. The approach allows clusters to be overlapping, i.e. items may belong to
more than one cluster. Thus, the model allows multiple hierarchies to structure the
market simultaneously.
The CAA model incorporates cross-effects which are related to a priori defined
clusters of items by introducing one asymmetry parameter per clustering criterion.
Thus, asymmetric competition is modeled parsimoniously and requires the estimation
of only a small number of parameters.
We use brand name and package size as clustering criteria. A brand cluster con-
sists of all items with a given brand name. Thus, for B names we can define B brand
clusters. Similarly, for S package sizes we can define S size clusters. Note that an
item can be a member of two (or more) clusters. The CAA model has the following
specification:
(14.34)
GACA (see Vanden Abeele et al., 1990). The above definitions hold for brand clus-
ters. Similar definitions can be provided for the size clusters.
The CAA model has been incorporated into the SH.A.R.P. model (Bultez and
Naert, 1988b), which is discussed in Section 13.4. Incorporating the CAA model in
SH.A.R.P. offers opportunities to integrate the diversity of substitution effects that
may stem from brand loyalty, preference for a specific variety or package-size, and
purchasing habits. The model that accounts for asymmetric cannibalization in retail
assessments is described in Bultez, Gijsbrechts, Naert and Vanden Abeele (1989).
CCHM MODEL
where
The estimation of the model in (14.35) tends to require a much smaller number of
parameters than is required for the FEA model (14.32).
We define:
q_sb,t = unit sales (e.g., number of kilograms)
Figure 14.1 Two alternative hierarchical structures with respect to the choice of brands (A, ...,
G) and package sizes (1, ..., 5).
where m_b,t and m_s|b,t are the market share for brand b, and the conditional market
share for size s, given brand b, respectively.
Let C_s|b denote the set of package sizes which have marketing instruments with a
cross-effect on the share of size s (given brand b). Analogously, let C_b denote the set
of brands which have marketing instruments that have a cross-effect on the market
share of brand b. Then the FENMNL model can be formalized as follows:
where
where
The inclusive value variable is the log of the sum of the attractions of all package sizes
belonging to brand b. The term involving the unknown parameter (1 − a_b) measures
the effect of the total attractiveness of the set of package sizes of brand b on the
brand's market share. In the FENMNL model the value of a_b can vary between zero
and one.
Several hierarchical models are nested in the FENMNL model, for example:
• the FEMNL model: the "Fully Extended" MultiNomial Logit model, for which
the condition holds that a_b = 0 for all b, i.e. the inclusive value variable has full
impact;
• the FESQA model: the "Fully Extended" SQuared Attraction model, for which
a_b = 1 for all b, i.e. the inclusive value variable has no impact. In this model it is
assumed that the share models for different stages are independent of each other.
The FESQA model can be interpreted as the product of two attraction models.
Like the FEA model, the ("fully extended") FENMNL model has its (more restric-
tive) "extended" counterpart: the ENMNL model. In this model the unique cross-
competitive effects are eliminated. In that case the attractions have the following
structures:
As|b,t = exp[β0sb + X′sb,t βsbsb]   (14.41)
Ab,t = exp[α0b + Y′b,t αbb + (1 − ab)IVb,t]   (14.42)
where the variables are defined as before and IVb,t is the inclusive value variable. Two models are nested in the ENMNL model:
• the EMNL model: the "Extended" MultiNomial Logit model for which ab = 0
for all b;
• the ESQA model: the "Extended" SQuared Attraction model for which ab = 1
for all b.
In Figure 14.2 we show the relations between the models discussed in the preceding
two sections. The figure includes models not discussed in the preceding sections but
which connect other models or are simplifications. 20
In Table 14.4 we identify characteristics on which hierarchical, CCHM and cluster-asymmetry attraction (CAA) models differ. To facilitate the discussion of hierarchical model characteristics, we use the FENMNL model as its most general case. We compare FENMNL against the two other models, which embody other forms of restrictions on the competitive structure in markets with many items. The first two aspects shown in Table 14.4 identify differences in the a priori structuring of the market and in the sources of asymmetry.
The third point of difference is the number of parameters in each model. The
CCHM model potentially requires a very large number of parameters, while this num-
ber is small for the CAA model. The FENMNL model has an intermediate position in
this respect. For example, for B = 5, S = 4 (hence n = 20) and L = 4, the maximum
numbers of response parameters are: 1600 (CCHM), 420 (FENMNL) and 82 (CAA),
respectively.
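The counts in this example can be reproduced with a short computation. The counting rules below are our reading of the three structures (full cross-effects among all items for CCHM; within-branch plus brand-level cross-effects for FENMNL); the CAA count is simply taken from the text, since its formula is not restated here.

```python
# B brands, S package sizes per brand (n = B*S items), L marketing instruments.
B, S, L = 5, 4, 4
n = B * S  # 20 items

# CCHM (full cross-effects): every item's share may respond to every
# instrument of every item, including its own.
cchm = n * n * L  # 20 * 20 * 4

# FENMNL: full cross-effects only within each brand branch (S items per
# branch), plus full cross-effects among the B brands at the brand level.
fenmnl = B * (S * S * L) + B * B * L  # 5*64 + 100

caa = 82  # as reported in the text for this configuration

print(cchm, fenmnl, caa)  # 1600 420 82
```

The ordering 1600 > 420 > 82 makes the text's point concrete: the hierarchical structure prunes most of the cross-effects of the fully asymmetric model, and the cluster-asymmetry structure prunes even more.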
Importantly, the implementation of the CCHM model and the hierarchical models
differs in the manner in which potential cross-effects are identified. In the CCHM
model cross-effects are identified from an analysis of residuals from the EA model.
A variable with a potential cross-effect is added if the share residuals of an item are
significantly correlated with that variable. If there is only one relevant cross-effect
(i.e. it is the only missing variable in the EA model) the resulting simple correlation
coefficient between the variable and the residuals should be statistically significant.
However, if two or more cross-effects exist, the simple correlation coefficients may
not be a valid basis to identify the group of relevant predictor variables. For example,
it is possible that the simple correlation between such a variable and the residuals is
nonsignificant due to the omission of another relevant variable.
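The screening step can be sketched in a few lines. This is not the authors' code: the data are simulated, the variable names are hypothetical, and the test statistic is the simple large-sample z for a correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 104  # two years of weekly data (illustrative)

# Hypothetical candidate cross-effect variables and EA-model residuals;
# by construction only competitor_price drives the residuals here.
competitor_price = rng.normal(size=T)
competitor_feature = rng.normal(size=T)
residuals = 0.5 * competitor_price + rng.normal(scale=0.8, size=T)

def significant_correlation(x, e, z_crit=1.96):
    """Flag a candidate variable whose simple correlation with the
    residuals is significant, using the approximate z = |r| * sqrt(T)."""
    r = np.corrcoef(x, e)[0, 1]
    return abs(r) * np.sqrt(len(e)) > z_crit

for name, x in [("competitor_price", competitor_price),
                ("competitor_feature", competitor_feature)]:
    print(name, significant_correlation(x, residuals))
```

As the text warns, this one-variable-at-a-time screen is only valid when a single cross-effect is missing; with several omitted variables the simple correlations can mislead.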
In the hierarchical (FENMNL) model we can incorporate all possible cross-effects
(for the items included in that branch), as long as the number of items within a given
branch is small. For branches with a large number of items, Foekens et al. (1997) use Haugh-Pierce bivariate causality tests to identify potential cross-effects (see Pierce and Haugh, 1977).21 That is, the relation is tested separately for each marketing variable of (say) item (i, b) and the (conditional) market share of item (s, b), i ≠ s. As
a consequence, the identified group of potential "causal" variables may also exclude
some variables which in fact have a causal relationship with the dependent variable.
However, the causality test is not always needed.
Figure 14.2 Relations between the hierarchical models and the nonhierarchical models. [The diagram itself is not reproduced here.]
Models: FENMNL = Fully Extended Nested MultiNomial Logit; ENMNL = Extended Nested MultiNomial Logit; ENMNL* = a specific version of the ENMNL model which connects ENMNL with a specific version of CAA; FEMNL = Fully Extended MultiNomial Logit; FEMNL* = a specific version of the FEMNL model which connects FEA with FEMNL; EMNL = Extended MultiNomial Logit; FESQA = Fully Extended SQuared Attraction; ESQA = Extended SQuared Attraction; FEA = Fully Extended Attraction; EA = Extended Attraction; CCHM = FEA developed by Carpenter et al.; CAA = Cluster-Asymmetry market share Attraction; (ATCA) = (Arithmetic Total Cluster Attraction: a specification of the CAA model); (GTCA) = (Geometric Total Cluster Attraction specification).
Model references: FENMNL: (14.37)-(14.39); ENMNL: (14.40)-(14.41); FEA: (14.32); CCHM: (14.35); CAA: (14.34).
Restrictions: R0: set all cross-effect or asymmetry parameters to zero; R1: set ab = 0 for all brands b; R2: set ab = 1 for all brands b; R3: use a subset of cross-effect parameters based on residual analysis; R4: an item's attraction includes cross-effect parameters which relate to the item's "cluster members" only; R5: exclude all cross-effect parameters between items of different brands; R6: set θs = 0, and use the ATCA specification; R7: use the GTCA specification; R8: set all brand-level parameters to zero and let ab = a for all brands b; R9: set θB = −a.

It should be clear from these comparisons and from the relations between all specific models shown in Figure 14.2 that there is no straightforward basis for preferring one of these models in an empirical application. We could suggest that the CAA model is attractive for (large) markets in which: (1) multiple market structuring criteria are simultaneously active, and (2) the competition is dominated by one marketing instrument. With only one instrument there is no concern about the assumption that asymmetries are not instrument-dependent. Similarly, we expect that the attractiveness of hierarchical models increases with the number of items.22
22. A preliminary comparison of the different models based on data for one product category reveals that the ESQA-model has the best predictive validity and is second best on face validity. See Foekens et al. (1997).
PART THREE
Parameterization and validation
CHAPTER 15
Organizing data
As indicated in the discussion about the model-building process in Section 5.1, the
step that logically follows model specification is parameterization. Model specifica-
tion forces the decision maker to be explicit about which variables may influence
other variables, and in what manner. In the specification stage, variables are catego-
rized (e.g. criterion versus predictor) and operationalized. Also, plausible functional
forms have to be considered. Sometimes data are available or can be obtained without
much effort. In other cases it may be necessary to develop specific instruments for the
measurement of variables.
Having "good" data is a prerequisite to meaningful, and hence implementable,
model building. In Section 15.1 we discuss what is meant by "good" data. Recently
marketing managers have experienced an explosion in the availability of data, es-
pecially scanner-based data. This explosion not only enriches the opportunity for
managers to understand markets through numerically specified models but it also
limits managers' abilities to track market changes in traditional ways. Essentially the
data revolution requires that managers use decision-support systems. We introduce
such systems briefly in Section 15.2.
We discuss well-known data sources, including scanner data, for the measure-
ment of performance indicators and variables that influence performances in Section
15.3. We describe the process leading to the definition of the required set of data in
Section 15.4. Data collection is the first step in parameterization. Another question
involves the selection of a technique for the extraction of model parameter estimates
from the data. The complexity of parameterization depends on appropriate estimation
methodology which in turn depends on the types of data, variables and models. We
discuss these issues in detail in Chapters 16 and 17.
Having "good" data is a prerequisite for meaningful model building. It is also es-
sential for all decision making in marketing. "Good" data encompasses availability,
quality, variability and quantity. These four elements determine the "goodness" of
the data. Since the purpose of model building is to learn about relations between
variables a critical question is whether the data allow for the estimation of inferences
desired by management. The answer to this question determines the "goodness" of
the data, and it depends on the intended use of a model. The parameterization of
causal models used for description, conditional forecasting or prescription, requires
the availability of data of all relevant predictor variables. However, the estimation of
time-series models used, for example, for unconditional forecasting, is possible with
nothing more than time-series data on the endogenous (criterion) variable.
AVAILABILITY
The first condition for data is their availability. It is common for an internal accounting
department to have data on a firm's own actions and performance. Thus, data should
be available on unit sales, revenues, prices of the firm's products, advertising expen-
ditures, wholesale and retail margins, promotions and personal selling expenditures,
and market coverage. The availability of such data does not imply that the data are
directly usable. We return to this aspect when we discuss data quality.
Some data are more difficult to construct or to extract from company records than
is true for other data. And usually marketing models need to incorporate information
from other firms. For these reasons it is common for managers to purchase data, gath-
ered regularly or incidentally, from market research firms. IRI and ACNielsen, two
large research firms, provide market feedback reports on a large number of product
categories gathered through scanner equipment in supermarkets, drugstores, etc. Even
then the available data may be incomplete. Incompleteness in the form of missing
predictors is a problem since the estimated effects of the available variables will be
biased (if there is covariation between included and excluded predictor variables).
Dependent upon the precise purpose of the model-building effort it may be useful
to gather additional data. For example, only about 40 percent of the gourmet coffee
category was covered by scanners in the US in 1997. For a complete understanding
of the gourmet coffee market the commercially available data would have to be sup-
plemented, for example by surveys. It is now common for different data sources to
be combined. For example, sales data collected through scanning at the household
level may be combined with household level media coverage data. These data are
sometimes available from a single source, but quite often they are not. If different market research agencies collect these different data types, techniques of data fusion can be employed. Initially, ad hoc procedures were used to match households
from two data bases on the basis of similarity in (often demographic) characteristics.
Nowadays, statistical procedures are available that, based on multiple imputations of
the missing data, yield fused data sets with known statistical properties (Kamakura,
Wedel, 1997).
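The ad hoc matching idea can be sketched as follows. This is a toy illustration, not the multiple-imputation procedure of Kamakura and Wedel: the demographics are simulated, and each household in the (smaller) media panel is simply linked to its nearest neighbor in the purchase panel.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical standardized demographics: [household size, income, age].
purchase_panel = rng.normal(size=(100, 3))  # households with purchase data
media_panel = rng.normal(size=(40, 3))      # households with media data

def nearest_neighbor_match(recipients, donors):
    """For each recipient row, return the index of the closest donor row,
    using Euclidean distance on the (already standardized) variables."""
    d = np.linalg.norm(recipients[:, None, :] - donors[None, :, :], axis=2)
    return d.argmin(axis=1)

matches = nearest_neighbor_match(media_panel, purchase_panel)
print(matches.shape)  # one donor household per media household
```

Each matched pair is then treated as a single "fused" household. The statistical weakness of this scheme — matching error is ignored — is precisely what the imputation-based procedures cited above address.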
Generally speaking, data on industrial markets are harder to obtain than consumer
data. Systematic records are rarely kept by individual firms. Although systematic
data gathering by commercial market research agencies is growing, it is often limited
to some large markets (such as electronic components). Customization in industrial
markets complicates the problem: products, prices and services are often customer
specific and usually not publicly available, and the actions of competitors are often
unknown (Brand, Leeflang, 1994). For more details on data sources see Section 15.3.
QUALITY
Here quality of data refers to the validity and reliability of data. A measure is valid if
it measures what it is supposed to measure. Even if a measure is valid it will in many
cases not be possible to measure it without error. The degree to which a measure is
subject to random error is assessed by its reliability. Errors in the measurement of
predictor variables can cause their estimated effects to be biased. It is in general quite
difficult to define appropriate measures for variables such as "product quality" and
"the value of a brand". 1 Much effort has been put into the development of appropriate
measurement scales. Handbooks of validated scales are available that inform market
research practice (Bearden, Netemeyer and Mobley, 1993). The validity of data used
to measure the effectiveness of such variables on sales or profit is, generally speaking,
not high. In addition, the validity of directly observable data available from, say, the
accounting department, may also be low (see Section 5.1). Furthermore, data obtained
from panels and surveys are subject to biases and sampling error.
VARIABILITY
If a variable shows no variation we cannot measure its impact on the criterion variable.
Relation (8.8) is an example in which price was excluded because the price data did
not show sufficient variability. The precision of an estimated effect for a predictor
variable depends on the amount of sample variation. We use the word "precision" to
describe the "estimated standard error" or (statistical) unreliability.
For models with multiple predictor variables, the "goodness" of the data also
depends on the amount of covariation between the predictors. Thus the precision of
the estimated effect of one predictor depends positively on its own variation and,
usually negatively, on the amount of covariation with other predictors.
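This dependence of precision on covariation is easy to demonstrate numerically. The sketch below is a simulation with arbitrary settings (an assumed error standard deviation of one and a correlation of about 0.9 between the collinear predictors); it computes the textbook standard error of the first slope in a two-predictor regression.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
sigma = 1.0  # assumed standard deviation of the disturbance

def se_first_coef(x1, x2):
    """Standard error of b1 in y = b0 + b1*x1 + b2*x2 + e,
    from the diagonal of sigma^2 * (X'X)^{-1}."""
    X = np.column_stack([np.ones(T), x1, x2])
    cov = sigma**2 * np.linalg.inv(X.T @ X)
    return float(np.sqrt(cov[1, 1]))

x1 = rng.normal(size=T)
independent = rng.normal(size=T)                   # no covariation with x1
collinear = 0.9 * x1 + np.sqrt(0.19) * rng.normal(size=T)  # corr ~ 0.9

print(se_first_coef(x1, independent) < se_first_coef(x1, collinear))  # True
```

With a correlation near 0.9, the standard error of the first coefficient is inflated by roughly a factor 1/sqrt(1 − 0.81) ≈ 2.3 relative to the uncorrelated case, which is the sense in which covariation between predictors reduces precision.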
The variability of marketing mix variables may be less than what is needed for
the estimation of their effects. For example, due to price competition the price range
observed in a market may be low, or the range of variation in a product attribute such
as package size is limited. In such cases the revealed preference data (choices) in
the market place may be supplemented with other data. Experiments may be used in
which desired variation in marketing mix elements is induced. A powerful technique
is conjoint choice experimentation in which product profiles are experimentally designed, and choices among them are made by respondents (Louviere, Woodworth, 1983). Models are available to integrate such stated preference data with revealed preference data (Morikawa, 1989).
Sometimes there are multiple sources of variation that can be used to infer rela-
tions. Scanner data are obtainable at the household-, store-, chain- and market-level.
Assume that for a given product category and a given metropolitan area one has access
to, say, 52 weeks of data on 60 stores. A critical question then becomes whether both time-series (weeks) and cross-sectional (stores) variation are suitable for parameterization.
1. See, for example, Doyle (1989), Kamakura, Russell (1993), Keller (1993), Rangaswamy, Burke, Oliva (1993), Simon, Sullivan (1993), Sunde, Brodie (1993), Agarwal, Rao (1996).
QUANTITY
The final condition for "good" data is quantity. If a probability sample of observations
is used to estimate a population mean, the sample size influences the precision of this
estimate. However, if the interest focuses on the relation between variables, it is the
amount of variation in a predictor (as well as covariation between predictors) that
influences the precision of estimated effects.
The quantity of observations is, however, a critical factor for one's ability to
estimate all model parameters. Specifically, the number of observations has to exceed
the number of unknown model parameters. This requirement is simply a necessary
condition that needs to be met for all parameters to be estimable in principle. Many
researchers suggest that the quantity (number of observations) should be at least five
times the number of parameters. The advisability of this rule of thumb depends, as
we have argued above, on the amount of variation in and the amount of covariation
between the predictor variables.
It is also useful to reserve some data for a validation exercise. Model building
is an iterative process, and if one data set is used to estimate several alternative
specifications before one specification is selected, then the usual statistical criteria
are no longer valid. This type of estimation is often referred to as pretest estimation
(Leamer, 1978). If the purpose of the model building is to test theories, the final
specification should be validated on one or more new data sets. These other data sets
could represent different product categories, different regions, different time periods,
etc. If the model results are intended to be the basis for marketing decisions, an
appropriate test is whether the model predictions, conditional upon specific marketing
actions, outperform predictions from a manager (testing the model against managerial
judgment). For more on validation, see Chapter 18.
If the initial quantity of data is inadequate for model estimation, the model builder
can consider one or more constrained estimation methods. For example, suppose one
wants to estimate the demand for one brand as a function of six marketing variables.
If there are nine other brands that belong to the same product category, and it is de-
sirable to have cross-brand effects for all brands and all variables, then the number of
marketing variable parameters is 60. Obviously one year of weekly data is insufficient
if the data are limited to time-series observations for one cross section. One solution
to the problem is to obtain time-series data for other cross sections. In that case the
issue of pooling cross-sectional and time-series variation has to be considered (see
Sections 14.2 and 16.2). An alternative solution is to employ constrained estimation.
For example, one could force some of the other brands to have zero cross effects by
eliminating those brands' variables altogether. Or one could combine some brands
and force parameters to be the same for brands belonging to common subsets. Note
that all model-building efforts involve simplifications of reality. For example, the
discussion above assumes that one desires to estimate all possible own- and cross-
brand main effects. This starting position involves the implicit assumption that all
interaction effects between the marketing predictors are zero.
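The arithmetic of this example can be checked directly. The 60-store figure for the pooled case is borrowed from the scanner-data example earlier in the chapter and is purely illustrative.

```python
# One brand's demand equation with own- and cross-brand effects:
# 10 brands x 6 marketing variables = 60 response parameters.
brands, variables = 10, 6
parameters = brands * variables  # 60
weeks = 52                       # one year of weekly data, one cross section

print(weeks > parameters)        # False: fewer observations than parameters
print(weeks >= 5 * parameters)   # False: far short of the rule of thumb

# Pooling time-series data across stores raises the observation count:
stores = 60
print(stores * weeks >= 5 * parameters)  # True: 3120 >= 300
```

The first check is the necessary condition for estimability; the second is the five-observations-per-parameter rule of thumb; the third shows why pooling cross sections (with the caveats of Sections 14.2 and 16.2) can restore both.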
The use of electronic means to record purchases in supermarkets and other retail
outlets has led to an exponential increase in the availability of data for modeling pur-
poses. McCann and Gallagher (1990, p. 10) suggest that the change from bimonthly
store audit data about brands to weekly scanner data (see Section 15.3) results in a
10,000-fold increase in available data (scanner data also measure more variables for
a larger number of geographic areas at the UPC/SKU-level). Assume a brand group
would normally spend five person-days analyzing one report based on bimonthly store
audit data. Analyzing means here detecting important changes, explaining changes,
discussing reasons for explanations, etc. This group would have to increase its size
about 1,000 times or become 1,000 times more efficient (or some combination of
these) in order to analyze weekly store-level scanner data in the same manner as the
audit data. 2 To prevent this explosion in manpower, much of the analysis of more de-
tailed and more frequently available scanner data has to become automated. One way
to accomplish this is through the development and application of marketing manage-
ment support systems. The following types of "computerized Marketing Management Support Systems (MMSS)" can be distinguished:3 marketing information systems (MKIS), marketing decision support systems (MDSS), and marketing knowledge-based systems (MKBS).
A MKIS harnesses marketing-related information to facilitate its use within the firm
(Lilien, Rangaswamy, 1998, p. 315). A MKIS consists of a database (or databank4) with marketing data and statistical methods that can be used to analyze these data (statistical methods bank4). The statistical analyses are used to transform the data into information for marketing decisions.
A MDSS differs from a MKIS in that it contains a model base (modelbank4) in addition to a databank and a statistical methods bank. The purpose of a MDSS can be
described as the collection, analysis, and presentation of information for immediate
use in marketing decisions. 5
A MKBS is a more advanced system than the previous two in the sense that theo-
retical knowledge and empirical generalizations are included. A restricted version of
MKBS is an "expert system" which is related to the concept of "artificial intelligence"
(AI). AI is concerned with the creation of computer programs that exhibit intelligent
behavior. 6 The program solves problems by applying knowledge and reasoning that
mimics human problem solving. The expert system approach is one of the earliest
techniques for the creation of intelligent programs.
Typically, an expert system focuses on a detailed description (model) of the prob-
lem-solving behavior of experts in a specific area. The label knowledge system is
more general because it encompasses and uses knowledge from experts but also other
knowledge sources such as information available from books and articles, empirical
results and experience. Nevertheless, the terms knowledge- and expert system are
often used interchangeably.7 Examples of such systems are:8 [the list of example systems is not reproduced here]
Expert systems have also been developed to generate management reports based on
statistical analyses of frequently collected marketing data. Examples of systems that
analyze scanner data are Cover Story (Schmitz, Armstrong, Little, 1990) and INFER
(Rangaswamy, Harlam, Lodish, 1991 ).
Mitchell, Russo and Wittink (1991) discuss the distinguishing characteristics of
human judgment, expert systems and statistical models for marketing decisions. They
argue that ideally systems are developed to collaborate with managers. The most
effective collaboration can be identified based on detailed knowledge of the advan-
tages and disadvantages of the alternatives. For example, humans cannot, on average,
beat models in repetitive decisions that require the integration of data on multiple
variables, often expressed on noncomparable scales. On the other hand, humans have
the intellectual capacity to construct theories and to provide arguments so critical for
model specification. They note that the role of systems as collaborators in managerial
tasks should be distinguished from the role of decision support systems. The role
of the latter is to provide input to decision-making but not to draw inferences or
make decisions. In a man-machine expert system, the system shares decision-making
responsibility with the user. Management support systems are effective in solving
problems that require serial processing, whereas the human mind is "designed" for
parallel processing tasks.
Wierenga and van Bruggen (1997) discuss extensions of the marketing knowledge-based systems (MKBSs).9 A marketing case-based reasoning system (MCBR) is
based on the fact that analogical reasoning is a natural way to approach problems. The
analogizing power of a decision maker can be strengthened by a MCBR, a system that
stores historical cases with all the relevant data kept intact. The ADDUCE-system
infers how consumers will react to a new advertisement by searching relevant past
advertisements. Thus, it can be interpreted as a knowledge system with a case-based
reasoning system as one of its components. This indicates that the different systems
are not necessarily distinct. McIntyre, Achabal and Miller (1993) built a MCBR to
forecast the retail sales for a given promotion based on historical analogs from a case
base.
Neural networks can be used to model the way human beings attach meaning to
a set of stimuli or signals. Artificial neural networks can be trained to make the same
types of associations between inputs and outputs as human beings do. An important
feature of a neural network is its ability to learn. Marketing neural nets (MNNs) may
be useful for the recognition of new product opportunities and to learn to distinguish
between successful and less successful sales promotion campaigns.
Marketing creativity-enhancement programs (MCEPs) are computer programs
that stimulate and endorse the creativity of marketing decision makers. Because the
MCBR, MNN and MCEP were developed only recently, their value for marketing
management support cannot be evaluated yet.
9. We closely follow Wierenga and van Bruggen (1997). See also Wierenga, van Bruggen (2000).
The support tools discussed thus far are defined from the perspective of a supply-
side approach. The descriptions emphasize features and capabilities rather than the
conditions under which they are appropriate. By contrast, Wierenga and van Bruggen
(1997) introduce the concept of marketing problem-solving mode (MPSM), a demand-side perspective on marketing support systems.
Both MDSS and MKBS require the collection and storage of (large amounts of) data.
We discuss important data sources in Section 15.3.
The sources of data for consumer durables, industrial goods and services are different
from those for frequently bought consumer goods. Although the models discussed in
this book are generalizable to all types of products and services, their actual use may
require extra work to identify appropriate data. In this respect we refer to Brand and
Leeflang (1994) regarding the research on modeling industrial markets and the special
issue of Management Science, November 1995, regarding the use of (OR) models in
service management. Here the use of judgmental or subjective data and subjective
estimation methods is relevant as we discuss in Section 16.9. In what follows we
discuss data for frequently bought consumer goods.
Data from the accounting department (internal data) and from independent marketing
information services (external data) are important sources of information for mar-
keting researchers and marketing executives. During the past few decades the use
of data for decision-making purposes has become widespread. The introduction of
Figure 15.1 Points in the marketing channel where measures are taken. [The diagram shows the channel manufacturer → wholesaler → retailer → household; internal data are measured at the manufacturer level (3), store data at the retailer level (2), and household data at the household level (1).]
mathematical marketing models may also have stimulated the demand for and the
production of these data. In this section we discuss some well-known data sources. 10
Revealed preference data, reflecting choices, sales and shares and a (sub)set of causal variables, can be measured at at least three levels in the marketing channel, viz.:
• at the household level, for example, through a household panel whose purchases
are electronically recorded;
• at the retail level by means of a store audit or electronic (scanner-based) registra-
tion of activities;
• at the manufacturer level.
We show these levels in Figure 15.1. Data obtained as "ex-factory sales" are internally
generated by the manufacturers. These can be corrected for changes in inventory at
the wholesale and retail level, so as to measure sales (as opposed to production).
We next discuss information available at the three levels, starting with the house-
hold level. This is followed by a focus on data available through electronic (scanner-
based) means, and a discussion of other sources of information on causal variables
(causal data).
10. For a more extensive discussion see e.g. Malhotra (1996, Chapter 4).
HOUSEHOLD LEVEL
Traditionally, and in some countries this is still the standard, information from house-
holds showing repeat purchase and brand switching behavior is obtained through
diary panels. 11 Such a panel consists of families who use a preprinted diary in which
they record their weekly purchases in specified product categories. Typically, the item
(brand name, type of product, weight or quantity, kind of package), number of units
and price along with store information are reported for each purchase. The families
are geographically dispersed, while the panel is demographically balanced so the data
can be projected to the national level in a given country. Families are recruited so that
the composition of the panel mirrors the population as much as possible on specified
characteristics. Panel members are compensated for their participation, often with
gifts. Families are dropped from the panel at their request, if they fail to return their
diaries, or if their records are unreliable. The diaries are returned weekly. Clients of
diary panel services use the data to assess among other things: the size of the market
for a product, the proportion of families buying over time, the amount purchased per
household, brand share over time, the frequency of purchase and amount purchased
per transaction, average price paid, etc. In some panels, the members are asked to
record whether purchases are on promotion.
In the US household panel services are provided by, for example, NPD Research,
Inc., Market Research Corporation of America (MRCA), Market Facts and National
Family Opinion. 12 In Europe these services are available from, amongst others, GfK
and Attwood.
RETAIL LEVEL
For many decades, the ACNielsen Company dominated the industry consisting of
systematic data gathering at the retail level for the purpose of tracking the perfor-
mances of brands. The bimonthly audit data, still available in many countries all over
the world, are based on national probability samples. Auditors visit the same stores,
every two months. During the visit they take a complete inventory of all items covered
(for the product categories for which manufacturers are clients), and they record all
invoices since the previous visit. Total purchases by the retailer plus the reduction in
inventory provide the revenue (based on the shelf price prevailing at the time of the
audit) and unit sales information for each item per store. Other information collected
includes out-of-stock conditions (at the time of the audit), and certain promotional
activities such as premiums, bonus packs, sampling and featuring. A probability
sample of store data, weighted by certain store characteristics and previous period
results, produces highly accurate estimates of brand performance at the national level.
However, as the coverage of purchases by electronic means increases in a country, the
bimonthly audit service tends to disappear.
11. Other names for the consumer panel are diary panel (Malhotra, 1996, pp. 134-137) and mail consumer panel (Hanssens, Parsons, Schultz, 1990, p. 63).
12. Green, Tull, Albaum (1988, p. 117).
In Europe, GfK provides clients with sales, retail availability, prices, etc. of consumer durable products. For other product categories not covered by ACNielsen, Audits and Surveys' National Total Market Audit provides data in some countries.
MANUFACTURER LEVEL
The manufacturer's internal accounting system can provide "shipped sales" or "ex-
factory sales". To measure household purchases, these data have to be corrected for
changes in inventory at the wholesale and retail levels. Inventory changes at the retail
level are obtained through store audits; inventory changes at the wholesale level are
obtained from sales representatives who take on the role of intelligence gatherers or
from wholesale audits. These corrections of the "ex-factory" sales are performed a
couple of times per year. The ex-factory sales of a manufacturer's own brand can be
combined with the corresponding figures of other brands in a given product category
available from an independent institute. In that way, estimates of total industry sales,
industry sales per segment and brands' market shares are obtained.
EVALUATION
The precision of the data from each of these sources is not guaranteed. Yet, if household survey or store audit data are used for marketing decisions, a certain rigor must be imposed. 13 Problems in this realm have been identified by, for example, Assael (1967), Nenning, Topritzhofer, Wagner (1979), Shoemaker and Pringle (1980), Leeflang, Olivier (1980, 1982, 1985) and Plat (1988).
Leeflang and Olivier observed substantial differences for measures of sales, mar-
ket share and price between the different levels in Figure 15.1. Nonresponse bias
is a major reason for problems with sample survey data. For example, households
who buy relatively inexpensive brands have a higher response rate than households
who buy the more expensive brands. The nonresponse bias in store audit data results
from the refusal of some retailers, such as discounters, to cooperate. Leeflang and
Olivier (1985) demonstrated that the data bias leads to large differences in marketing
decisions between consumer panel data and store audit data.
SCANNER DATA
The availability of data for parameterization of marketing models has increased dur-
ing the past decade through adoption of scanners by retailers. 14 In the early 1970s
laser technology in conjunction with small computers first enabled retailers in the US
to record electronically or "scan" the purchases made in their stores. Since then, after
a period of slow growth, the adoption of scanning has increased rapidly, and scanner
data have become available for decision support in marketing at many organizations.
Although scanning was originally used by retailers simply as a labor- and cost-saving
13. See Frankel and Frankel (1977).
14. See also Leeftang, Plat (1988) and Foekens, Leeflang (1992).
312 CHAPTER 15
[Table: adoption of scanning by country; the caption and column headers were not recovered in the source. The five columns appear to give store counts and percentage-coverage figures at two points in time.]

North America
  USA            14,660   55   18,530   62   71
  Canada          1,011   38    1,390   45   50
Europe
  Finland           306   15    1,252   45   80
  Sweden            697   22    1,158   44   85
  France          1,529   28    3,200   43   74
  Great Britain     495   17    1,497   39   76
  Denmark           220   15      850   37   83
  Belgium           410   15      686   31   83
  Germany           985   10    2,730   29   39
  Norway            311   15      537   26   58
  Netherlands       385   13      638   25   56
  Ireland            15    4       70   19   39
  Italy             650    7    2,300   17   56
  Spain             475    7      850   14   57
  Austria           153    5      302   10   53
  Switzerland        52    1      160    3   10

Source: Nielsen (1988, p. 39) and information from ACNielsen (the Netherlands) BV.
Scanning has many advantages for consumers, retailers, wholesalers and manufac-
turers. Scanning-based samples have been constructed at the household level and at
the retail level ("scanning-based store samples"). To provide a sense of the benefits
of scanning, we assume the use of scanning-based store samples. 15 The following
benefits can be distinguished:
• Greater accuracy, in the sense that much of the human element in recording product movements is eliminated. Also, more accurate price data may be obtained as variations in prices are known and can be related to the relevant quantities. This
does not mean, however, that mistakes do not occur. For example, checkout clerks
may skip items and computer records may be inaccurate (Foekens, Leeflang,
1992).
• Relatively low costs of data collection.
• Shorter data intervals. Since at the retail level it is rare for changes in prices on
promotion to be made within a week, scanner data are normally recorded on a
weekly basis. However, the data interval can be as short as a day, as it often is
in Japan. The shorter data interval provides insight into short-term fluctuations,
and it avoids summing or averaging across heterogeneous conditions (see Section
14.1.2).
• Exact data intervals. In the traditional store audits, the stores are visited on dif-
ferent days, resulting in a rolling sample. As a consequence, the aggregated audit
data can differ systematically from the true bimonthly measures. These differ-
ences are referred to as "instrument bias". This bias does not exist with scanning.
• Speed of reporting. Instead of four to eight weeks after the period of observation,
reports are available within days after the scanned period.
In scanner panels, each household may use an ID-card similar to a credit card. Panel
members are asked to present their card at the checkout counter each time they
shop. This allows each panel member's ID-number to be recorded next to the set
of items purchased by the member. 16 Such data from scanner panels are supplied by
IRI to clients through its Infoscan Service. Behavior Scan (from IRI) is a household
scanner panel designed to support controlled tests of new products or advertising
campaigns. 17 An alternative to the ID card is to ask households to scan at home or
at the store and to identify the store in which the purchases were made. This is done
by ACNielsen and GfK (Consumerscan and Microscan) both of which supply each
participating household with a wand.
The data from scanning-based store samples are known as "volume tracking
data". 18 The data provide information on purchases by brand, size, flavor or formula-
tion (Stock-Keeping Unit-level: SKU) and are based on sales data collected from the
checkout scanner tapes. Volume tracking data are supplied through Infoscan (IRI),
Scantrack (ACNielsen), Nabscan (The Newspaper Advertising Bureau) and TRIM
(Tele-Research, Inc.). The following measures are reported: 19
• volumes (at the SKU-level);
• revenues;
• actual prices;
• ACV = the All Commodity Volume of the store or store revenue;
• ACV Selling = the ACV for an item or group of items reported only in stores
with some movement of that item;
• baseline sales: an estimate of unit sales under non-promoted conditions.
Also, regular prices are estimated to make a distinction between those prices and
promotional prices which reflect temporary discounts.
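The ACV-based measures can be illustrated with a small sketch; the store sample and revenue figures below are invented for illustration:

```python
# Each store: total store revenue (All Commodity Volume) and whether
# the item showed any movement there during the period.
stores = [
    {"acv": 500_000, "item_moved": True},
    {"acv": 300_000, "item_moved": False},
    {"acv": 200_000, "item_moved": True},
]

total_acv = sum(s["acv"] for s in stores)
acv_selling = sum(s["acv"] for s in stores if s["item_moved"])

# %ACV distribution: share of all-commodity volume accounted for by
# stores in which the item actually moved.
pct_acv = 100 * acv_selling / total_acv
```

Weighting by store revenue rather than counting stores reflects that an item stocked only in large stores has better effective distribution than one stocked in the same number of small stores.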
These descriptions indicate how data on relevant decision variables are captured.
In addition the promotional environment in a store is measured through the separate
collection of information on displays and features. Merchandising is a generic name
for promotional activity conducted by a store to increase sales (Little, 1998). The
market research companies report four mutually exclusive types of "merchandising":
(1) display only, (2) feature only, (3) display and feature together, and (4) unsupported
price cuts. Several of these can be subdivided further, if desired, for example by type
of display. Most of these non-price promotional variables are collected as zero-one
measures. Measures of merchandising activity can also be defined analogously to
those for distribution.
16. Malhotra (1996, p. 137).
17. Cooper, Nakanishi (1988, p. 96). See also Abraham, Lodish (1992).
18. Malhotra (1996, p. 137).
19. See, for example, Cooper, Nakanishi (1988, pp. 93-95).
A more advanced system consists of the combination of scanner panel data with
cable TV advertising exposure records. The combination of information in a single
data set is referred to as single-source data. Single-source data provide integrated
information on household purchases, media exposure, and other characteristics, along
with information on marketing variables such as price, promotion and in-store mar-
keting efforts. 20 We mentioned in Section 15.1 that data fusion is desirable if all data
do not pertain to a single group of households.
CAUSAL DATA
Price-, promotion-, and distribution data are natural components 21 of the data collec-
tion methods just discussed; however, data on manufacturer advertising and product
quality are not. There are, however, agencies that specialize in the collection of adver-
tising expenditures, such as Leading National Advertisers (LNA) in the US. In many
models predictor variables such as gross rating points (GRP's) based on TV-ratings are used.
These data are collected, for example, by Nielsen (Nielsen Television Index) 22 and
Arbitron (Arbitron monitors local and regional radio and TV diary panels). 23
Product quality is hard to define and to measure for use in a marketing model.
Some authors have used proxy variables such as the number of items in an assort-
ment. 24 Steenkamp (1989) provides an extensive overview of the measurement of
product quality.
In most models discussed in this book, the fundamental unit of analysis is the
brand. Given the wide range of assortments offered in many product categories, the
brand is a product line comprising many Stock-Keeping Units (SKU's). Most SKU's
can be described in terms of a few physical characteristics to distinguish the items.
Examples are SKU-attributes such as brand name, package size, "formula", flavor,
etc. Marketing research firms use several criteria25 to determine what can be treated
as an SKU-attribute: each attribute must be recognizable by consumers in an objective
manner (i.e. physically distinguishable), the variation in each attribute across SKU's
must be discrete, and each attribute must be applicable to every SKU. The analysis
challenge is to define the smallest set of SKU-attributes that capture the relevant vari-
ability across all SKU's within a product category for possible inclusion in a model.
The reason for minimizing the set is that the number of SKU-attributes and levels
quickly explodes for a complex product category.
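The explosion is simple combinatorics: the number of conceivable attribute combinations is the product of the numbers of levels. A hypothetical illustration in Python (the attributes and level counts are invented):

```python
from math import prod

# Hypothetical SKU-attributes with discrete levels that apply to every SKU.
levels = {"brand": 8, "package size": 4, "formula": 3, "flavor": 5}

# Number of conceivable attribute combinations: 8 * 4 * 3 * 5.
combinations = prod(levels.values())
```

Even four modest attributes already imply 480 conceivable SKUs, which is why the analyst seeks the smallest attribute set that still captures the relevant variability.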
Fader and Hardie (1996) calibrated a multinomial logit model based on household
data as follows. The deterministic part of their model has two components:
• a preference component which represents the household's base preference toward
a SKU;
20. Malhotra (1996, p. 141). See for applications of single source data Deighton, Henderson, Neslin (1994), and Lodish et al. (1995a, 1995b).
21. Hanssens et al. (1990, p. 64).
22. See also Lodish et al. (1995a, 1995b).
23. See Danaher, Sharot (1994) for an application and an explanation.
24. See Telser (1962b), Leeflang, Koerts (1974), Leeflang (1977b).
25. We closely follow Fader, Hardie (1996).
Other data inputs refer to environmental variables such as size and age distribution
of the population, temperature, and macroeconomic variables such as gross national
product, per capita income and employment. 26 One can also use stated preferences
and other subjective consumer judgments.
The process of model development can help managers decide what information to col-
lect and it can also improve the use of existing information. 27 This holds in particular
for measurement problems and causal models estimated through LISREL (Section
17.1). In this section we illustrate how the development of a parsimonious model
contributes to the collection, organization and use of data for decision making. The
case study concerns the modeling of the sales performance of a Dutch detergent
brand when scanner data are unavailable. In many non-US-markets and especially
for industrial goods, services and some consumer durables this is a typical scenario.
26. For a survey of these secondary data services, see Malhotra (1996, Appendix 4a).
27. See indirect benefits 3 and 4 in Section 3.2.
[Figure 15.2 (flow chart; not reproduced). The recoverable elements are: Box 1 = original data base; Box 2 = subset of data usable for numerical specification; Box 3 = remaining data, not of immediate relevance; Box 4 = plausibility of feasible model specification(s); Box 5 = definition of requirements for model building; Box 6 = new data set; Box 7 = data not immediately available; Box 8 = revised set of models.]
We show a flow chart in Figure 15.2. Box 1 represents the original data base. The data
base is separated into two subsets: Box 2 contains the data useful for estimation; Box
3 contains data not of immediate relevance for numerical specification. 28 The avail-
able data can be the basis for an initial model specification (see Box 4). In fact, the
model builder is likely to face difficulties choosing just one specification, especially
if the data are incomplete (e.g. missing variables, missing details on some variables).
Thus, the initial effort may produce a set of alternative model specifications based on
available data.
At this point the model builder may decide that the available data are inadequate.
For example, if there are no data at all on a predictor variable that is believed to be
28. For example, the firm may collect data to monitor the performance of some brands in product categories in
which it is not competing, but may enter in the future.
critical or the data on that variable are incomplete or of very poor quality, this is
a likely judgment. Alternatively, the model builder could estimate several specifica-
tions hoping that it is possible to obtain valid and reliable estimates of the effects of
available predictor variables. The same judgment of data inadequacy may occur if the
empirical results are unconvincing (e.g. the parameter estimates are implausible). In
practice this more often leads to respecification of the model than to the collection
of new data. Nevertheless, this initial modeling effort, both the model specifications
that can be estimated and the actual empirical results, can be instrumental in helping
the model builder identify the data that are required for successful completion of the
model building task.
The process leading to this definition of requirements for model building (Box 5)
is illustrated in Figure 15.2. Some of the data from Box 2 may be of use (Box 2a).
Often, however, new data are needed to supplement the original subset. Some of the
required data are obtainable on relatively short notice, for example, from market re-
search agencies. The remaining data in Box 3 may also be more relevant than initially
thought. The result is a new data set (Box 6). Some of the desired data will not be
immediately available (Box 7). But the data in Box 6 and Box 2a lead to a revised set
of models (Box 8).
We now turn to the initial approach used by an analyst working on behalf of a brand
manager. The brand manager had expressed an interest in a numerically specified
equation showing the effects of marketing variables on brand sales. The analyst ob-
tained access to a data base that led her to specify a linear market share- and a linear
product category sales model. Table 15.2 shows the subset (Box 2) from the original
data base pertinent to parameter estimation of the market share- (15.1) and product category sales (15.2) equations. In the equations, bimonthly values for A_{t-1} and Inc_t are obtained through interpolation.
[The estimated market share equation (15.1) is not legible in the source.]

  T = 29, R² = 0.56

  Q̂_t = 202 - 0.29 (Inc_t/CPI_t) - 29.5 (P_t/CPI_t) + 0.31 (A_{t-1}/CPI_t) - 0.29 t   (15.2)
       (41)  (10.1)              (11.7)             (0.15)                  (0.47)

  T = 32, R² = 0.47
where
  m_jt = 100 q_jt / Q_t,
  n = the number of detergent brands (see Table 5.2),
  n* = the number of brands in the market consisting of detergents and other cleaning products (see Table 5.2),
Table 15.2 Original data base for model specification and estimation (Box 2, Figure 15.2).
[Table body not recovered in the source; one legible entry: Inc_{t''} = Σ_t Inc_t = national income in nominal terms per year t'' (source: N.C.B.S.).]
We can criticize the specifications with or without reference to the numerical esti-
mates as follows. Both models assume that all effects are linear. This is undesirable
and unrealistic, as we discussed in the section on robust models (7.2.5). In the esti-
mated market share equation, if the advertising ratio is zero (no advertising for brand
j in t - 1), if the relative price is two (the brand's price is twice the average detergent
brand's price), and the market coverage is half, the predicted market share equals
-4.7, clearly an impossibility. Even if this combination of values for the marketing
variables is unlikely to occur, it implies that the linearity assumption is untenable.
Similar arguments can be made against the linearity of effects in the product category
sales equation. For example, detergent is a basic product that many households will
not want to be without (i.e. low substitutability). Also, there is only modest opportu-
nity to expand total product category sales. Under these conditions, product category
1. There are many relevant predictors, each of which has a modest amount of (mar-
ginal) explanatory power; if only a few predictors are used, the total explanatory
power will be modest.
2. Most relevant predictors have nonlinear effects; if linear effects are assumed, the
explanatory power will tend to be reduced.
3. The effects of many predictors depend on the level(s) of one or more other pre-
dictors (interaction effects); the omission of interaction effects tends to reduce the
explanatory power.
4. Measurement error in one or more predictor variables tends to reduce the explana-
tory power (and can cause the estimated effect(s) to be biased toward zero).
5. One or more predictor variables can have little variation in the estimation sample;
if all relevant predictors have no variation, it is not only impossible to estimate
effects, but the amount of variation in the criterion variable will consist of random
noise, such that the R² value will tend toward zero.
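Reason 4, attenuation due to measurement error, is easy to demonstrate by simulation. The following sketch (plain Python, illustrative numbers only) regresses y on a noisily measured version of x; the slope estimate shrinks toward zero:

```python
import random

random.seed(1)

def ols_slope(x, y):
    """OLS slope of y on x (two-variable case, closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]   # true slope = 2

x_noisy = [xi + random.gauss(0, 1) for xi in x]   # measurement error, variance 1

b_clean = ols_slope(x, y)
b_noisy = ols_slope(x_noisy, y)
# Theory: plim b_noisy = 2 * var(x) / (var(x) + var(error)) = 1.0 here,
# so the estimate from the noisy predictor is biased toward zero.
```

The attenuation factor var(x)/(var(x)+var(error)) equals 0.5 in this setup, so roughly half of the true effect disappears.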
Based on the various arguments presented above, we can say that the estimated
equations give rise to the following desiderata.
a. Make the market definition 29 for the total advertising variable consistent with the
other variables. An effort needs to be made to define this variable for the detergent
product category (to reduce measurement error).
b. Obtain bimonthly data for the total advertising variable and for national income.
Having actual data avoids the interpolation that produces estimated values within
the half year for total advertising and within the year for national income (to
reduce measurement error).
c. Expand the set of predictor variables. For example, advertising expenditures can
be broken down by media (television, radio, press). Also, the product category
variables can be decomposed into separate variables for each other brand. Bultez
and Naert (1973) show that a superior model specification obtains if each com-
peting brand is treated uniquely. Also, the product category sales variable might
depend on the stock of automatic washing machines. Thus, there are many poten-
tially relevant predictor variables that can be added (to reduce omitted variable
bias).
d. Accommodate plausible interaction effects. A logical candidate is the possible
dependence of the market share effect of (relative) price on advertising (share).
The price effect may also depend on market coverage and on national income.
Thus, several interaction variables can be added (to reduce omitted variable bias).
e. Increase the number of observations. Obviously, many of these plausible model
additions cannot be accommodated without more data. One way to accomplish
this is to measure variables at a lower level of aggregation (e.g. stores, house-
holds), and a smaller time interval (e.g. weeks, days). A reduction in aggrega-
tion can avoid certain biases (Chapter 14). Both a lower aggregation level and a
smaller time interval will increase the number of observations which may also
increase the amount of variation in the predictor variables. In that manner, more
predictor variables can be accommodated so that the specification can become
more realistic.
Modifications of relations (15.1) and (15.2), possible after new data became available,
did not result in substantial improvements. One outcome of the critical examination
was that a possible nonresponse bias of the store audit data became of interest. In-
spection revealed that purchases in discount stores were not included in the audit.
Since brand j was hardly sold in discount stores, this meant that brand j 's market
share of the national market was overstated. This nonresponse bias could be re-
duced through the use of consumer panel data. These data also supplied information
about the stock of automatic washing machines and non-price promotions. A revised
numerical specification of the product class sales model is given in equation (9.17).
The "new data" (Box 6), and the data from Box 2 that remained, were used to
parameterize a revised set of market share- and product category sales equations (Box
8). The ratio of the explained variation over the total variation in the criterion variable
increased substantially for both models.
For more discussion of (more extensive) models relevant to this product category
based on weekly scanner data, see the discussion of hierarchical models in Section
14.5. The data used to parameterize these models constitute the last step of the data
collection process described here.
CHAPTER 16
In this chapter we consider methods and procedures for the estimation of model pa-
rameters. In Section 16.1 we provide a treatment of a simple linear relation between
one criterion variable and one predictor variable. Although such a specification lacks
practical relevance, it is attractive for a careful treatment of model assumptions, and
a conceptual explanation of the basis for the assumptions. And most of the principles
that apply to the linear model remain relevant as long as nonlinear effects for the
original variables can be accommodated by transforming variables (so that the trans-
formed variables are linearly related). In Section 16.1 we discuss the estimation of the
parameters of the linear model using either time-series data or cross-sectional data.
Estimation methods for pooled time-series and cross-sectional data are discussed in
Section 16.2. In Section 16.3 we introduce estimation methods that are suitable when
one or more assumptions required for the simplest methods are violated.
While in both Sections 16.1 and 16.2 we assume one-way causality, we consider
simultaneous causality in Section 16.4. In Section 16.5 we discuss situations in which
the model is not linearizable. Such problems require nonlinear estimation methods.
Specifically, we discuss maximum likelihood estimation in Section 16.6. This is fol-
lowed in Section 16.7 by nonparametric estimation which in its purest form allows
for complete flexibility in the nature of the effect of changes in a given predictor
variable on the criterion variable. Thus, nonparametric estimation avoids the problem
of selecting an appropriate functional form a priori. In addition, it accommodates
interaction effects between the predictor variables in a flexible way.
In Section 16.8 we discuss issues relevant to the estimation of models with be-
havioral detail, and in Section 16.9 we review subjective estimation. Subjective esti-
mation is attractive in the absence of data or when the quality of data is insufficient.
We note that our discussion is limited to standard procedures. More detailed treat-
ments can be found in econometrics textbooks. Goldberger (1998) provides a very
clear and lucid discussion. Berndt ( 1991) is quite accessible with attractive discus-
sions about research issues in economics. Baltagi ( 1998), Gujarati ( 1995), Intriligator,
Bodkin and Hsiao (1996), Johnston (1984), and Pindyck and Rubinfeld (1991) are
standard texts that contain relatively easy-to-follow treatments. Greene (1997) and
Judge, Griffiths, Hill, Liitkepohl and Lee ( 1985) are general state-of-the-art textbooks.
Amemiya (1985), Davidson and McKinnon (1993) and Hamilton (1994) provide
advanced descriptions.
16.1.1 THE TWO-VARIABLE CASE
We assume the following relation for n cross sections and T time-series observations per cross section:

  y_jt = α_j + β_j x_jt + u_jt,   j = 1, ..., n,  t = 1, ..., T.   (16.1)

We assume in this model that each cross section has unique parameters, and that these parameters can be estimated from time-series data.
In (16.1), β_j is the systematic change in y_jt when x_jt increases by one unit, and α_j is the constant part of y_jt when x_jt equals zero.
Given sample data, t = 1, ..., T for each cross section j, relevant to the problem for which a model is desired, the most common method for estimating the two unknown parameters is Ordinary Least Squares (OLS). The objective is to obtain parameter estimates, β̂_j and α̂_j, by minimizing:

  Σ_{t=1}^{T} (y_jt - ŷ_jt)²   for each j   (16.2)

where

  ŷ_jt = α̂_j + β̂_j x_jt.   (16.3)

By taking partial derivatives of (16.2) with respect to β̂_j and α̂_j, and setting those equal to zero, we obtain:

  ∂ Σ_{t=1}^{T} (y_jt - ŷ_jt)² / ∂α̂_j = -2 Σ_{t=1}^{T} (y_jt - α̂_j - β̂_j x_jt) = 0   (16.4)

  ∂ Σ_{t=1}^{T} (y_jt - ŷ_jt)² / ∂β̂_j = -2 Σ_{t=1}^{T} x_jt (y_jt - α̂_j - β̂_j x_jt) = 0.   (16.5)
We could discuss error-term assumptions and derive estimators for this model. Since (16.1) is unrealistically simple, we defer the discussion of error-term assumptions to Section 16.1.3.
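Solving the first-order conditions yields the familiar closed-form OLS estimators. A minimal Python sketch (the simulated data and parameter values are purely illustrative):

```python
import random

def ols_two_variable(x, y):
    """Closed-form OLS estimates for y_t = alpha + beta * x_t + u_t."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    beta_hat = (sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
                / sum((xt - x_bar) ** 2 for xt in x))
    alpha_hat = y_bar - beta_hat * x_bar   # intercept from the first normal equation
    return alpha_hat, beta_hat

random.seed(0)
x = [random.uniform(0, 10) for _ in range(500)]
y = [1.5 + 0.8 * xt + random.gauss(0, 0.5) for xt in x]   # alpha = 1.5, beta = 0.8

alpha_hat, beta_hat = ols_two_variable(x, y)
```

With 500 observations and modest noise, the estimates land close to the true values used to generate the data.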
16.1.2 THE L-VARIABLE CASE
The basic model (16.1) can be written for the case of L predictor variables as:

  y_jt = Σ_{l=1}^{L} β_lj x_ljt + u_jt.   (16.7)
For a model with an intercept term, we can specify x_1jt = 1 for all j and t, in which case β_1j is the constant term. For a given cross section j, the relations can also be written as:
  [ y_j1 ]   [ x_1j1  x_2j1  ...  x_Lj1 ] [ β_1j ]   [ u_j1 ]
  [ y_j2 ]   [ x_1j2  x_2j2  ...  x_Lj2 ] [ β_2j ]   [ u_j2 ]
  [  ...  ] = [  ...    ...          ...  ] [  ...  ] + [  ...  ]
  [ y_jT ]   [ x_1jT  x_2jT  ...  x_LjT ] [ β_Lj ]   [ u_jT ]   (16.8)
  y_j = X_j β_j + u_j   (16.9)
where
  y_j = a column vector of T values for the criterion variable,
  X_j = a matrix of order T × L with values taken by the L predictor variables x_1j, ..., x_Lj,
  β_j = a column vector of L unknown parameters, and
  u_j = a column vector of T disturbance terms.
If observations on x_ljt, l = 1, ..., L, and y_jt are available over time for a given cross section, we speak about time-series data. Alternatively, (16.7) can also be specified for a given t with cross-sectional observations:

  y_jt = Σ_{l=1}^{L} β_lt x_ljt + u_jt,   j = 1, ..., n.   (16.10)
Thus in (16.10) each time period has unique parameters which would be estimated
from cross-sectional data.
Finally time-series and cross-sectional data can be used together which results in
pooled time-series and cross-sectional data. We discussed this in Section 14.2, and
we elaborate on it in Section 16.2.
Parameter estimates for (16.7) and (16.10) are obtained analogously to the process shown in (16.2)-(16.6). Thus, the least-squares estimates of the parameters β_1j, ..., β_Lj in (16.7) are the values β̂_1j, ..., β̂_Lj which minimize the sum of the squared values of the residuals û_jt, t = 1, ..., T, for a given cross section j:

  RSS_j = Σ_{t=1}^{T} û_jt² = Σ_{t=1}^{T} (y_jt - Σ_{l=1}^{L} β̂_lj x_ljt)²   (16.11)
where
RSSj = residual sum of squares for cross section j.
Differentiating RSS_j with respect to β̂_j and setting the derivatives equal to zero, we obtain:

  X_j' X_j β̂_j = X_j' y_j   (16.13)

or

  β̂_j = (X_j' X_j)^{-1} X_j' y_j.   (16.14)
ESTIMATION AND TESTING 327
This expression for the ordinary least squares estimation of β_j is similar to the corresponding expression for β̂_j in the two-variable case (16.6). 1
For statistical inference about the parameters of the linear model we also need a specification of the probability distribution of the disturbance terms. We discuss such a specification and assumptions about the disturbances in 16.1.3.
  Σ_{i=1}^{N} (y_i - μ̂)².   (16.16)
1. We assume that X_j has rank L (or X_j'X_j is nonsingular), and therefore its inverse (X_j'X_j)^{-1} exists.
  d Σ_{i=1}^{N} (y_i - μ̂)² / dμ̂ = -2 Σ_{i=1}^{N} (y_i - μ̂) = 0.   (16.17)

And

  μ̂ = (1/N) Σ_{i=1}^{N} y_i = ȳ.   (16.18)
Thus, we have derived the sample mean as the least-squares estimator of the popula-
tion mean. And, given random sampling, we can claim the following:
  ȳ ~ Normal(μ, σ²/N).
We note that the formula (16.18) for the arithmetic mean of the sample, ȳ, was in fact obtained by requiring that μ̂ minimizes the sum of the squared deviations. Thus, the well-known arithmetic mean is also the "least squares" estimate of the unknown population mean.
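A quick numerical check of this least-squares property of the mean (the sample values are arbitrary):

```python
y = [3.0, 7.0, 4.0, 10.0, 6.0]
y_bar = sum(y) / len(y)   # the arithmetic mean

def sse(m, data):
    """Sum of squared deviations of the data around candidate value m."""
    return sum((yi - m) ** 2 for yi in data)

# The sum of squared deviations around the mean is no larger than
# around any other candidate value.
assert all(sse(y_bar, y) <= sse(m, y) for m in [5.0, 5.9, 6.1, 7.0])
```

This reflects the identity sse(m) = sse(ȳ) + N(ȳ - m)², so any m ≠ ȳ strictly increases the criterion.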
distribution applies (for which we can usually make reference to the Central Limit
Theorem), while in regression every assumption about the error term needs to be
explicitly considered. With regard to these assumptions, we can show, relevant to
(16.6), that:

  E(β̂_j) = β_j   if E(u_jt) = 0 for all jt,   (16.19)

  Var(β̂_j) = σ_j² / Σ_{t=1}^{T} (x_jt - x̄_j)²   if Var(u_jt) = σ_j² for all jt, and Cov(u_jt, u_jt') = 0 for jt ≠ jt'.   (16.20)
2. For each cross-sectional unit j, the disturbance term has the same variance:

  Var(u_jt) = σ_j²   for t = 1, ..., T.

3. The disturbance terms are uncorrelated over time (zero autocorrelation) within each cross section j:

  Cov(u_jt, u_jt') = 0   for t ≠ t'.
  β̂_j = (X_j' X_j)^{-1} X_j' y_j = (X_j' X_j)^{-1} X_j' (X_j β_j + u_j) = β_j + (X_j' X_j)^{-1} X_j' u_j.

Thus:

  E(β̂_j) = β_j   (16.25)
under the assumption of zero expectation for the disturbance term and the assumption
that X j is a nonstochastic matrix (X j = "fixed").
The variance-covariance matrix of β̂_j in (16.14) is derived as follows:

  Var(β̂_j) = E[(β̂_j - β_j)(β̂_j - β_j)'] = E[(X_j' X_j)^{-1} X_j' u_j u_j' X_j (X_j' X_j)^{-1}]

or

  Var(β̂_j) = σ_j² (X_j' X_j)^{-1}.   (16.26)
In practice, researchers often pay insufficient attention to the validity of the assump-
tions about the disturbance term. Relatively little time is usually spent on model
specification (see Chapters 5-14), especially with regard to the relevance, the mea-
surement and the nature (functional form, interactions) of the effects of predictor
variables. And it is convenient to assume that the disturbance term is "well behaved"
for statistical inference. Thus, users of regression analysis often (implicitly) assume
that predictor variables' effects are linear, that predictor variables do not interact, that
predictor variables are truly exogenous and that all relevant variables are included in
the specification and are measured without error.
Unfortunately, the reality is very different from this ideal set of circumstances.
We rarely have access to all relevant predictor variables, it is common for variables
to be subject to measurement error, and the assumption of one-way causality is of-
ten questionable. Of course data quality problems (see Chapter 15) may prevent the
model builder from verifying preconceived notions about the nature of some effects.
For example, if a given predictor variable has minimal variation in the sample data,
we may not be able to demonstrate the (empirical) superiority of one functional form
derived from theory over another. Thus, data quality limitations may prevent the
model builder from exploiting marketing knowledge to the fullest extent. However,
empirical researchers also lack an incentive to do sufficient testing. One reason is that
most model users have incomplete knowledge about the role error-term assumptions
play in statistical inference. Thus, the model builder needs a more critical user. An-
other reason is that the rejection of one or more error-term assumptions may make it
difficult or impossible for the model builder to complete the task of model building.
By not examining the assumptions critically, the model builder can pretend that the
estimated equation is useful.
We note that an estimated equation can be evaluated in terms of usefulness by
considering its implications under a variety of conditions. For example, we could
reject a model that produces predictions that are inherently implausible (see Chapter
15). But if we criticize someone's estimated model for having implausible implications under extreme conditions, the counter argument is often that the result should not be used outside the range(s) of variation in the sample data. The difficulty with this counter argument is that such restrictions about a model's applicability are rarely made explicit. It seems prudent, therefore, to use a ("robust") model specification that allows for the widest possible applicability.
In Table 16.1 we show possible reasons for violations of each of the assumptions,
the consequence of this violation for parameter estimates, how each violation can be
detected, and the available remedies.
NONZERO EXPECTATION
Violation of the first assumption, i.e. E(u_j | X_j) ≠ 0, is the most serious one. One of
the principal desiderata of parameter estimates is unbiasedness (or consistency). By
analogy, if we desire to estimate an unknown population mean value, any probability
sample will guarantee that the sample mean is unbiased. In regression analysis the
unbiasedness property obtains if the model is correctly specified. All relevant pre-
dictor variables should be included, the proper functional form of the partial relation
with respect to each predictor must be accommodated, etc. Misspecification of the
model causes the parameter estimates to be biased. For example, an omitted predictor
variable causes the parameter estimates to be biased, unless the omitted variable is
uncorrelated with the included predictor variables. The amount of the bias increases
with the degree of (positive or negative) correlation.
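The dependence of the bias on this correlation can be sketched by simulation; the coefficients and correlations below are illustrative, with the omitted predictor given a true coefficient of 1:

```python
import numpy as np

rng = np.random.default_rng(7)

def slope_omitting_x2(rho, n=100_000):
    """Slope of y on x1 alone, when y = 1*x1 + 1*x2 + u and corr(x1, x2) = rho."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = x1 + x2 + rng.normal(size=n)
    return np.cov(x1, y)[0, 1] / np.var(x1)

# Omitted-variable bias of the x1 coefficient (true value 1) grows with rho:
# plim slope = 1 + rho.
slopes = {rho: slope_omitting_x2(rho) for rho in (0.0, 0.4, 0.8)}
```

When the omitted variable is uncorrelated with the included one (rho = 0) the estimate stays unbiased; as rho rises, the included predictor absorbs the omitted variable's effect.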
Violations of the first assumption are rarely detectable from a plot of the residuals, û_jt = (y_jt - ŷ_jt), t = 1, ..., T, against each predictor variable. However, if only the
assumed functional form is incorrect, such a plot should show a systematic pattern
in the residual values. On the other hand, this plot will not suggest that a relevant
predictor has been omitted (unless one has information about the values of the omit-
ted variable). Other reasons for violations of the assumption E (u j I X j) = 0 are
measurement error in one or more predictor variables and endogeneity for a pre-
dictor. Random measurement error in a predictor biases the estimated least-squares
parameter toward zero. Endogeneity makes the least-squares estimator biased and
inconsistent (see equation (16.157) in Section 16.6.3 for a definition of consistency).
We note that neither measurement error nor endogeneity is detectable from an inspec-
332 CHAPTER 16
Table 16.1 Violations of the assumptions about the disturbance term: reasons, consequences, tests
and remedies.

1. E(u_jt) ≠ 0 or E(u_j | X_j) ≠ 0
   Possible reasons: incorrect functional form(s); omitted variable(s); varying parameter(s).
   Consequence: biased parameter estimates.
   Detection: plot residuals against each predictor variable; test explanatory power of predictor variables added as polynomials of varying degrees (RESET test); White test.
   Remedy: modify the model in terms of functional form; add relevant predictors; allow parameters to vary; use Instrumental Variables.

5. Stochastic predictor correlated with the disturbance term
   Possible reasons: measurement errors; endogeneity of predictor variable(s).
   Consequence: see 1.
   Detection: diagnose specification.
   Remedy: simultaneous equations; Instrumental Variables.
tion of the residuals in the original equation. Thus, the model builder must possess the
substantive knowledge that is critical for meaningful model building. For example, if
price and advertising are used as predictor variables in a demand equation, the model
builder must know how price and advertising decisions are made, and whether this
decision-making process needs to be represented in a system of equations (see also
Section 16.4) to account for endogeneity in either of these variables.
To test for the possibility of omitted variables, the predictor variables can be specified
as polynomials of varying degrees. The explanatory power of the extra terms is then
used as a basis for detecting the omission of relevant predictor variables. A formal
statistical test for misspecification in this context is the RESET test.² The null hypothesis
of the test is (16.7) and the alternative hypothesis is E(u_j | X_j) = ξ_j ≠ 0. The test is
based on the estimation of an extended model:
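The polynomial-augmentation idea can be sketched numerically. The following is a minimal RESET-style check on simulated data, using powers of the fitted values as the added terms (a common variant of the test; the data and variable names are assumptions for illustration, not taken from the text):

```python
import numpy as np

# Hypothetical data: a quadratic relation deliberately estimated with a linear model.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 0.5, 40)

def ols(X, y):
    """Least-squares coefficients and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b

# Restricted model: intercept and x only.
X0 = np.column_stack([np.ones_like(x), x])
b0, u0 = ols(X0, y)
yhat = X0 @ b0

# RESET-style augmentation: powers of the fitted values pick up curvature.
X1 = np.column_stack([X0, yhat**2, yhat**3])
b1, u1 = ols(X1, y)

# F statistic for the joint significance of the two added terms.
rss0, rss1 = u0 @ u0, u1 @ u1
q, df = 2, len(y) - X1.shape[1]
F = ((rss0 - rss1) / q) / (rss1 / df)
# A large F relative to F(q, df) critical values signals misspecification.
```

Here the omitted quadratic term makes the F statistic large, so the restricted specification is rejected.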
One remedy for a violation of the first assumption (i.e., if E(u_j | X_j) ≠ 0) is to find
instrumental variables Z_j such that E(u_j | Z_j) = 0, while the matrix Z_j is highly
correlated with X_j, and the number of variables in Z_j is at least L. It is easy to demon-
strate that the instrumental variable estimator is consistent since E(u_j | Z_j) = 0.
Specifically, suppose that only one predictor variable in the X_j matrix is correlated
with the error term u_j, say x_ℓj. We then only need to find instrumental variables to
obtain x̂_ℓj such that x̂_ℓj is uncorrelated with u_j but (highly) correlated with x_ℓj. In this
regard note that the other predictors remain, since they are not correlated with u_j and
are of course perfectly correlated with themselves. Thus for the other predictors we
use X̂_j(ℓ) = X_j(ℓ) (meaning all predictors but ℓ), since the predictors themselves
are uncorrelated with u_j and there are no other variables more highly correlated with
X_j(ℓ).
We want to employ multiple instrumental variables, if possible, to create the
highest possible correlation between x̂_ℓj and x_ℓj, subject to x̂_ℓj being uncorrelated
with u_j. This can often be accomplished by using lagged values of the criterion and
predictor variables. Thus, let

x_ℓj = Z_j γ_j + v_j   (16.28)

where

x_ℓj = the predictor variable that is correlated with the error term u_j,
Z_j = a matrix of instrumental variables, possibly consisting of all lagged y_j and X_j variables,
v_j = a vector of disturbances.

Then x̂_ℓj = Z_j (Z′_j Z_j)⁻¹ Z′_j x_ℓj. We replace the values for x_ℓj in the X_j matrix with
the values for x̂_ℓj, and apply least squares (this is the Two-Stage Least Squares or
ESTIMATION AND TESTING 335
2SLS estimator). For further detail, see for example Greene (1997, pp. 288-295). See
also Section 16.4.
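The two-stage procedure can be sketched as follows (simulated data; the data-generating process, variable names and sample size are assumptions for illustration):

```python
import numpy as np

# Hypothetical setup: predictor x_l is correlated with the disturbance u,
# while instrument z is correlated with x_l but not with u.
rng = np.random.default_rng(1)
n = 4000
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # structural disturbance
x_l = 0.8 * z + 0.5 * u + rng.normal(size=n)     # endogenous predictor
y = 1.0 + 2.0 * x_l + u                          # true slope: 2.0

X = np.column_stack([np.ones(n), x_l])
Z = np.column_stack([np.ones(n), z])

# OLS is biased because E(u | x_l) != 0.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: first stage regresses x_l on Z; second stage uses the fitted values.
x_hat = Z @ np.linalg.lstsq(Z, x_l, rcond=None)[0]
X2 = np.column_stack([np.ones(n), x_hat])
b_2sls = np.linalg.lstsq(X2, y, rcond=None)[0]
```

With this design b_ols[1] is pushed away from the true slope, while b_2sls[1] recovers it up to sampling error.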
HETEROSCEDASTICITY
The second assumption, that the error term is homoscedastic (i.e. it has the same
variance for all possible values of a predictor variable), is not nearly as critical as the
first. Its violation "merely" reduces the efficiency of (ordinary least squares) parameter
estimates. Thus if only the homoscedasticity assumption is violated, the least-squares
estimator is (usually) unbiased but does not have minimum variance. In addition, the
covariance matrix of the parameter estimates provides incorrect values. In many cases
the critical remedy is to use an appropriately adjusted formula for the variances and
covariances of the parameter estimates.
The benefit of using an estimator that incorporates the heteroscedasticity may be
in doubt. There are two relevant aspects to this. One is that the true source of het-
eroscedasticity is usually unknown. Thus, an expression has to be estimated from the
data which introduces uncertainty. The other is that in this case the theoretically supe-
rior estimator is only asymptotically more efficient than the least-squares estimator.
The benefit in practice, therefore, depends also on the sample size.
It is quite possible to encounter heteroscedasticity even when the model is well spec-
ified. In practice, one can well imagine that the disturbance term in the model is, for
example, proportional to the value of a predictor variable, when the model is other-
wise properly specified. Experience in model building reveals that heteroscedasticity
occurs especially if cross-sectional data are used for estimation. Thus, heteroscedastic
disturbances have traditionally been accommodated in analyses of cross-sectional
data. In the literature, fairly complex tests for homoscedastic disturbances exist. We
restrict the discussion here to the basic phenomenon.
For heteroscedastic disturbance terms, consider a relation between the (squared)
disturbance and a predictor variable such as income,³,⁴ as shown in Figure 16.1.
Income of family i, i = 1, ..., 12 is measured along the horizontal axis and the
squared residual values occur along the vertical axis. The apparent dependence of
the squared disturbance on income can be tested by estimating the following model:

û²_i = γ_0 + γ_1 x_1i + δ_i   (16.29)

where

δ_i = a disturbance term.
Figure 16.1 Squared residuals plotted against income (x_1i = income, i = 1, ..., 12).
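A regression of the form (16.29) can be sketched as follows (simulated data in the spirit of the income example; the sample size and error process are assumptions, not the book's data):

```python
import numpy as np

# Hypothetical cross-section: the disturbance standard deviation grows with income,
# mimicking the pattern in Figure 16.1.
rng = np.random.default_rng(2)
n = 400
income = rng.uniform(1.0, 10.0, n)
u = rng.normal(0.0, 0.3 * income)          # heteroscedastic disturbances
y = 5.0 + 1.5 * income + u

X = np.column_stack([np.ones(n), income])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ b

# Auxiliary regression (16.29): squared residuals on income.
g = np.linalg.lstsq(X, u_hat**2, rcond=None)[0]
gamma1 = g[1]   # a clearly positive slope suggests heteroscedasticity
```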
violation of the assumption). Goldfeld and Quandt considered the problem that the
residuals obtained from estimating one set of parameters from a sample of data are
not independent. Thus, if the 12 observations in the application involving income
as a predictor can be categorized prior to data analysis according to the expected
magnitude of the squared disturbance (i.e. let E(u²_1) ≤ E(u²_2) ≤ E(u²_3) ...), the ratio
Σ_{i=7}^{12} û²_i / Σ_{i=1}^{6} û²_i does not follow the F-distribution. This is because the numerator
and denominator in the ratio can be shown to be dependent.
Goldfeld and Quandt (1965) proposed a ratio whose numerator and denominator
are independent under the null hypothesis of homoscedasticity by partitioning the
original data as follows (compare (16.9))

where the vectors and matrices with subscript A refer to the first n/2 observations
and those with subscript B to the last n/2 (where the observations are still ranked
according to the expected magnitude of the squared disturbance under the alternative
hypothesis of heteroscedasticity). Importantly, the residuals û_i, i = 1, ..., n are ob-
tained from fitting separate regressions to the first n/2 and to the last n/2 observations.⁵
The ratio of the residual sums of squares from these regressions:

û′_jA û_jA / û′_jB û_jB   (16.31)

is F-distributed with n/2 − k and n/2 − k degrees of freedom under the null hypothesis.
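The procedure can be sketched with simulated data (an illustrative design, not the book's example; here the larger-variance half is placed in the numerator so that a large ratio signals heteroscedasticity):

```python
import numpy as np

# Goldfeld-Quandt sketch: observations are ordered by the variable suspected of
# driving the disturbance variance, separate regressions are fit to each half,
# and the ratio of residual sums of squares is compared with an F distribution.
rng = np.random.default_rng(3)
n = 60
x = np.sort(rng.uniform(1.0, 10.0, n))            # ordered by expected variance
y = 2.0 + 1.0 * x + rng.normal(0.0, 0.4 * x)      # disturbance sd grows with x

def rss(xs, ys):
    """Residual sum of squares from a simple regression of ys on xs."""
    X = np.column_stack([np.ones_like(xs), xs])
    b = np.linalg.lstsq(X, ys, rcond=None)[0]
    u = ys - X @ b
    return u @ u

half, k = n // 2, 2
rss_low = rss(x[:half], y[:half])                 # low-variance half
rss_high = rss(x[half:], y[half:])                # high-variance half
F = rss_high / rss_low                            # F(half - k, half - k) under H0
```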
A joint test for homoscedasticity and correct model specification is given by White
(1980).⁶ For this test, we refer to (16.10), which uses the cross-sectional observations
5. Goldfeld and Quandt (1965) also considered a modification of the test by omitting a middle group of
observations.
6. See also Judge et al. (1985, p. 453).
σ̂²_t = (1/(n − L)) û′_t û_t = (1/(n − L)) RSS_t

where

RSS_t = residual sum of squares in period t (= Σ_{j=1}^n û²_jt).

Define:

X_t = a matrix of order n × L of cross-sectional observations for the
L predictor variables x_1t, ..., x_Lt, in period t, where x_1t = 1,
x_jt = a row vector of predictor variables of the j-th cross-sectional
observation in period t,
V̂_t = (1/n) Σ_{j=1}^n û²_jt x_jt x′_jt, and
V_t = (1/n) σ̂²_t X′_t X_t.
If homoscedasticity exists, V̂_t and V_t will both be consistent estimators of the same ma-
trix. Under heteroscedasticity these estimators will tend to diverge. The White test
determines the significance of the difference [V̂_t − V_t]. The test can be implemented
through the estimation of auxiliary regressions on the residuals û_jt of (16.10) for each
t:
û²_jt = Σ_{ℓ=2}^L α_ℓt x_ℓjt + Σ_{ℓ=2}^L Σ_{ℓ′=2, ℓ′≥ℓ}^L α_ℓℓ′t x_ℓjt x_ℓ′jt + e_jt,  j = 1, ..., n   (16.33)

where

e_jt = a disturbance term.
The (unadjusted) value of the coefficient of determination R² (see Section 16.1.5) is
computed for (16.33). The null hypothesis of the White test is:

Under the null hypothesis nR² has an asymptotic χ²-distribution with degrees of free-
dom equal to the number of parameters in the auxiliary equation (16.33): L(L+1)/2 − 1.
The White test is also a test for functional misspecification, as indicated by the
joint nature of the hypothesis. Thus, failure to reject the null hypothesis implies a
lack of evidence for heteroscedasticity and for functional misspecification.
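A minimal version of the test can be sketched as follows (simulated data with two predictors assumed; the auxiliary regressors follow the spirit of (16.33), here with an added intercept as is common in practice):

```python
import numpy as np

# White test sketch: regress squared residuals on levels, squares and cross
# products of the predictors, then compare n * R^2 with a chi-square value.
rng = np.random.default_rng(4)
n = 300
x2 = rng.uniform(1.0, 5.0, n)
x3 = rng.uniform(1.0, 5.0, n)
u = rng.normal(0.0, 0.5 * x2)                 # variance depends on x2
y = 1.0 + 0.8 * x2 - 0.4 * x3 + u

X = np.column_stack([np.ones(n), x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]
usq = (y - X @ b) ** 2

# Auxiliary regression: levels, squares, cross products.
A = np.column_stack([np.ones(n), x2, x3, x2**2, x3**2, x2 * x3])
a = np.linalg.lstsq(A, usq, rcond=None)[0]
fit = A @ a
R2 = 1.0 - np.sum((usq - fit) ** 2) / np.sum((usq - usq.mean()) ** 2)
white_stat = n * R2      # asymptotically chi-square with 5 df under H0
```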
In this form the error term u_jt / x_jt still has expected value equal to zero, since
E(u_jt) = 0 for a given value of x_jt. The variance of the error term in (16.35) is:

Var(u_jt / x_jt) = (1/x²_jt) Var(u_jt) = (1/x²_jt) σ² x²_jt = σ².   (16.36)
In this case OLS is best linear unbiased if it is applied to the new criterion variable
y_jt / x_jt and the new predictor variable 1/x_jt. This transformation involves weighting
each observation by 1/x_jt, so that as the value for x_jt increases, the weight declines.
This estimation method is also known as weighted least squares (WLS), and it is a
special case of the generalized least squares (GLS) estimation methods. We discuss
these methods in Section 16.3.
7. See, for example, Judge et al. (1985, pp. 447-448) for the Bartlett test of heteroscedasticity.
8. We closely follow Wittink (1988, pp. 181-182).
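The transformation can be sketched numerically for the case Var(u_jt) = σ² x²_jt (simulated data; the coefficients are assumptions for illustration):

```python
import numpy as np

# WLS sketch: dividing the equation through by x restores homoscedasticity,
# and the regression of y/x on [1, 1/x] returns (slope, intercept) of the
# original model.
rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1.0, 10.0, n)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0 * x)   # disturbance sd proportional to x

# Transformed model: y/x = beta + alpha * (1/x) + homoscedastic error.
Xt = np.column_stack([np.ones(n), 1.0 / x])
bt = np.linalg.lstsq(Xt, y / x, rcond=None)[0]
beta_wls, alpha_wls = bt[0], bt[1]   # slope and intercept of the original model
```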
CORRELATED DISTURBANCES
In contrast to the first two assumptions, the implication of a violation of the third
assumption is controversial. We return to the basic equation (16.1), which postulates
the use of time series data for the estimation of parameters separately for cross section
j:
Suppose we assume:

E(u_jt) = 0   (16.37)
u_jt = ρ_j u_j,t−1 + ε_jt,  |ρ_j| < 1.

And also:

E(ε_jt) = 0   (16.38)
Cov(ε_jt, ε_jt′) = 0,  t ≠ t′.
In (16.37) the error terms are not independent but follow a first-order autoregressive
process with parameter ρ_j. This feature is called autoregression, autocorrelation or
serial correlation. The parameter ρ_j is known as the autocorrelation parameter or
autocorrelation coefficient. Under these assumptions, it is possible to show mathe-
matically that the parameter estimates α̂_j and β̂_j are unbiased. Indeed, virtually all
econometric textbooks emphasize this point. For real-world problems, however, it
is very difficult to claim that a model is correctly specified yet somehow the error
terms are autocorrelated. Thus, we take the position that a violation of the third as-
sumption results from model misspecification (e.g. incorrect functional form, omitted
variable(s), varying parameters).
To detect a violation of the assumption that the disturbances for different observations
have zero covariance, one can plot the residuals (û_jt) against time. Figures 16.2
and 16.3 show cases of positive and negative autocorrelation, respectively. In Figure
16.2, a positive residual tends to be followed by another positive one, and a negative
residual tends to be followed by a negative one. Positive autocorrelation means that
the residual in t tends to have the same sign as the residual in t − 1. On the other hand,
in Figure 16.3 we see that the observations tend to have positive values followed by
negative ones, and vice versa, which is a pattern of negative autocorrelation.
The best-known test statistic to detect (first-order) autocorrelation is the one devel-
oped by Durbin and Watson (1950, 1951). The Durbin-Watson test statistic is based
on the variance of the difference between two successive disturbances:

Var(u_jt − u_j,t−1).   (16.39)
Figure 16.2 Residuals (y_jt − ŷ_jt, for given j) plotted against time: positive autocorrelation.
D.W. = Σ_{t=2}^T (û_jt − û_j,t−1)² / Σ_{t=1}^T û²_jt   (16.40)
The D. W. statistic varies between zero and four. Small values indicate positive auto-
correlation, large values negative autocorrelation.
Durbin and Watson (1950, 1951) formulated lower and upper bounds (d_L, d_U)
for various significance levels,⁹ and for specific sample sizes and numbers of param-
eters. The test statistic is used as follows:
The Durbin-Watson test is not very powerful in the sense that the inconclusive range
is quite large. Judge et al. (1985, p. 330) recommend that the upper critical bound
9. Tabulated values of d_L, d_U for different significance levels can be found in, for example: Theil (1971, pp.
721-725), Wittink (1988, pp. 306-307), Greene (1997, pp. 1016-1017).
Figure 16.3 Residuals (y_jt − ŷ_jt, for given j) plotted against time: negative autocorrelation.
(d_U) be used instead of the lower bound. Thus they essentially treat the inconclusive
region as evidence of autocorrelation. We note that the Durbin-Watson statistic is a
test for first-order autocorrelation, and does not consider higher-order autoregressive
schemes.
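The statistic (16.40) is simple to compute from OLS residuals; the sketch below uses a simulated series with AR(1) disturbances (an illustrative design, not the book's data):

```python
import numpy as np

# Durbin-Watson statistic from OLS residuals of a series with positively
# autocorrelated AR(1) disturbances.
rng = np.random.default_rng(6)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()     # rho = 0.7
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ b

dw = np.sum(np.diff(res) ** 2) / np.sum(res ** 2)
# dw is roughly 2 * (1 - rho); values well below 2 signal positive autocorrelation.
```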
In the presence of a lagged criterion variable among the predictor variables, the
D.W. statistic is biased toward a finding of no autocorrelation. For such models Durbin
(1970) proposed a statistic (D) defined as:

(16.41)
Many software programs automatically compute and report the value of the Durbin-
Watson statistic. We note that for cross-sectional observations the autocorrelation
result is meaningless, since its value depends on the (arbitrary) ordering of cross-
sectional observations. We also note that econometric textbooks suggest a remedy
that essentially incorporates the systematic pattern in the residuals in the estimation
method. We believe that this remedy should only be a last-resort approach. That is,
the model builder should first do everything possible to have an acceptable model
specification, etc. This is confirmed by, for example, Mizon (1995). If the model spec-
ification cannot be improved, one may treat autocorrelation similar to the approach
used when the assumption about homoscedasticity is violated (compare (16.35)).
ρ̂_j = Σ_{t=2}^T û_jt û_j,t−1 / Σ_{t=2}^T û²_j,t−1

and

10. For other stochastic processes such as moving average processes and combined autoregressive moving
average processes, see, e.g., Judge et al. (1985, Chapter 8).
(16.45)

ρ̂_pj = Σ_{t=1}^{T−p} û_jt û_j,t+p / Σ_{t=1}^T û²_jt   (16.46)

where û_jt, û_j,t+p are the OLS residuals.

For the more general alternative (16.44), to test the null hypothesis ρ_1j = ρ_2j =
... = ρ_Pj = 0 (i.e. for a given cross section j), the LM test statistic T Σ_{p=1}^P ρ̂²_pj has a
χ²_(P) distribution, asymptotically.
In Section 16.3 we consider estimation with AR(l) errors. For the estimation of
higher-order AR processes we refer to Judge et al. (1985, pp. 293-298).
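The LM-type statistic built from the sample autocorrelations (16.46) can be sketched as follows (simulated AR(1) errors; the design and lag length are assumptions for illustration):

```python
import numpy as np

# LM-type statistic T * sum(rho_p^2), p = 1..P, from residual autocorrelations.
rng = np.random.default_rng(7)
T = 300
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 0.5 + 1.0 * x + u

X = np.column_stack([np.ones(T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ b

P = 4
rho = np.array([np.sum(res[:T - p] * res[p:]) / np.sum(res**2)
                for p in range(1, P + 1)])
lm_stat = T * np.sum(rho**2)   # asymptotically chi-square with P df under H0
```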
NONNORMAL ERRORS
The fourth assumption, that the disturbances are normally distributed, may also be
violated due to model misspecification. It makes sense, therefore, that this assumption
not be examined until the model specification is reconsidered, if necessary. For the
same reason it is efficient to examine the plausibility of the second and third assump-
tions before one checks the fourth. The disturbances need to be normally distributed
for the standard test statistics for hypothesis testing and confidence intervals to be
applicable. We can examine the validity of the fourth assumption indirectly through
the residuals. If each error term separately satisfies all four assumptions, then the
estimated error values (residuals) as a group will be normally distributed. Thus, if the
residuals appear to be normally distributed, we cannot reject the hypothesis that each
unobservable error term follows a normal distribution.
To examine whether the residuals are approximately normally distributed, we
can categorize the residuals and construct a histogram. Such a histogram shows the
relative frequency with which the (standardized)¹¹ residuals fall into defined classes.
An inspection of the histogram may suggest deviations from normality in the form of
11. The residuals are standardized by dividing the observed values by the standard deviation of the residuals
(the average residual equals zero under usual conditions).
skewness or kurtosis.
A relatively simple test for normality involves the chi-square distribution. This test
compares the observed frequencies for a given set of categories with the frequencies
expected under the null hypothesis of normally distributed error terms. The test is
most powerful if the boundaries of the categories are determined in such a manner that
each category has the same expected frequency. For instance, if we use five classes,
we want to obtain boundaries¹² such that the expected relative frequencies under the
null hypothesis of normality are 20 percent for each class. The larger the sample size,
the larger the number of classes can be for maximum use of the data. The test involves
the following statistic (for a given cross section j):
χ² = Σ_{c=1}^C (O_c − E_c)² / E_c   (16.47)
where

O_c = the observed frequency in category c,
E_c = the expected frequency in category c,
C = the number of categories.
A computed χ²-value is compared with a tabulated value for a specified type I er-
ror probability and (C − 1 − 1) degrees of freedom. The last degree of freedom is
lost because the standard deviation is estimated from the residuals before the test is
performed (assuming the regression equation contains an intercept, so that the mean
residual is zero).
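The equal-probability construction can be sketched as follows for C = 5 classes (simulated stand-in residuals; the quantile boundaries are standard normal values):

```python
import numpy as np

# Chi-square goodness-of-fit check (16.47): boundaries are N(0,1) quantiles at
# 0.2, 0.4, 0.6, 0.8, so each class has expected frequency n/C under normality.
rng = np.random.default_rng(8)
res = rng.normal(size=500)
res = (res - res.mean()) / res.std()        # standardized residuals

C = 5
bounds = np.array([-0.8416, -0.2533, 0.2533, 0.8416])   # N(0,1) quintile cuts
edges = np.concatenate(([-np.inf], bounds, [np.inf]))
counts = np.histogram(res, bins=edges)[0]
expected = len(res) / C
chi2 = np.sum((counts - expected) ** 2 / expected)
# Compare with the chi-square critical value on C - 1 - 1 = 3 df (7.81 at 5%).
```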
The literature on testing for normality is vast. Some commonly used tests are the
Kolmogorov-Smirnov test, the likelihood ratio test and the Kuiper test. For a descrip-
tion and comparison of these tests, see, for example, Koerts and Abrahamse (1969,
pp. 110-128). Other tests of the normality assumption are the Shapiro-Wilk W-test¹³
and the Jarque-Bera test.¹⁴ The Jarque-Bera test statistic for non-normality is also
chi-square distributed (with 2 degrees of freedom):
χ²_(2) = ((T − L)/6) (ŝk² + ¼ êk²)   (16.48)

where

T = number of observations,
L = number of parameters,
σ̂ = standard deviation of the residuals (for a given cross section j),
ŝk = skewness of the distribution of the standardized residuals (3rd moment),
êk = excess kurtosis of the distribution of the standardized residuals (4th moment).
12. Z-values, which represent standardized residual values under the null hypothesis.
13. See Shapiro, Wilk (1965).
14. See Bera, Jarque (1981, 1982), Stewart (1991, p. 162).
Thus, this test determines whether the third and fourth moments of the residuals are
consistent with the null hypothesis of normality. For additional tests of normality see
Judge et al. (1985, pp. 826-827).
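The statistic in the form of (16.48) can be sketched as follows (simulated stand-in residuals; L is an assumed number of model parameters):

```python
import numpy as np

# Jarque-Bera-type statistic from the skewness and excess kurtosis of
# standardized residuals.
rng = np.random.default_rng(9)
res = rng.normal(size=400)
T, L = len(res), 3

z = (res - res.mean()) / res.std()
sk = np.mean(z**3)            # skewness (3rd moment)
ek = np.mean(z**4) - 3.0      # excess kurtosis (4th moment)
jb = (T - L) / 6.0 * (sk**2 + 0.25 * ek**2)
# Under normality, jb is asymptotically chi-square with 2 df (5% critical: 5.99).
```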
For some models the error term may not be normally distributed but there exists
a transformation that produces normally distributed disturbances. Suppose that, if
the effects of predictor variables are accounted for, y_jt is log-normally distributed.
Then by taking logs of y_jt we have the criterion variable ln y_jt which is normally
distributed (and hence we have normally distributed errors). This is a special case
of a class of transformations considered by Box and Cox (1964). For this class they
assume there exists a value λ such that
(y_jt^λ − 1)/λ = β_1j x_1jt + ... + β_Lj x_Ljt + ε_jt,  λ > 0   (16.49)

where the disturbance terms ε_jt are normally distributed (and homoscedastic). It can
be shown that:

lim_{λ→0} (y_jt^λ − 1)/λ = ln y_jt.   (16.50)
Apart from a difference in the intercept, λ = 1 yields the basic model for the L-variate
case (16.7). In general, however, the Box-Cox transformation is applied primarily to
let the data determine the most appropriate functional form. See Kristensen (1984)
for a marketing application.
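The transform and the limit (16.50) are easy to verify numerically (illustrative values):

```python
import numpy as np

# Box-Cox transform: as lambda approaches zero it approaches ln(y);
# lambda = 1 gives y - 1, i.e. the linear model up to an intercept shift.
def box_cox(y, lam):
    return np.log(y) if lam == 0.0 else (y**lam - 1.0) / lam

y = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
near_log = box_cox(y, 1e-8)        # lambda close to zero
exact_log = np.log(y)
shifted = box_cox(y, 1.0)          # equals y - 1
```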
We note that with increases in sample sizes, it becomes easier to reject the null
hypothesis of normally distributed errors. Thus, even minor deviations from normality
can signify a violation of the assumption required for traditional statistical inference.
If the model specification seems appropriate, it will be unappealing to follow strict
rules with regard to such violations. To accommodate cases in which the normality
assumption does not hold but the model specification is acceptable, researchers have
developed robust regression methods. The interested reader is referred to Forsythe
(1972), Huber (1973), Talwar (1974), Hinich and Talwar (1975) and Judge et al.
(1985, pp. 828-839). For marketing applications, see Mahajan, Sharma and Wind
(1984).
We reviewed diagnostic tests for individual assumptions about the disturbances of the
classical linear model. Most of the tests apply to one specific assumption. However,
15. The following is based on Kőrösi, Mátyás and Székely (1992, Chapter 13).
the tests may require the acceptance of other assumptions. For example, the tests for
autocorrelation assume that the disturbances are homoscedastic. In empirical appli-
cations, various assumptions may be jointly violated. This implies that the diagnostic
value of individual test results is in doubt. An alternative approach is to apply joint
tests for multiple assumptions. For example, the White test, discussed earlier, is a
test for the joint investigation of homoscedasticity and correct model specification.
We can also consider the RESET test a joint test. It is based on the idea that
various misspecifications can lead to a violation of the assumption that the expected
value of the disturbances is zero. Possible misspecifications include omitted variables,
the wrong functional form, dependence between the regressors and the disturbances,
etc.
Finally, we note that the Jarque-Bera test is a joint (LM) test for:
1. functional form;
2. heteroscedasticity;
3. autocorrelation, and
4. normality.
Based on the Jarque-Bera test, a whole family of LM tests can be derived, which can
jointly test for two, three or all four assumptions.
Such joint tests are often used in empirical research. All these tests are available in
the software program PCGIVE.¹⁶
It is often convenient to consider the error-term assumptions under the condition that
the predictor variables x_ℓj, ℓ = 1, ..., L are nonstochastic or "fixed". Essentially
this means that we would take the observed values for the predictor variable(s) as
given. However, if a predictor variable is in fact dependent upon other variables, the
condition that x_ℓj is "fixed" is untenable. We therefore use:

See also (16.21). The assumption is violated (i.e. E(u_jt | x_ℓjt) ≠ 0) if x_ℓjt has mea-
surement error or if x_ℓjt in fact depends on y_jt. The consequence of a violation of
this assumption is that the ordinary least squares parameter estimates are biased. The
remedy lies in the use of simultaneous equation estimation methods (if x_ℓjt depends
on y_jt) or in the use of instrumental variables (if x_ℓjt has measurement error). We
return to these issues in Section 16.4.
MULTICOLLINEARITY
In model (16.9) we assume that the matrix of observations X_j has full rank,¹⁷ or that
the vectors in X_j are linearly independent. If this assumption is violated we encounter
perfect collinearity. If X_j has a rank smaller than L, X′_j X_j has a determinant with
value zero. In that case (X′_j X_j)⁻¹ does not exist and the parameter estimates cannot
be uniquely determined. Perfect collinearity is, however, unusual. If it happens, it is
often due to a duplicate variable. The notion of "almost" linearly dependent vectors
is, however, meaningful. This means X′_j X_j does not have a zero determinant, but its
value will be near zero. Hence (X′_j X_j)⁻¹ exists but its elements will be large. As a
consequence the parameter estimates will have large variances and covariances, as
shown in Section 16.1.5. Thus multicollinearity (almost linearly dependent vectors in
the X_j matrix) makes the parameter estimates unreliable.
Real-world data often show high degrees of correlation between predictors. Take,
for example, the specification of a model to explain the demand for a product category
at the household level. Variables such as household income and age of the head of the
household may be relevant predictor variables. However, these variables tend to be
correlated. In particular, income is partly a function of age. Also, while in this case
it may be fine to assume that the variable "Age" is "fixed", it is more appropriate to
let "Income" be a random variable. This dependency of income on age is unlikely
to require novel estimation techniques (the collinearity should be modest). However,
the interpretation of statistical results may become more complex. Specifically, one
should consider both the direct effect of a change in "Age" and its indirect effect
through "Income" on demand. The substantive interpretation of the estimated effects
will then benefit from an explicit consideration of dependencies between predictor
variables.
Collinearity between predictor variables can also result from decisions made by
managers. Suppose we argue that unit brand sales are a function of advertising and
price. It seems inappropriate to assume that either variable is "fixed". Instead, a
decision maker manipulates both variables, based on strategic considerations, mar-
ketplace conditions, cost considerations, etc. Only under restrictive assumptions can
we assume that such predictors are truly exogenous. But the focus here is on a pos-
sible dependency between these variables. Imagine that consumers' price sensitivity
depends on the amount of advertising.¹⁸ A manager who manipulates both variables
will then refrain from changing these variables independently. In that case, a real-
world correlation between these marketing decision variables reflects, at least partly,
the decision maker's belief about interactions between these variables. If her belief
is very strong (i.e. price changes are strongly related to advertising changes), then it
can become difficult or impossible to obtain reliable estimates of the separate main
effects from marketplace data. And, verification of an interaction effect will then be
especially difficult.
Some of the procedures for detecting the presence of multicollinearity are (for a
given cross section j):
We discuss other procedures in Section 16.1.5. In that section we also discuss several
approaches for the resolution of severe multicollinearity.
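One widely used diagnostic, sketched here as an illustration (our addition; the text's own list of procedures is not reproduced at this point), is the variance inflation factor, VIF_ℓ = 1/(1 − R²_ℓ), where R²_ℓ comes from regressing predictor ℓ on the remaining predictors. The data echo the income/age example, and the threshold of 10 is a common rule of thumb, not a formal criterion:

```python
import numpy as np

# Variance inflation factor on simulated data where income is partly a
# function of age, as in the household demand example.
rng = np.random.default_rng(10)
n = 400
age = rng.uniform(25.0, 65.0, n)
income = 0.9 * age + rng.normal(0.0, 3.0, n)

def vif(target, others):
    """VIF of `target` given a list of other predictor arrays."""
    X = np.column_stack([np.ones(len(target))] + others)
    b = np.linalg.lstsq(X, target, rcond=None)[0]
    u = target - X @ b
    r2 = 1.0 - (u @ u) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

vif_income = vif(income, [age])   # large values signal severe collinearity
```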
If the error-term and other model assumptions are satisfied, we can evaluate the qual-
ity of the model, and identify substantive implications. In this section we discuss
criteria pertaining to the model's goodness of fit and the (statistical) reliability of the
coefficients. After that, we continue the discussion of the interrelations between the
predictor variables.
GOODNESS OF FIT
A criterion by which we can measure a model's overall goodness of fit, that is, the
degree to which fluctuations in the criterion variable are explained by the model, is
the coefficient of determination or R².¹⁹ It measures the proportion of total variance
in the criterion variable "explained" by the model. With (16.7) as the model, let
(16.52)
and
(16.53)
R²_j, the coefficient of determination for a given cross section, is defined as:²⁰

R²_j = Σ_{t=1}^T (ŷ_jt − ȳ_j)² / Σ_{t=1}^T (y_jt − ȳ_j)² = explained variation in y_j / total variation in y_j   (16.54)

The explained variation in y_j is also known as the regression sum of squares. The
total sum of squares in y_j is referred to as the total variation in y_j. Expression (16.54)
can also be written as:

R²_j = 1 − Σ_{t=1}^T û²_jt / Σ_{t=1}^T (y_jt − ȳ_j)² = 1 − unexplained variation in y_j / total variation in y_j   (16.55)

The unexplained variation is also called residual variation, error sum of squares, or
residual sum of squares.
Expression (16.55) can be written in matrix notation as:

R²_j = 1 − û′_j û_j / y*′_j y*_j   (16.56)

where

û_j = a T × 1 column vector of the T residuals,
y*_j = a T × 1 column vector of the observations of the criterion
variable in deviations from the mean. Thus,
y*′_j = (y_1j − ȳ_j, ..., y_Tj − ȳ_j).
The value of R² thus depends on:
1. how well the regression line fits the data, as measured by variation in the residuals,
and
2. the amount of dispersion in the values of the criterion variable.
It is tempting for researchers to regard estimated equations with high R² values fa-
vorably and low R² values unfavorably. Indeed, it is straightforward to agree with
this notion if everything else remains the same. However, in practice, models that are
structurally deficient (e.g. with implausible substantive implications) can have higher
R² values than models with acceptable specifications. In addition, for some types of
problems and data, all models necessarily have low R² values. Thus, low R² values
should not be interpreted as indicating unacceptable or useless results, nor should high
R² values be interpreted to mean that the results are useful.
It is also worth noting that a high R² value may obtain at the cost of a high degree
of positive autocorrelation.²¹ In addition, the use of many predictor variables may
20. If the model does not contain a constant term it is not meaningful to express the observed values in
deviations from the mean, in which case the denominator should read Σ_{t=1}^T y²_jt (Theil 1971, Chapter 4).
21. For an explanation of this phenomenon, see Koerts and Abrahamse (1969, Chapter 8).
result in artificially high R² values. Each predictor with a nonzero slope coefficient
makes a contribution to R², with the actual contribution determined by the slope and
the amount of variation for the predictor in the sample.²² Also, model comparisons
based on R² values are not meaningful unless the comparisons are made on one set of
data for the criterion variable (and the number of predictors is taken into account).²³
The artificiality of R² is further illustrated by examples that indicate how re-
searchers can manipulate the value of R² while the standard deviation of the residuals
remains constant (see Wittink, 1988, pp. 209-213 for examples). The standard devia-
tion of residuals is an absolute measure of lack of fit. It is measured in the same units
as the criterion variable, and therefore it has substantive relevance.
If the criterion variable differs between two equations, either in terms of its def-
inition or in terms of the data, comparisons based on R² values are meaningless.
Models of the same criterion variable on the same data, that differ in the number of
predictor variables, can be compared on R²_a, the adjusted coefficient of determination.
In this coefficient both the unexplained variation and the total variation are adjusted
for degrees of freedom, (T − L) and (T − 1) respectively. The adjusted coefficient of
determination for a given cross section j is defined as:
R²_aj = 1 − [û′_j û_j / (T − L)] / [y*′_j y*_j / (T − 1)].   (16.57)

From (16.56) and (16.57) we can show that R²_j and R²_aj are related as follows:

R²_aj = R²_j − [(L − 1)/(T − L)] (1 − R²_j).   (16.58)

It follows that R²_aj < R²_j (except for the irrelevant case of L = 1).
While R² values have some role to play in the model-building process, we need other criteria
to determine the substantive usefulness of a model.
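The computations in (16.55), (16.57) and the identity (16.58) can be sketched for one regression (simulated data; T observations and L parameters including the intercept are assumptions for illustration):

```python
import numpy as np

# R^2, adjusted R^2, and the identity (16.58) for a single regression.
rng = np.random.default_rng(11)
T = 100
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.5 * x2 + 0.2 * x3 + rng.normal(0.0, 1.0, T)

X = np.column_stack([np.ones(T), x2, x3])
L = X.shape[1]
b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b

total = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - (u @ u) / total
r2_adj = 1.0 - ((u @ u) / (T - L)) / (total / (T - 1))
r2_adj_check = r2 - (L - 1) / (T - L) * (1.0 - r2)   # identity (16.58)
```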
To illustrate the difficulty associated with the use of relative measures for sub-
stantive questions, we consider the following. As indicated, R² measures the percent
of variation in the criterion variable explained by the model in the sample. A re-
lated measure determines the marginal contribution of each predictor variable to
the model's explanatory power. In a simplified setting, imagine that the predictor
variables in (16.7) are uncorrelated. In that case (for a given cross section j):
L
R2 = L:r;,xe (16.59)
£=2
where
r y.xe = the simple correlation coefficient of y and xe.
For uncorrelated predictor variables, we can also show that:

    r2(y,xℓ) = βℓ^2 (s2(xℓ) / s2(y))    (16.60)

where

    s(xℓ) = the standard deviation in the sample for xℓ,
    s(y) = the standard deviation in the sample for y.
Thus, (16.60) provides a measure of the contribution for each predictor variable
to the model's explanatory power. Some software packages automatically provide
standardized regression coefficients, defined as:

    betaℓ = βℓ (s(xℓ) / s(y)).    (16.61)
The interest in beta coefficients stems from the difficulty model users have in comparing
slope coefficients. Each slope coefficient, βℓ, is interpretable only with respect
to the unit in which predictor variable ℓ is measured. With different units of measurement
for the predictor variables, the slope coefficients are not comparable. The beta
coefficients are unitless, as are the correlation coefficients.
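As a numerical check on these relations, a small sketch (hypothetical data) shows that the beta coefficient of (16.61) equals the slope obtained after standardizing both variables, and, with a single predictor, the simple correlation in (16.60).

```python
# Hypothetical illustration of standardized ("beta") coefficients, eq. (16.61).
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(loc=10, scale=4, size=T)        # predictor in arbitrary units
y = 3.0 + 0.8 * x + rng.normal(size=T)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # OLS slope
beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)      # eq. (16.61)

# The same number results from regressing the standardized variables:
zx = (x - x.mean()) / np.std(x, ddof=1)
zy = (y - y.mean()) / np.std(y, ddof=1)
beta_std = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

assert abs(beta - beta_std) < 1e-12
# With one predictor, beta equals the simple correlation (cf. 16.60):
assert abs(beta - np.corrcoef(x, y)[0, 1]) < 1e-12
```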
A comparison of (16.61) with (16.60) and (16.59) makes it clear that the beta
coefficients have similar properties as R2. Thus, the shortcomings associated with
R2 with respect to substantive interpretations also apply to the beta coefficients. In
fact, if the model user wants to make pronouncements about the "relative importances"
of predictor variables, based on the beta coefficients, several other conditions
must be met. Since the amount of sample variation in xℓ plays a role in (16.61),
this variation must be obtained through probabilistic processes. This means that the
"importances" of predictor variables manipulated by managers, such as price and
advertising, cannot be meaningfully assessed in this manner.
More specifically, all slope parameters are zero under the null hypothesis, i.e.:

    H0 : β2j = β3j = ... = βLj = 0.
Since Rj2 is the percent of variation explained for cross section j, and (1 - Rj2) is the
percent unexplained, it is easy to show that an alternative expression for this F-ratio
is:

    [Rj2 / (L - 1)] / [(1 - Rj2) / (T - L)] ~ F(L-1),(T-L).    (16.64)
In both expressions (16.63) and (16.64) the number of slope parameters in the model
equals (L - 1), which is also the number of degrees of freedom used to explain
variation in the criterion variable. The number of degrees of freedom left for the
unexplained variation is (T - L), since one additional degree of freedom is used by
the intercept. If the calculated F-value exceeds the critical value, we reject H0. Only
if the equation as a whole explains more than what could be due to chance does it
make sense to determine the statistical reliability of individual slope coefficients.
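Relation (16.64) can be verified numerically: the F-ratio computed from R2 alone equals the conventional ratio of explained to unexplained mean squares. The data below are invented for the sketch.

```python
# Hypothetical check of the overall F-test, eq. (16.64).
import numpy as np

rng = np.random.default_rng(2)
T, L = 30, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, L - 1))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=T)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# F-ratio via (16.64), using only R^2 and the degrees of freedom:
f_from_r2 = (r2 / (L - 1)) / ((1 - r2) / (T - L))
# Equivalent ANOVA form: explained vs. unexplained mean squares:
f_anova = ((ss_tot - ss_res) / (L - 1)) / (ss_res / (T - L))

assert abs(f_from_r2 - f_anova) < 1e-9
```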
Strictly speaking the F-test is valid only if all error-term assumptions are met. In
practice, it is very difficult to specify a model that is complete and has no identifiable
shortcomings. Typically, the residuals that remain after the model is first estimated
will be reviewed and tested. Based on this residual analysis the model will be adjusted,
and a second model will be estimated. This process often proceeds until the
model builder has a specification that is theoretically acceptable and is consistent
with the data. Such an iterative model-building process, while necessary, reduces
the validity of statistical inferences. For this reason, and also for other reasons, it
is important for the model builder to "test" the ultimate specification on a validation
sample (see Chapter 18).
the OLS estimator β̂ℓj of βℓj also follows a normal distribution. It then follows that
(β̂ℓj - βℓj)/σ(β̂ℓj) is standard normally distributed. The standard deviation σ(β̂ℓj) is a
function of σj, the standard deviation of the disturbance term. Since the latter has to
be estimated from the residuals, we replace σ(β̂ℓj) by its estimate σ̂(β̂ℓj), so that the test
statistic:

    (β̂ℓj - βℓj) / σ̂(β̂ℓj)    (16.65)
is t-distributed with (T - L) degrees of freedom.26 Usually the null hypothesis is
formulated as follows:

    H0 : βℓj = 0

and H0 is rejected if the calculated t-value for a given coefficient exceeds the tabulated
value for the t-distribution. Most software programs provide p-values which,
for a two-tailed test, show the probability of a type I error if the null hypothesis is
rejected.
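The construction of the test statistic (16.65), and a test against a prior average effect c in the spirit of Farley, Lehmann and Sawyer, can be sketched as follows. The data, and the value c = 0.5, are invented for the sketch.

```python
# Hypothetical illustration of the t-statistic (16.65) for a slope coefficient.
import numpy as np

rng = np.random.default_rng(3)
T, L = 40, 2
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 0.6 * x + rng.normal(size=T)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (T - L)                 # estimated disturbance variance
cov_b = s2 * np.linalg.inv(X.T @ X)          # estimated covariance of the estimator
se = np.sqrt(np.diag(cov_b))                 # standard errors

t_slope = (b[1] - 0.0) / se[1]               # eq. (16.65) with H0: beta = 0
# Against an assumed prevailing average effect c = 0.5 (meta-analysis-style H0):
t_vs_c = (b[1] - 0.5) / se[1]

assert se[1] > 0
assert np.isfinite(t_slope) and np.isfinite(t_vs_c)
```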
As the amount of empirical evidence with regard to demand sensitivity to specific
marketing instruments accumulates, it seems inappropriate to continue the use of hypotheses
of no effect. Farley, Lehmann and Sawyer (1995) argue that model builders
should instead rely on average estimated effects from previous empirical analyses.
Such average effects have been reported in various meta-analyses. New empirical
results for problems studied earlier should then be tested against prevailing average
effects.
Suppose that the current average effect for predictor variable xℓj is c. Then the
null hypothesis is:

    H0 : βℓj = c.

If we cannot reject this null hypothesis, the product/time/region for which data have
been obtained is comparable to previous sources with regard to the average effect of
variable ℓ. If we obtain an effect that is significantly different from the prevailing
standard, we have an opportunity to consider the reason(s) for divergence. Farley
et al. argue that this manner of testing will prove to be more informative than the
traditional procedure.
To illustrate the traditional procedure, consider relation (15.1), for which T = 29,
L = 4, β̂2 = 7.9 (the estimated effect of advertising share on market share) and σ̂(β̂2) =
3.2. Suppose we test whether β̂2 is significantly different from zero at the five percent
level of significance. From (16.65) we obtain β̂2/σ̂(β̂2) = 7.9/3.2 = 2.47. The table of
the t-distribution shows a critical t-value at the five percent level, for a one-tailed test
(it is unlikely that more advertising reduces market share) with 29 - 4 = 25 degrees
26. For more detail see introductory statistics and econometrics textbooks, such as Wonnacott and Wonnacott
(1969, 1970), Wittink (1988), Greene (1997). Tables of the t-distribution are also reproduced in most of these
textbooks.
ESTIMATION AND TESTING 355
of freedom, of 1.71. In this example we reject the null hypothesis that advertising
share has no effect on market share.
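The arithmetic of this example can be checked directly; the critical value 1.71 is the tabulated one-tailed five percent value for 25 degrees of freedom cited in the text.

```python
# Numbers from the advertising-share example in the text.
t_ratio = 7.9 / 3.2          # estimated coefficient divided by its standard error
df = 29 - 4                  # T - L degrees of freedom
t_critical = 1.71            # one-tailed 5% value for 25 df (from the tables)

assert round(t_ratio, 2) == 2.47
assert t_ratio > t_critical  # reject H0: advertising share has no effect
```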
In the context of an iterative model-building process, the result showing statistical
significance for the advertising effect indicates that we want to retain the advertising
variable in the model. If the slope had been insignificant, and we had expected the
variable to be relevant, we would consider possible reasons for the nonsignificance
in the sample. For example, the true functional form may differ from the one we
assume. One might argue that this is not a likely explanation if we first examined and
tested the residuals (and found no such evidence against the error-term assumptions).
Thus, if the assumed functional form is incorrect the residuals should so inform us.
However, residual tests are not conclusive. Specifically the power of any of the tests
may be low. For that reason, we still consider the possibility that lack of statistical
significance for an individual slope coefficient may be due to model misspecification.
In general, there are many possible reasons why the t-ratio for a given slope
coefficient can be insignificant:
1. the predictor variable has an effect that is different from the functional form
assumed (incorrect functional form);
2. the model excludes other relevant predictor variables (omitted variables);
3. the predictor variable is highly correlated with one or more other predictor vari-
ables included in the model (multicollinearity);
4. the sample data are insufficient (lack of power);
5. the predictor variable has no relation with the criterion variable (irrelevance).
Insignificance due to either of the first two reasons should stimulate us to investigate
the existence of superior functional forms and/or additional predictor variables. The
third reason requires the exploration of additional data sources, model reformulation,
or alternative estimation procedures (as might be the case for the fourth reason).
Only the fifth reason is proper justification for eliminating a predictor variable from
the model. We must consider each of these possible reasons before we eliminate a
predictor variable from the model.
We note that in a simple regression analysis (one predictor variable), the F-test
statistic relevant to (16.62a) and the t-test statistic for (16.65) when βℓj = 0 provide
the same conclusion.27 Curiously, in a multiple regression analysis it is possible to
reject the null hypothesis that all slope parameters are zero (16.62a) and at the same
time find that none of the slope coefficients differ significantly from zero based on
the t-test (16.65). This may occur if the predictor variables are highly intercorrelated.
In this case the separate influences of xℓj, ℓ = 2, ..., L on yj are unreliable (large
estimated standard errors), while the model's explanatory power may be high.28 Marketing
data, especially at aggregate levels, often have a high degree of collinearity
between the predictor variables. We pursue the issue of collinearity in more detail
27. This is true if both tests are conducted at the same level of significance and against the same alternative (i.e.
a two-tailed t-test).
28. For an example, see Leeflang (1974, pp. 141-143).
    H0 : β4 = β5 = 0
is tested with the statistic:

    F = [(RF2 - RR2) / (DFR - DFF)] / [(1 - RF2) / DFF]    (16.66)

where

    RF2 = the unadjusted R2 for the full model,
    RR2 = the unadjusted R2 for the restricted model,
    DFR = the number of degrees of freedom left for the restricted model,
    DFF = the number of degrees of freedom left for the full model.

In this example:
4. Multicollinearity
We mentioned four procedures for detecting the presence of multicollinearity in Section
16.1.4. We now discuss additional procedures and measures for the detection of
multicollinearity.
regressions, one at a time, can be used to quantify the overlap among any number
of predictor variables.
A related measure is the variance inflation factor (VIF), computed as (1 -
Rℓ2)^-1, where Rℓ2 is the R2 of the auxiliary regression of predictor ℓ on the remaining
predictors. A VIF greater than 10 is often taken to signal that collinearity is a problem
(Marquardt, 1970; Mason, Perreault, 1991).
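The auxiliary-regression R2 and the resulting VIF can be computed as follows; the data, with one nearly collinear pair of predictors, are invented for the sketch.

```python
# Hypothetical sketch of variance inflation factors.
import numpy as np

def vif(X):
    """VIF for each column of X (no intercept column), from the R^2 of
    regressing that column on the remaining columns plus an intercept."""
    T, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(T), others])
        b, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ b
        r2 = 1 - resid @ resid / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)                   # unrelated predictor
X = np.column_stack([x1, x2, x3])

v = vif(X)
assert v[0] > 10 and v[1] > 10   # the collinear pair is flagged by the VIF > 10 rule
assert v[2] < 2                  # the independent predictor is not
```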
d. The correlation matrix
Often empirical researchers examine the matrix of all bivariate correlation coefficients
between the predictor variables. Based on some cutoff value a decision
might be made about which pairs of predictors should not be included together.
One problem with this procedure is that all bivariate correlations may be low and
yet one predictor may be highly related to a linear combination of the remaining
predictors (see the previous approach). Another problem is that the severity of a
high bivariate correlation between two predictors depends on the sample size. For
that reason statistical tests of multicollinearity29 are meaningless. In this regard,
Mason and Perreault (1991) demonstrated that the harmful effects of collinear
predictors are often exaggerated and that collinearity cannot be viewed in isolation.
They argue that the effects of a given level of collinearity must be evaluated
in conjunction with the sample size, the R2 value of the estimated equation, and
the magnitudes of the slope coefficients. For example, bivariate correlations as
high as 0.95 have little effect on the ability to recover the true parameters if the
sample size is 250 and the R2 is at least 0.75. By contrast, a bivariate correlation
of 0.95 in conjunction with a sample size of 30 and an R2 of 0.25 results in
failure to detect the significance of individual predictors. Currently, the spectral
decomposition of Xj'Xj is advocated for the quantification of multicollinearity,
based on one or more characteristic roots.30
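A minimal sketch of such a spectral-decomposition diagnostic follows: a Belsley-style condition index computed from the characteristic roots of the (column-scaled) cross-product matrix. The data are invented, and the threshold of 30 is a common rule of thumb assumed here, not a value given in the text.

```python
# Hypothetical sketch of a condition-index collinearity diagnostic.
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.02, size=50)   # severe near-collinearity
X = np.column_stack([np.ones(50), x1, x2])

# Scale columns to unit length (conventional before computing condition indices)
Xs = X / np.linalg.norm(X, axis=0)
eigvals = np.linalg.eigvalsh(Xs.T @ Xs)     # characteristic roots of X'X
cond_index = np.sqrt(eigvals.max() / eigvals.min())

# A condition index above roughly 30 is commonly read as harmful collinearity
assert cond_index > 30
```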
Solutions to multicollinearity31
If any two predictors are perfectly correlated, the parameters of a regression equation
cannot be estimated. Thus no solution to a multiple regression problem (other than
combining the two predictors or deleting one) can be obtained if there is extreme
multicollinearity. The approaches available for the resolution of non-extreme but severe
multicollinearity include:
The last approach is simply that a predictor variable with a statistically insignificant
t-ratio be eliminated. In general, an insignificant t-ratio indicates that the predictor
29. See Farrar, Glauber (1967), Kumar (1975), Friedmann (1982).
30. See Belsley, Kuh, Welsch (1980), Judge et al. (1985, pp. 902-904), Belsley (1991).
31. We closely follow Wittink (1988, pp. 100-101).
variable is irrelevant or that it has an effect but we cannot obtain a reliable parameter
estimate. The elimination of such a predictor variable from the model should be a
last-resort option.
A better solution, of course, is to obtain more data. With more data, especially
data with a reduced degree of multicollinearity, it is more likely that a significant
effect can be obtained for any predictor variable. However, the opportunity to add
data to the sample is often limited.
Another possible solution is to reformulate the model. In some sense, the elimination
of one predictor variable amounts to a model reformulation. But we may also
combine two predictor variables and in that manner resolve the problem. For example,
this is appropriate if the two variables are substitute measures of the same underlying
construct, such as when the observations represent characteristics of individuals, and
two of the predictors measure age and work experience. These two variables tend to
be correlated, because both may measure individuals' learned skills for a certain job
as well as their maturity. It may be sufficient to use one of these variables or to define
a new variable that combines these two predictor variables.
If the source of multicollinearity is two or more predictor variables which capture
different phenomena, such combinations are not appropriate. However, it may be
possible to redefine the variables. Consider, for example, a demand model in which
product category sales in t (Qt) is explained by predictors such as total disposable
income in t (Inct) and population size in t (Nt). These predictors are collinear because
Inct is a function of the size of the population. However, by defining the variables on
a per capita basis, such as Qt/Nt and Inct/Nt, we create a simpler32 model and
eliminate the collinearity between Inct and Nt.
Other functional specifications of the model33 can also reduce multicollinearity.
For example, consider relation (16.7). The predictor variables x2jt, ..., xLjt vary
over time,34 and each of these variables may have a component that follows a common
trend. In that case, bivariate correlation between the predictors will be high
due to the common factor. The multicollinearity can be considerably reduced in the
following manner.35 First specify (16.7) for t - 1, and subtract the result from (16.7),
so that every variable enters in first differences:

    yjt - yj,t-1 = Σ from ℓ=2 to L of βℓj (xℓjt - xℓj,t-1) + ujt - uj,t-1.

Note that the reformulated model has no intercept. Apart from that, the model contains
the same parameters. Importantly, this reformulation will not have the same
degree of multicollinearity as the original model. Such opportunities for model reformulation
are useful to consider when severe multicollinearity is encountered.
32. See also our discussion in Section 7.2.1.
33. Compare Chapter 5.
34. See also Rao, Wind, DeSarbo (1988).
35. We assume that x1jt represents the constant term.
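The collinearity-reducing effect of such first-differencing can be illustrated with simulated trending predictors; all numbers here are hypothetical.

```python
# Hypothetical sketch: a common trend makes two predictors nearly collinear
# in levels, while their first differences are only weakly correlated.
import numpy as np

rng = np.random.default_rng(6)
T = 60
trend = np.arange(T, dtype=float)
x2 = trend + rng.normal(scale=2.0, size=T)   # predictors sharing a common trend
x3 = trend + rng.normal(scale=2.0, size=T)

r_levels = np.corrcoef(x2, x3)[0, 1]
r_diffs = np.corrcoef(np.diff(x2), np.diff(x3))[0, 1]

assert r_levels > 0.9        # near-collinear in levels
assert abs(r_diffs) < 0.5    # much weaker after first-differencing
```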
The argument in favor of these alternative estimation methods is that we may accept
a small amount of bias in the parameter estimates in order to gain efficiency, i.e. to
reduce the variance of the slope coefficients. One procedure that does just that is
ridge regression.37 Marketing applications of ridge regression include Mahajan, Jain,
Bergier (1977) and Erickson (1981).
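The standard ridge estimator, (X'X + kI)^-1 X'y, is a minimal sketch of the bias-for-variance trade-off; the data are invented and the shrinkage constant k = 1.0 is chosen arbitrarily for illustration (predictors are assumed already demeaned, so no intercept is fitted).

```python
# Hypothetical sketch of ridge regression on collinear data.
import numpy as np

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^(-1) X'y; k = 0 reproduces OLS."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(7)
T = 50
x1 = rng.normal(size=T)
x2 = x1 + rng.normal(scale=0.1, size=T)      # severe collinearity
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=T)

b_ols = ridge(X, y, 0.0)                     # ordinary least squares
b_ridge = ridge(X, y, 1.0)                   # shrunken, slightly biased estimate

# Ridge shrinks the coefficient vector toward zero relative to OLS:
assert np.linalg.norm(b_ridge) < np.linalg.norm(b_ols)
```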
The principle underlying equity estimation38 is that recovery of the relative contribution
of correlated predictor variables can be improved by first transforming the variables
into their closest orthonormal counterparts. Rangaswamy and Krishnamurthi
(1991) compared the performances of equity, ridge, OLS, and principal components
estimators in estimating response functions. Overall, "equity" outperforms the other
three estimators on criteria such as estimated bias, variance and face validity of the
estimates. Similar results were obtained in a simulation performed by the same
researchers.39
otherwise deficient model may still be accurate, since most of the explanatory power
of x2j is contained in one or more other predictor variables. As long as this correlation
between x2j and the other predictors continues to exist, the forecasts from this deficient
model will not be systematically affected. On the other hand, if this correlation
changes, the accuracy of the forecasts will be affected by the model's deficiency.
(This happens especially when the predictor variables are controlled by a manager
or a policy maker.) The reason is that the elimination of a relevant predictor variable
biases the coefficients of the remaining predictor variables, which affects the accuracy
of conditional forecasts. The nature and magnitude of this bias depends, among other
things, on the degree of multicollinearity. If the degree of multicollinearity changes
from the estimation sample to new data, then the nature of the bias changes, thereby
affecting the forecasting accuracy. Thus, for conditional forecasting multicollinearity
is a serious problem, and the "solution" has to be similar to the situation when the
primary objective of model building is to understand relationships.
If the objective is to understand or describe the relationship between a criterion
variable and several predictor variables, it is critical that all relevant variables are
included in the model. This is also true for normative decisions. The presence of
multicollinearity may make it difficult or impossible to obtain the desired degree of
reliability (i.e., low standard errors) for the coefficients. As a result the computed
t-ratio for the slope coefficient of one or more predictor variables may be below
the critical value. Yet for understanding it is not advisable to eliminate a (relevant)
predictor. In this case, ridge regression or equity estimation should allow one to obtain
slightly biased but reliable estimates for a fully specified model.
Section 16.1 focuses on estimation and testing issues relevant to models parameterized
with either time-series data or cross-sectional data. For example, (16.7) is specified
for time-series data while (16.10) is for cross-sectional data. In this section we
discuss issues associated with the opportunity to pool time-series and cross-sectional
data for estimation (use both sources of variation), and we discuss estimation methods
for pooled data.
In matrix notation, we have considered the use of time-series data separately for
each cross-sectional unit, such as for brand j, specified in (16.9):

    yj = Xj βj + uj,   j = 1, ..., n.    (16.69)

If we have T observations for each cross section, and pool the data, parameters are
estimated from nT observations. Under the usual assumptions about the error terms,
we can test the null hypothesis: βj = βi, i, j = 1, ..., n, i ≠ j. This test is based
on a comparison of the residual sum of squares when each cross section has unique
parameters, as in (16.69), with the residual sum of squares when a single parameter
40. This section is based on Bass, Wittink (1975), Wittink (1977) and Leeflang, van Duijn (1982a, 1982b).
vector is estimated. The latter estimates are obtained from the pooled model:

    yj = Xj β + uj,   j = 1, ..., n    (16.70)

or

    Y = Xβ + u    (16.71)

where

    Y = an nT x 1 column vector of observations on the criterion variable,
    X = an nT x L matrix of observations on the L predictor variables,
        and a first column of ones,
    β = an L x 1 column vector of (homogeneous) response parameters, and
    u = an nT x 1 column vector of disturbance terms.
We note that in the case where the cross sections represent brands, it is highly unlikely
that (16.70) is preferred over (16.69). If the cross-sectional units refer to geographic
regions or stores, it is plausible that parameters are equal (homogeneous) across the
units.
Algebraically, (16.69) with a separate intercept term x1jt = 1 for each j = 1, ..., n
can be stated as:

    yjt = β1j + Σ from ℓ=2 to L of βℓj xℓjt + ujt,   t = 1, ..., T,  j = 1, ..., n.    (16.72)

Thus, (16.72) allows both the intercepts and the slope parameters to be unique for
each cross section, as does (16.69).
Pooling offers certain advantages (see also the discussion in Section 14.2). However,
if the homogeneity hypothesis is rejected then the estimates based on the pooled
model (16.70) lack meaning (at best the homogeneous estimates are weighted averages).
Nevertheless, if the differences in parameters between the cross sections are
small we may be willing to accept some bias in order to obtain reduced variances.41
Even if the cross sections differ in parameter vectors, the statistical uncertainty about
the separate estimates may "inflate" the differences considerably. Thus, the model
builder faces the question how to trade bias (separate estimation minimizes the bias)
off against variance (pooled estimation minimizes the variance).
41. To this end Wallace (1972) suggested weaker criteria. See also Brobst, Gates (1977).
Within the option to pool the data we distinguish four alternative estimation
methods:
1. OLS (Ordinary Least Squares): we assume that all parameters are fixed and common
for all cross-sectional units. This is pooling under complete homogeneity:
see (16.70).
2. OLSDV (Ordinary Least Squares with Dummy Variables): we assume the slope
parameters are fixed and common for all cross-sectional units, but the intercepts
are unique for each cross section (constraining βℓ1 = βℓ2 = ... = βℓn, for ℓ =
2, ..., L in (16.72)):

    yjt = β1j + Σ from ℓ=2 to L of βℓ xℓjt + ujt,   t = 1, ..., T,  j = 1, ..., n.    (16.73)

3. VCP (Variance Components Pooling): under this method the slope parameters are
fixed and common, but the intercepts vary randomly between the cross sections
(see below).
4. RCR (Random Coefficient Regression): under this method the slope parameters
as well as the intercepts are random variables, i.e. all parameters are random or
unsystematic around a mean (Swamy, 1970, 1971).
Another option is to pool the data for some variables ("partial pooling": Bemmaor,
Franses, Kippers, 1999) or over some cross sections which are "homogeneous" ("fuzzy
pooling": Ramaswamy, DeSarbo, Reibstein, Robinson, 1993).
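The contrast between pooled OLS (16.70) and OLSDV (16.73) can be sketched on simulated data in which the intercepts differ across cross sections and are correlated with the predictor: pooled OLS then overstates the common slope, while OLSDV recovers it. All numbers are hypothetical.

```python
# Hypothetical sketch: pooled OLS vs. OLS with cross-section dummies (OLSDV).
import numpy as np

rng = np.random.default_rng(8)
n, T = 5, 20                                   # cross sections and periods
alpha = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # unit-specific intercepts
beta = 0.7                                     # common slope
x = 0.5 * alpha[:, None] + rng.normal(size=(n, T))  # predictor correlated with alpha
y = alpha[:, None] + beta * x + rng.normal(scale=0.3, size=(n, T))

# Pooled OLS (16.70): one intercept for all cross sections
Xp = np.column_stack([np.ones(n * T), x.ravel()])
b_pooled, *_ = np.linalg.lstsq(Xp, y.ravel(), rcond=None)

# OLSDV (16.73): a dummy (separate intercept) per cross section, common slope
D = np.kron(np.eye(n), np.ones((T, 1)))        # nT x n matrix of unit dummies
Xd = np.column_stack([D, x.ravel()])
b_dv, *_ = np.linalg.lstsq(Xd, y.ravel(), rcond=None)
slope_dv = b_dv[-1]

assert abs(slope_dv - beta) < 0.15             # OLSDV recovers the common slope
assert b_pooled[1] - slope_dv > 0.3            # pooled slope absorbs intercept variation
```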
There are only a few known applications of RCR in marketing.42 An appealing
extension is to expand RCR so that the cross sections can differ systematically and
randomly in the parameters. In this manner Wittink (1977) allows for some parameter
heterogeneity, but to a much smaller extent than occurs if each cross section has
unique and fixed parameters, as assumed in (16.72). However, the estimation is not
always feasible.43
The disturbance term εjt has two components: a random intercept pertaining to cross
section j (vj) and a component wjt representing the random influence of cross
42. See Bass, Wittink (1975), Wittink (1977).
43. See Leeflang, van Duijn (1982b).
section j and time period t. Under the following assumptions (see Maddala, 1971):
    E(εjt) = E(vj) = E(wjt) = 0
    Cov(vi, wjt) = 0,   for all i, j = 1, ..., n, t = 1, ..., T
    Cov(vi, vj) = 0,   i ≠ j    (16.76)
    Cov(wit, wjt') = 0,   i ≠ j, and/or t ≠ t'
    Var(vj) = σv2
    Var(wjt) = σw2

the variance-covariance matrix of the disturbances is:

             [ A 0 ... 0 ]
    Ω = σ2   [ 0 A ... 0 ]    (16.77)
             [     ...   ]
             [ 0 0 ... A ]

where

    A = a matrix of order T x T,
    0 = a T x T matrix of zeros,

        [ 1 ρ ... ρ ]
    A = [ ρ 1 ... ρ ]
        [    ...    ]
        [ ρ ρ ... 1 ]

and ρ = σv2 / (σv2 + σw2), 0 < ρ < 1.
Pooling under complete homogeneity, and assuming that the intercept is zero,
results in a special case of VCP for which ρ = 0 (i.e. σv2 = 0). The estimates from this
special case are known as "total estimates" (since both cross-sectional and time-series
variation is used). If σw2 → 0, ρ → 1, we obtain the OLSDV estimates of the slope
coefficients. OLSDV ignores the cross-sectional variation for parameter estimation,
and the results are referred to as "within estimates". The VCP estimator attempts to
strike a balance between these extreme approaches by allowing the data to determine
the value of ρ.
For the estimation of σv2 and σw2, Nerlove (1971) proposes a two-step procedure.
The first step is based on the OLSDV estimates:
(16.79)
We now return to the hypothesis of parameter homogeneity: βi = βj for all i ≠ j,
where βi, βj are vectors of parameters as in (16.69). The classical test is often referred
to as the Chow test.44 The Chow test is an F-test with degrees of freedom v1 and v2:

    F(v1,v2) = [(RSS1 - RSS2) / v1] / [RSS2 / v2]    (16.80)

where

    RSS1 = residual sum of squares of the pooled regression (16.70),
    RSS2 = the sum of the residual sums of squares from the separate
           regressions (16.69),
    v1 = difference in degrees of freedom between the pooled regression
         and the separate regressions,
    v2 = total degrees of freedom unused from the separate regressions.
If the null hypothesis of homogeneity is not rejected, pooling the observations is
statistically justified. For (16.80) the degrees of freedom are:

    v1 = (nT - L) - n(T - L) = (n - 1)L    (16.81)

and

    v2 = n(T - L).    (16.82)

If pooling all data and assuming homogeneity of all parameters is rejected, we may
want to compare the estimation of separate parameter vectors (16.72) with OLSDV
(16.73).
A comparison of OLSDV against the estimation of separate parameter vectors
results in the following degrees of freedom for the numerator in (16.80):

    v1 = (nT - n - L + 1) - n(T - L) = (n - 1)(L - 1).    (16.83)

In this case, RSS1 is the residual sum of squares of OLSDV (16.73).
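The Chow test of (16.80)-(16.82) can be sketched as follows; the data are invented and homogeneous across units, so the resulting F-ratio should be unremarkable.

```python
# Hypothetical sketch of the Chow test: pooled vs. separate regressions.
import numpy as np

def chow_f(y_list, X_list):
    """Chow F-ratio (16.80) with v1, v2 from (16.81)-(16.82).
    Assumes equal T and L across the n cross sections."""
    n = len(y_list)
    T, L = X_list[0].shape
    rss2 = 0.0                                   # separate regressions
    for y, X in zip(y_list, X_list):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ b
        rss2 += r @ r
    Yp, Xp = np.concatenate(y_list), np.vstack(X_list)   # pooled regression
    b, *_ = np.linalg.lstsq(Xp, Yp, rcond=None)
    r = Yp - Xp @ b
    rss1 = r @ r
    v1 = (n - 1) * L                             # eq. (16.81)
    v2 = n * (T - L)                             # eq. (16.82)
    return ((rss1 - rss2) / v1) / (rss2 / v2), v1, v2

rng = np.random.default_rng(9)
T, L, n = 30, 2, 3
X_list = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(n)]
y_list = [X @ np.array([1.0, 0.5]) + rng.normal(size=T) for X in X_list]

F, v1, v2 = chow_f(y_list, X_list)
assert v1 == (n - 1) * L and v2 == n * (T - L)
assert F >= 0.0     # pooled RSS can never be below the sum of separate RSS
```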
Table 16.3 Parameter estimates for the model of regional lager beer purchases.
We show the application of pooling methods and tests on beer category data in the
Netherlands.45 The market consists of five regions. In each region, demand is a function
of marketing instruments and environmental variables defined for the region. Importantly,
there exist large differences in beer drinking behavior between the regions.
The criterion variable is:

    qjt = liters (1 liter ≈ 0.28 gallon) of lager beer purchased per capita
          (residents 15 years and older) x 10,000 in region j, period t.
Wittink (1977) allowed for systematic and random variation in parameters estimated
for a market share model representing one brand. For region j the model is:

    yjt = Σ from ℓ=1 to L of βℓj xℓjt + ujt,   j = 1, ..., n,  t = 1, ..., T    (16.84)

where

    yjt = the criterion variable for region j, time period t,
    xℓjt = the ℓ-th predictor variable for region j, time period t.
Systematic and random differences in the parameters are accommodated through the
specification:

    βℓj = Σ from k=1 to K of γk zℓjk + vℓj,   j = 1, ..., n,  ℓ = 1, ..., L    (16.85)

where

    zℓjk = the (average) value for the k-th predictor variable in region j
           (to explain cross-sectional variation in parameter βℓj),
    vℓj = a disturbance term.
In an application, the differences between unique parameter estimates (from applying
OLS separately to each region) and the corresponding estimates from (16.85) were
quite large. One advantage of accommodating systematic differences through (16.85)
is that it produces less variation in parameter estimates across the regions than exists
in the separately estimated regressions. Another advantage is that one obtains a
(possible) explanation for differences in parameters across the regions. For example,
Wittink allowed the own-brand price elasticity to depend on advertising for the brand
(specifically, the price elasticity was more negative in regions in which the brand had
a higher advertising share).
    Y = Xβ + u

where all variables are defined as in Section 16.2. Thus we have nT = M observations
of time series and cross section data, and we assume that β is a vector of homogeneous
response parameters. The variance-covariance matrix of the disturbances is
now defined as:

    E(uu') = Ω    (16.86)

where Ω is a positive definite symmetric M x M matrix with full rank M, and where
the ωij are the covariances of the disturbances. We can obtain an expression for the
generalized least squares estimator of β in (16.71) as follows:
    E[(Vu)(Vu)'] = σ2 I.    (16.90)

This shows that if we transform the variables by the V-matrix in (16.88), the disturbance
terms satisfy the error-term assumptions contained in (16.22) and (16.23).
47. Since Ω, and thus Ω*, is symmetric and positive definite, so are Ω^-1 and Ω*^-1, and hence a matrix V
satisfying (16.87) exists. See Theil (1971, p. 238).
Thus, the OLS-method could be applied to (16.88). The Generalized Least Squares
estimator48 is:

    β̂(GLS) = (X'Ω*^-1 X)^-1 X'Ω*^-1 Y = (X'Ω^-1 X)^-1 X'Ω^-1 Y    (16.91)

since Ω = σ2 Ω*. This model and estimation method are "generalized" because other
models can be obtained as special cases. The ordinary least squares estimator is one
such special case, in which Ω = σ2 I. We discuss other special cases below. The
variance-covariance matrix of the GLS-estimator is:

    Var(β̂(GLS)) = (X'Ω^-1 X)^-1.    (16.93)

If Ω is unknown, as it is in empirical work, we replace Ω by Ω̂ and use an Estimated
Generalized Least Squares (EGLS) estimator (also called Feasible Generalized Least
Squares (FGLS) estimator). This estimator is usually a two-stage estimator. In the
first stage, the ordinary least squares estimates are used to define residuals, and these
residuals are used to estimate Ω. This estimate of Ω is used in the second stage to
obtain the EGLS estimator.49
We now consider one special case in which the disturbances are heteroscedastic,
often encountered when cross-sectional data are used. Assuming the heteroscedasticity
is such that each cross section has a unique disturbance variance, we have:

                  [ σ1^2 I  0      ...  0      ]
    E(uu') = Ω =  [ 0       σ2^2 I ...  0      ]    (16.94)
                  [            ...             ]
                  [ 0       0      ...  σn^2 I ]
The σj can be estimated by taking the square root of the unbiased estimator of the
variance of the OLS-residuals:

    σ̂j^2 = Σ from t=1 to T of ûjt^2 / (T - L),   for each j = 1, ..., n.    (16.96)
We note that in (16.94) and hence in (16.96) it is assumed that the variances are
constant within each cross-sectional unit j = 1, ... , n.
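A minimal sketch of the two-stage EGLS procedure for this heteroscedastic case, on simulated data: estimate σj from first-stage OLS residuals as in (16.96), then weight each observation by 1/σ̂j in the second stage. The data, and the use of T - L degrees of freedom per cross section, are assumptions of the sketch.

```python
# Hypothetical two-stage EGLS sketch for cross-sectional heteroscedasticity.
import numpy as np

rng = np.random.default_rng(10)
n, T, L = 4, 25, 2
sigmas = np.array([0.5, 1.0, 2.0, 4.0])        # unique sigma per cross section
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
u = np.repeat(sigmas, T) * rng.normal(size=n * T)
y = X @ np.array([1.0, 0.8]) + u

# Stage 1: OLS residuals, then sigma_j per cross section as in (16.96)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_ols
sig_hat = np.array([
    np.sqrt((resid[j * T:(j + 1) * T] ** 2).sum() / (T - L)) for j in range(n)
])

# Stage 2: weighted (EGLS) regression -- divide each row by its sigma_j
w = 1.0 / np.repeat(sig_hat, T)
b_egls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)

assert abs(b_egls[1] - 0.8) < 0.3   # slope recovered on this simulated sample
```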
We refer to (16.34) to emphasize that there are other opportunities for the accommodation
of heteroscedasticity. In (16.34) the variance of the disturbance increases
with the squared value of a predictor variable.50 Prais and Houthakker (1955) suggested
a variance proportional to the squared expected value of the criterion variable.
To obtain estimated variances they divided the sample in m classes, and computed the
squared average value of the criterion variable in each class.51
A second special case of GLS is typical for time-series data. In this case, the covariances
Cov(ujt, ujt'), t ≠ t' differ from zero (but we assume that the disturbances
are homoscedastic). We consider the case that the disturbances are generated by a
first-order autoregressive scheme, also called a first-order stationary (Markov) scheme
(compare (16.37)):

    ujt = ρj uj,t-1 + εjt    (16.97)

where the εjt are independent normally distributed random variables with mean zero,
and variance equal to σεj^2 = σε^2. We also assume εjt to be independent of uj,t-1. For
simplicity let ρj = ρ (we relax this assumption below). By successive substitution
for uj,t-1, uj,t-2, ... in (16.97) we obtain:

    ujt = εjt + ρ εj,t-1 + ρ^2 εj,t-2 + ...    (16.98)
After multiplying both sides of (16.98) by uj,t-s and taking expectations, we have:

    E(ujt uj,t-s) = ρ^s E(uj,t-s uj,t-s) + ρ^(s-1) E(εj,t-s+1 uj,t-s) + ...
                    + ρ^2 E(εj,t-2 uj,t-s) + ρ E(εj,t-1 uj,t-s) + E(εjt uj,t-s)
                  = ρ^s σu^2    (16.99)

since the εjt are independent of uj,t-1 and ujt has variance σu^2.52 We also assume
that Cov(ujt, uit) = 0 for j ≠ i, i, j = 1, ..., n.
The variance-covariance matrix Ω now has the following form:

                         [ P 0 ... 0 ]
    E(uu') = Ω = σu^2    [ 0 P ... 0 ]    (16.100)
                         [    ...    ]
                         [ 0 0 ... P ]
50. See Judge et al. (1985, pp. 439-441) for a more general expression.
51. See Kmenta (1971, pp. 256-264) for other possibilities.
52. It can be shown that σu^2 = σε^2 / (1 - ρ^2).
372 CHAPTER 16
where P = a T × T matrix,

P = [ 1         ρ         ρ²        ⋯  ρ^{T−1}
      ρ         1         ρ         ⋯  ρ^{T−2}
      ⋮                                 ⋮
      ρ^{T−1}   ρ^{T−2}   ρ^{T−3}   ⋯  1 ]

and

ρ̂ = Σ_{j=1}^{n} Σ_{t=2}^{T} û_jt û_{j,t−1} / Σ_{j=1}^{n} Σ_{t=2}^{T} û²_{j,t−1}   (16.101)

where the û_jt are the OLS residuals. Then by substituting ρ̂ for ρ in (16.100), we obtain P̂ and Ω̂, and we use the EGLS estimator.
We now relax the assumption that ρ_j = ρ, since it is likely that the autocorrelation parameter differs over cross-sectional units. Thus:

E(uu′) = Ω = σ_u² [ P₁  0   ⋯  0
                    0   P₂  ⋯  0
                    ⋮           ⋮
                    0   0   ⋯  P_n ]   (16.103)

where

P_j = [ 1           ρ_j         ρ_j²        ⋯  ρ_j^{T−1}
        ρ_j         1           ρ_j         ⋯  ρ_j^{T−2}
        ⋮                                       ⋮
        ρ_j^{T−1}   ρ_j^{T−2}   ρ_j^{T−3}   ⋯  1 ].
To demonstrate (again)54 that one can also apply OLS to suitably transformed variables in the presence of autocorrelation, we return to (16.7): the transformed variables are

y*_jt = y_jt − ρ_j y_{j,t−1}

and

x*_ℓjt = x_ℓjt − ρ_j x_{ℓj,t−1},  ℓ = 1, …, L.
We now consider the model structure (16.71), but allow the disturbances to be simultaneously cross-sectionally heteroscedastic and time-wise autoregressive. We assume that Cov(u_jt, u_it) = 0 for j ≠ i and for all t (but relax this assumption below). The variance-covariance matrix can then be written as:

E(uu′) = Ω = [ σ₁²P₁   0       ⋯  0
               0       σ₂²P₂   ⋯  0
               ⋮                   ⋮
               0       0       ⋯  σ_n²P_n ]   (16.108)

which is an nT × nT matrix, where P_j, j = 1, …, n is defined in (16.103) and 0 is a T × T matrix of zeros.
To obtain estimates of the parameters in (16.108), we proceed in several steps.56 First OLS is applied to all nT observations in (16.71), from which residuals û_jt are obtained. We then estimate the autocorrelation parameters and incorporate these estimates in a manner such as outlined above. Next we estimate the model that now accounts for (first-order) autocorrelated disturbances, and estimate the error variance57 for each cross section. In this case the relation between the estimated variance of the autocorrelated disturbance u_jt and the variance of the error term ε_jt (16.97) is:

σ̂²_{u_j} = σ̂²_{ε_j} / (1 − ρ̂_j²).   (16.109)
55. See also Cochrane and Orcutt (1949).
56. See, for example, Kmenta (1971, pp. 510-511).
57. See Theil and Schweitzer (1961) for an example.
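The multi-step procedure just outlined (pooled OLS, per-unit autocorrelation estimates, an autoregressive transformation, and weighting by the estimated error variances) can be sketched in code. This is an illustrative implementation under assumed conventions — observations stacked by cross-section and a common parameter vector β — not the authors' own program:

```python
import numpy as np

def pooled_fgls(y, X, n, T):
    """Feasible GLS for pooled data with cross-sectionally heteroscedastic,
    first-order autoregressive disturbances (sketch). Assumed layout:
    y and X are stacked by cross-section, unit j in rows j*T..(j+1)*T."""
    L = X.shape[1]
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: pooled OLS
    u = y - X @ b_ols
    ys, Xs = [], []
    for j in range(n):
        sl = slice(j * T, (j + 1) * T)
        uj = u[sl]
        # step 2: first-order autocorrelation estimate for unit j
        rho = (uj[1:] @ uj[:-1]) / (uj[:-1] @ uj[:-1])
        # step 3: autoregressive (Cochrane-Orcutt) transformation
        yt = y[sl][1:] - rho * y[sl][:-1]
        Xt = X[sl][1:] - rho * X[sl][:-1]
        # step 4: error variance of the transformed disturbances
        bt, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
        e = yt - Xt @ bt
        s2 = e @ e / (T - 1 - L)
        ys.append(yt / np.sqrt(s2))   # step 5: weight by sigma_j
        Xs.append(Xt / np.sqrt(s2))
    b, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    return b
```

The returned vector is the EGLS estimate; with homoscedastic, serially uncorrelated disturbances it reduces (approximately) to OLS.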
Next we relax the assumption that the disturbances are independent between the cross sections, i.e. the assumption that Cov(u_jt, u_it) = 0 for i ≠ j. We also allow the parameter vectors associated with the predictor variables to be heterogeneous:
[ y₁ ]   [ X₁  0   ⋯  0   ] [ β₁ ]   [ u₁ ]
[ y₂ ] = [ 0   X₂  ⋯  0   ] [ β₂ ] + [ u₂ ]   or   (16.110a)
[ ⋮  ]   [ ⋮            ⋮  ] [ ⋮  ]   [ ⋮  ]
[ y_n ]  [ 0   0   ⋯  X_n ] [ β_n ]  [ u_n ]

y = Zβ + u   (16.110b)

where

y = an nT × 1 vector,
Z = an nT × Ln matrix,
β = an Ln × 1 vector,
u = an nT × 1 vector.
We assume

E(uu′) = Ω = [ σ₁²I    σ₁₂I   ⋯  σ₁ₙI
               σ₂₁I    σ₂²I   ⋯  σ₂ₙI
               ⋮                   ⋮
               σₙ₁I    σₙ₂I   ⋯  σₙ²I ]   (16.114)
and
(16.115)
58. Because the n sets of equations in (16.110a) do not seem to be related, one refers to this structure as
"seemingly unrelated regressions". See Zellner (1962).
ESTIMATION AND TESTING 375
Zellner (1962) proposed the following estimation procedure.59 First estimate (16.110b) by OLS. Then estimate the elements of Ω from the OLS residuals:

Ω̂ = [ σ̂₁²P̂₁₁   ⋯  σ̂₁ₙP̂₁ₙ
       ⋮              ⋮
       σ̂ₙ₁P̂ₙ₁   ⋯  σ̂ₙ²P̂ₙₙ ]
where
59. See Zellner (1962), Kmenta (1971, pp. 517-519). See also Leeflang (1974, pp. 124-127) and Leeflang
(1977b).
60. See Clarke (1973). For an asymmetric, nonhierarchical market share model, Carpenter, Cooper, Hanssens and Midgley (1988) use the ρ̂_ij values to identify potential cross-effects.
P̂_ij = [ 1            ρ̂_j          ⋯  ρ̂_j^{T−1}
          ρ̂_i          1            ⋯  ρ̂_j^{T−2}
          ⋮                              ⋮
          ρ̂_i^{T−1}    ρ̂_i^{T−2}   ⋯  1 ],   i, j = 1, …, n.
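Ignoring the autocorrelation refinement (i.e., setting the P̂_ij blocks to identity matrices), Zellner's two-step procedure can be sketched as follows. The function name and data layout are illustrative assumptions:

```python
import numpy as np

def sur_fgls(ys, Xs):
    """Two-step SUR estimator (sketch): OLS equation by equation,
    estimate the contemporaneous covariance matrix from the OLS
    residuals, then apply GLS to the stacked system (16.110b).
    ys: list of n (T,) vectors; Xs: list of n (T, L_j) matrices."""
    n, T = len(ys), len(ys[0])
    # Step 1: OLS per equation -> residual matrix U (T x n)
    U = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for y, X in zip(ys, Xs)])
    Sigma = U.T @ U / T          # estimated contemporaneous covariances
    # Step 2: GLS on the stacked system; Omega = Sigma kron I_T
    Z = np.zeros((n * T, sum(X.shape[1] for X in Xs)))
    col = 0
    for j, X in enumerate(Xs):
        Z[j * T:(j + 1) * T, col:col + X.shape[1]] = X
        col += X.shape[1]
    y = np.concatenate(ys)
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    A = Z.T @ Omega_inv
    return np.linalg.solve(A @ Z, A @ y)
```

The efficiency gain over equation-by-equation OLS grows with the strength of the cross-equation correlation of the disturbances and with differences between the X_j matrices.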
In sum-constrained models Σ_{j=1}^{n} û_jt = 0 for each t = 1, 2, …, T. In that case there is by definition contemporaneous correlation of the disturbances in these models. As a consequence, however, the contemporaneous variance-covariance matrix (16.114) is singular, because the elements of each row total zero:

Σ_{i=1}^{n} σ̂_ji = Σ_{i=1}^{n} [ Σ_{t=1}^{T} û_jt û_it / (T − L) ] = Σ_{t=1}^{T} û_jt [ Σ_{i=1}^{n} û_it ] / (T − L) = 0.   (16.118)
To avoid singularity, one equation is deleted. If the matrix n is known, the resulting
parameter estimates are invariant to which equation (which cross-sectional unit) is
deleted. 61 When the variance-covariance matrix is unknown, as usually is the case, the
parameter estimates depend on which equation is deleted. 62 However, this problem
can be resolved by imposing restrictions on the contemporaneous variance-covariance
matrix. 63 We note that robustness may generally be lost if GLS estimation methods
are used; see Leeflang, Reuyl (1979).
Relations can be estimated on the basis of:
• a single equation;
• multiple equations (such as in the case of multiple brands);
• simultaneous equations.
So far we have focused on single- and multiple-equation models. Simultaneous equa-
tions represent a special case of multiple equations. We use a system of simultaneous
equations if more than one equation is needed to properly estimate relations. In such
a system there are multiple endogenous variables which are the variables to be ex-
plained by the equations. The remaining variables are predetermined, consisting of
exogenous and (potentially) lagged endogenous variables. Exogenous variables are
taken as given, and are similar to the predictor variables used in a single-equation
model.
The concept of simultaneity refers to the idea that the endogenous variables are
"explained" jointly and simultaneously by the predetermined variables and the dis-
turbances. Importantly, an endogenous variable may be used both to explain other
61. See McGuire, Farley, Lucas, Ring (1968). See also Hanssens, Parsons, Schultz (1990, p. 89).
62. See Reuyl (1982), Leeflang, Reuyl (1983, 1984b) and Gaver, Horsky, Narasimhan (1988).
63. See McGuire eta!. (1968), De Boer, Harkema (1983), Leeflang, Reuyl (1983), De Boer, Harkema, Soede
(1996). See also Bultez, Naert (1973, 1975) and Nakanishi, Cooper (1974).
The need for special estimation methods for simultaneous equations derives especially from the violation of assumption 5 for the disturbances (Section 16.1.3), which states that there is independence between the predictors (x_ℓjt) and the disturbances (u_jt). Thus, in essence, the presence of an endogenous variable which has both an explanatory role and which is to be explained by the system is the reason for the violation of assumption 5.
We emphasize that there are other reasons why assumption 5 may be violated. For
example, if a predictor variable is measured with error, we can analytically express
the measured variable as a function of its true values and an error component. Having
access only to the measured values for that variable for empirical analysis means that
the measurement error gets included in the disturbance term in the equation. It is easy
to demonstrate that under this condition the predictor variable is not independent
of the disturbances. Thus, measurement error in a predictor variable also violates
assumption 5.
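A small simulation (illustrative, not from the text) makes this concrete: when a predictor is measured with error, the error becomes part of the disturbance, the observed regressor is correlated with that disturbance, and OLS is biased toward zero (attenuation).

```python
import numpy as np

# Measurement error in a predictor violates assumption 5:
# the observed regressor x_obs is correlated with the disturbance.
rng = np.random.default_rng(1)
N, beta = 100_000, 2.0
x_true = rng.normal(size=N)
y = beta * x_true + rng.normal(size=N)            # true relation
x_obs = x_true + rng.normal(scale=1.0, size=N)    # measured with error
b_true = (x_true @ y) / (x_true @ x_true)         # OLS on the true predictor
b_obs = (x_obs @ y) / (x_obs @ x_obs)             # OLS on the mismeasured one
# With equal signal and error variances, b_obs converges to beta/2.
```

The attenuation factor is Var(x)/(Var(x) + Var(measurement error)), so the bias grows with the noisiness of the measurement.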
Suppose, for example, that a manager sets advertising expenditures at 10 percent of revenues:

a_t = 0.1 R_t + u_t   (16.119)

where

a_t = brand advertising expenditures in $000,
R_t = brand revenues in $000, and
u_t = an error term.
Simultaneously, the manager believes that advertising affects unit sales in the same period (we ignore for convenience the possibility of lagged effects). For simplicity of exposition, we assume a linear demand function:

q_t = β₀ + β₁ p_t + β₂ a_t + v_t   (16.120)
[Figure: unit sales q_t plotted against advertising a_t, showing the advertising budget line a_t = 0.1 p q_t and the demand line with intercept (β₀ + β₁ p).]
where
q1 = unit sales (in thousands of pounds),
p1 = price per unit in dollars, and
v1 = an error term.
We have two equations, and since R1 = q1 x p, we also have two endogenous vari-
ables. One fundamental question is whether a1 is stochastically independent of v1 in
(16.120). By substituting (16.120) into (16.119) we obtain:
[Figure 16.5: Relations between demand and advertising for different prices p₁ and p₂, where p₂ > p₁ and β₁ < 0. Unit sales q_t are plotted against advertising a_t; the demand lines have intercepts (β₀ + β₁p₁) and (β₀ + β₁p₂).]
demand function, the estimated relation would be based on the intersections between the budget and demand equations. From Figure 16.5 we deduce that the estimated function would have a higher slope than the demand function based on advertising. We note that in practice this set of equations will not exist. The advertising budget may well depend on revenue, but not in the restrictive sense posited here. And the demand function should accommodate nonlinear and interactive effects for price and advertising.
recursive equations. For example, we could imagine a brand manager with a de-
mand equation such as (16.120) but an advertising budget that does not depend on
current brand revenues. The adequacy of single-equation estimation procedures is
then determined by the lack of correlation between the disturbances across these two
equations.
When Γ is neither diagonal nor triangular (as is the case in the example above), we need to use special estimation methods. The simplest and perhaps most popular one is the two-stage least squares (2SLS) method. This method is a special case of the method of instrumental variables (IV) introduced in Section 16.1.4. IV was developed around the following "estimation strategy" (Greene, 1997, p. 288). Suppose that in the classical model (16.7) the variables x_ℓjt, ℓ = 1, …, L, are correlated with u_jt. And imagine that there exists a set of variables z_ℓjt, ℓ = 1, …, L′, where L′ ≥ L,64 such that the z_ℓjt are correlated with the x_ℓjt but not with u_jt. Then we can construct a consistent estimator β̂_j = (β̂_1j, …, β̂_Lj)′ based on the assumed relationships among x_ℓjt, z_ℓjt and u_jt. In this manner the set of original regressors is replaced by a new set of regressors, known as instrumental variables, which are correlated with the stochastic regressors but uncorrelated with the disturbances.
A simple example of an instrumental variable is the variable q̂_t which replaces q_t in (16.120). To show how q̂_t (and â_t) can be obtained, we first change (16.119) as follows:

a_t = α₀ + α₁ q_t + u_t.   (16.123)
These two equations show how, given the original equations (16.120) and (16.123), the endogenous variables can be expressed in terms of exogenous variables only. The resulting equations, (16.124) and (16.125), are reduced-form equations.
The IV method would use â_t and q̂_t, obtained from the simple linear regressions a_t = f(p_t) and q_t = g(p_t), and use these variables in (16.120) and (16.123) as follows:

â_t = a₁ + a₂ p_t   (16.126)

q̂_t = b₁ + b₂ p_t   (16.127)

q_t = β₀ + β₁ p_t + β₂ â_t + v_t   (16.128)

a_t = α₀ + α₁ q̂_t + u_t.   (16.129)
It is easy to see that it is impossible to obtain five unique parameter estimates (α₀, α₁, β₀, β₁, β₂) from four coefficients (a₁, a₂, b₁, b₂). Interestingly, we can obtain α̂₁ = a₂/b₂. However, our primary interest is in obtaining good estimates of the demand equation, and the underidentification problem is located there. This problem is evident from the order condition for identifiability:
For (16.120) there is one endogenous variable on the right-hand side, but the only exogenous variable in the system is included. Thus, given that no exogenous variable is excluded from that equation, the order condition is not satisfied, and (16.120) is not identifiable. On the other hand, for (16.123) the order condition is satisfied: there is one (right-hand side) endogenous variable (q_t), and one exogenous variable is excluded from that equation (p_t).
The order condition is necessary but not sufficient for identifiability. Sufficiency requires that the rank condition be satisfied. If there is exactly one solution for the parameters in (16.120) and (16.123) then the rank condition is satisfied.65 From these conditions it should be clear that (16.123) needs an exogenous variable, and one different from p_t. Thus, to make (16.120) identifiable, we need to modify (16.123). Of course, this modification needs to be relevant to the problem. For example, if advertising is indeed only a function of demand in the same period, then there is no opportunity to obtain unique parameter estimates with desirable statistical properties for (16.120). On the other hand, if a_t partly depends on a_{t−1}, we can expand (16.123), and make (16.120) identifiable.66 For examples of simultaneous-equation marketing models, estimated by 2SLS, see Farley, Leavitt (1968), Bass (1969a), Bass and Parsons (1969), Bass (1971), Cowling, Cubbin (1971), Parsons and Bass (1971), Lambin, Naert and Bultez (1975), Lambin (1976), Albach (1979), Plat and Leeflang (1988).
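A minimal two-stage least squares routine can be sketched as follows, with an artificial example in the spirit of the advertising-demand discussion (the instrument, numbers and names are hypothetical):

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS sketch: regress the columns of X on the instruments Z
    (first stage), then regress y on the fitted values (second stage)."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Artificial system: x is endogenous (correlated with the disturbance u),
# z is a valid instrument (correlated with x, uncorrelated with u).
rng = np.random.default_rng(4)
N = 20_000
z = rng.normal(size=N)
u = rng.normal(size=N)
x = 0.8 * z + 0.5 * u + rng.normal(size=N)   # endogenous regressor
y = 2.0 * x + u                              # true coefficient is 2
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # biased upward here
b_2sls = two_sls(y, X, Z)                     # consistent
```

Comparing `b_ols` and `b_2sls` shows the simultaneity bias in OLS and its removal by instrumenting.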
We note that when model building in marketing was based on aggregated data,
the simultaneity between sales and advertising was much more serious than it is
today. For example, the advertising budget may be determined on an annual basis:
as in (16.119) advertising expenditures may be restricted to 10 percent of revenues.
However, if weekly data are available for the estimation of a demand equation, this
restriction - the advertising budget equation - may have only a minor impact on the
properties of the least-squares estimator for the demand equation. With the increasing
availability of disaggregated data the advertising budget equation is virtually irrele-
vant. For example, scanner data allow model builders to estimate demand functions
for individual items (such as a specific package size of a particular variety for a brand)
from weekly data for individual stores. The advertising budget decision, instead, is
often made at the brand level on an annual basis for a region covering many stores.
One of the assumptions of the 2SLS-method is that the disturbances of the dif-
ferent equations of the system are independent. We know from the discussion about
seemingly unrelated regressions in Section 16.3 that parameter estimates of the sys-
tem are inefficient if we do not account for contemporaneous correlations of the
disturbances. An estimator that makes use of the cross-equation correlations of the
disturbances is more efficient. The techniques that are generally used for joint estima-
tion of the simultaneous system of relations include: three-stage least squares (3SLS),
maximum likelihood (Section 16.6) and GMM (Generalized Method of Moments).
Thus the 3SLS method includes the application of GLS to the system of structural
relations.
Briefly, 3SLS works as follows. The estimators of the elements of the variance-covariance matrix of the disturbances Ω are obtained by first applying 2SLS on the set of simultaneous equations. The 2SLS residuals are then used to compute Ω̂, which allows 3SLS estimates to be obtained with Ω̂ and the "new regressors" z_{1jt}, …, z_{L′jt}. It is also possible to iterate the 3SLS estimation method: I3SLS. For applications, see Schultz (1971), Houston (1977), Lancaster (1984), Carpenter (1987) and Tellis and Fornell (1988).67 Carpenter studied competitive marketing strategies consisting of
product quality levels, promotional expenditures and prices. Using a simultaneous
equation model he examined the interrelations between these marketing instruments.
For example he let prices and promotional spending be signals of product quality and
allowed promotional spending to influence prices. A model of interrelations between
marketing instruments requires the use of simultaneous equation estimation methods.
Carpenter used 3SLS (and 2SLS) on a cross section of business-level PIMS data. 68
Tellis and Fornell (1988) used PIMS data to estimate the relation between advertising
and product quality over the product life cycle.
I3SLS was employed by Lancaster (1984) to explore the relation between brand
advertising and industry, brand and rival retail sales and market share. One might
wonder whether the use of simultaneous equation methods truly generates estimates
that differ much from, say, OLS. Although there are examples of OLS estimates that
are almost identical to the 2SLS/3SLS ones,69 the differences between estimates can
be very large (Greene, 1997, p. 294, p. 760).
To conclude, we return to the IV method. We mentioned that the 2SLS- and 3SLS
estimation methods are special cases of IV. To show this, we return to the system of
relations (16.70):
Y = Xf3 + u.
The IV method is used when the assumption E(X'u) = 0 is violated. The matrix X
may be substituted by a matrix Z such that E(Z'u) = 0. Thus, every column of the
new matrix Z is uncorrelated with u, and every linear combination of the columns of
Z is uncorrelated with u. If Z has the same number of predictor variables as X, the
IV estimator is:70

β̂ = (Z′X)⁻¹ Z′Y.   (16.130)
Several options to choose an acceptable matrix Z can be found in the literature.71 One approach is to choose L linear combinations of the columns of Z. Of all the different linear combinations of Z that we might choose, X̂ is the most efficient:

X̂ = Z(Z′Z)⁻¹Z′X   (16.131)

which is a projection of the columns of X in the column space of Z. In this case Z can have (many) more variables than X. With this use of instrumental variables, X̂ derived from Z, we have

β̂ = (X̂′X)⁻¹ X̂′Y   (16.132)
  = [X′Z(Z′Z)⁻¹Z′X]⁻¹ X′Z(Z′Z)⁻¹Z′Y.
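The equality of the projection form and the expanded expression in (16.132) is easy to verify numerically; the data below are purely illustrative:

```python
import numpy as np

# Numerical check of (16.131)-(16.132): regressing y on the projection
# X_hat = Z(Z'Z)^{-1}Z'X equals the sandwich formula, with more
# instruments (3) than regressors (1).
rng = np.random.default_rng(2)
N = 500
Z = rng.normal(size=(N, 3))                       # three instruments
X = Z @ np.array([[1.0], [0.5], [-0.3]]) + rng.normal(size=(N, 1))
y = X @ np.array([2.0]) + rng.normal(size=N)
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # X_hat = Z(Z'Z)^-1 Z'X
b_proj = np.linalg.lstsq(X_hat, y, rcond=None)[0]
ZZ_inv = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ ZZ_inv
b_sand = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)
```

Both computations give the same vector because the projection matrix Z(Z′Z)⁻¹Z′ is idempotent, so X̂′X̂ = X̂′X.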
It can be shown72 that this expression can also be written as:
β₀* = ln β₀.
One potential problem with this transformation is that a_t may be zero for some values of t. To avoid this, we could replace a_t by (1 + a_t):

ln q_t = β₀* + β₁ ln(1 + a_t) + ε_t.   (16.136)

Adding a value of one is an arbitrary choice. The question is whether the results are sensitive to which value is added. To address this issue, Naert and Weverbergh (1977) added a parameter γ to a_t, obtaining:

ln q_t = β₀* + β₁ ln(γ + a_t) + ε_t.   (16.137)
Now (16.137) is intrinsically nonlinear in γ (but for a given value of γ it again becomes linear). One possibility is to apply a trial-and-error or grid search on γ. For simplicity assume that for any value of γ, (16.137) is estimated by OLS, under the usual assumptions about the disturbance term. Then choose m values for γ, covering a plausibly wide range, and choose the value of γ for which the model's R² value is maximized. We illustrate the idea by showing a hypothetical plot of R² values associated with 8 different γ's (i.e. m = 8). In Figure 16.6 this plot shows that the maximum R² value74 lies between the values γ₄ and γ₆, perhaps close to γ₅. For

74. The value of γ maximizing R² is a maximum likelihood estimate. See Goldfeld and Quandt (1972, pp. 57-58).
[Figure 16.6: Hypothetical plot of R² values for eight values of γ.]
greater accuracy, the grid search can be continued by choosing another set of values for γ between γ₄ and γ₆, until an acceptable level of accuracy is achieved.
Naert and Weverbergh (1977) faced such a problem with 24 observations. Advertising expenditures were measured in thousands of dollars, and two of the observations had zero entries. Based on a grid search, they obtained an optimal value of γ = 114.75 for which R² = 0.771 and β̂₁ = 0.60. Interestingly, a very different result is obtained for γ = 1, at which value R² = 0.49 and β̂₁ = 0.12. We show R² values for other values of γ in Table 16.5. This table shows that the model fit is extremely sensitive to the value of γ, especially for values below γ = 50. Of course, we cannot claim that the optimal value of γ, chosen through this grid search, provides the best estimate of the advertising elasticity. If we used a different specification, for example with additional predictor variables, the optimal value of γ may change, as may the sensitivity of R² to γ. We provide another example of a grid search in Section 17.4 where we discuss a varying parameter model.
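The grid search over γ can be sketched in a few lines. The data below are synthetic; the Naert and Weverbergh numbers are not reproduced:

```python
import numpy as np

def gamma_grid_search(q, a, gammas):
    """Grid search for gamma in ln q_t = beta0* + beta1 ln(gamma + a_t):
    for each candidate gamma, fit by OLS and keep the R^2 (a sketch)."""
    y = np.log(q)
    tss = (y - y.mean()) @ (y - y.mean())
    best = None
    for g in gammas:
        X = np.column_stack([np.ones_like(a), np.log(g + a)])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        r2 = 1.0 - (e @ e) / tss
        if best is None or r2 > best[1]:
            best = (g, r2, b)
    return best   # (gamma, R^2, coefficients) at the maximum
```

The search can then be refined by re-gridding around the winning value, exactly as described for Figure 16.6.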
The Box-Cox transformation, introduced in Section 16.1.4, is often used as a method of generalizing the linear model. This transformation is also used to linearize nonlinear relations by performing a grid search on transformations of the predictor variables (Greene, 1997, p. 480).
Grid search procedures are costly and inefficient, especially if a model is nonlinear in several of its parameters. Nonetheless relatively efficient methods have been devised. Fletcher (1980) and Quandt (1983) provide further details. An alternative procedure is linearized regression,75 which consists of a sequence of linear approximations. Let:

y_j = f(X_j, β_j) + u_j   (16.138)
Table 16.5 R² values for selected values of γ

γ        R²
0.005    0.24
0.01     0.26
0.05     0.32
0.10     0.35
0.50     0.44
1        0.49
5        0.62
10       0.68
50       0.76
100      0.77
150      0.77
200      0.77
where y_j, X_j, β_j and u_j are defined in (16.9) and f is a nonlinear function of X_j and β_j. First, initial values are given to the estimates β̂_j, i.e. β̂_j0.76 Next f(X_j, β_j) is approximated by a first-order Taylor expansion about β̂_j0:

f(X_j, β_j) ≈ f(X_j, β̂_j0) + Σ_{ℓ=1}^{L} (∂f/∂β_ℓj)|_{β_ℓj = β̂_ℓj0} (β_ℓj − β̂_ℓj0).   (16.139)

Setting (∂f/∂β_ℓj)|_{β_ℓj = β̂_ℓj0} equal to f_ℓ⁰, (16.138) can be written as:77

y_j ≈ f(X_j, β̂_j0) + Σ_{ℓ=1}^{L} f_ℓ⁰ (β_ℓj − β̂_ℓj0) + u_j   (16.140)

or

y_j − f(X_j, β̂_j0) + Σ_{ℓ=1}^{L} f_ℓ⁰ β̂_ℓj0 = Σ_{ℓ=1}^{L} f_ℓ⁰ β_ℓj + u_j.   (16.141)

Letting

y_j⁰ = y_j − f(X_j, β̂_j0) + Σ_{ℓ=1}^{L} f_ℓ⁰ β̂_ℓj0   (16.142)

we obtain:

y_j⁰ = Σ_{ℓ=1}^{L} f_ℓ⁰ β_ℓj + u_j   (16.143)

an expression linear in β_ℓj and thus amenable to estimation by OLS or other suitable estimation methods, giving us a new vector of estimates β̂_j = β̂_j1. The procedure can now be repeated by considering a Taylor expansion about β̂_j1. The process continues until two subsequent vectors of estimates β̂_j,m and β̂_j,m+1 are equal, that is, until the process converges.
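This sequence of linearizations is the Gauss-Newton procedure, which can be sketched as follows; the model f and its derivatives are supplied by the user:

```python
import numpy as np

def gauss_newton(y, x, f, jac, beta0, tol=1e-8, max_iter=100):
    """Linearized regression: repeatedly linearize f around the current
    estimate via a first-order Taylor expansion and solve the resulting
    linear least-squares problem, as in (16.140)-(16.143)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        F = jac(x, beta)                   # matrix of f_l^0 derivatives
        y0 = y - f(x, beta) + F @ beta     # transformed criterion (16.142)
        beta_new = np.linalg.lstsq(F, y0, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:   # estimates stable
            return beta_new
        beta = beta_new
    return beta
```

As the text notes, such iterations may behave poorly far from the solution, so reasonable starting values matter.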
Other nonlinear estimation procedures derive from the formulation of a least squares objective function as a nonlinear programming problem. The objective is to minimize the sum of the squared residuals, where β̂_j and û_j have to satisfy (16.138) for all elements of β̂_j and û_j:

minimize û_j′ û_j   (16.144)
subject to: y_j − f(X_j, β̂_j) − û_j = 0.
For further reading and additional references see Fletcher, Powell (1963), Wilde,
Beightler (1967), Amemiya (1985, Chapter 8), Judge et al. (1985, Appendix B, pp.
951-979), Greene (1997, Chapter 10, pp. 450-493).
Applications of nonlinear estimation methods appear in Naert, Bultez (1975), Gopalakrishna, Chatterjee (1992) and Parker (1992). We briefly discuss the communications response model of Gopalakrishna, Chatterjee (1992) who propose the following structure:
It is clear from (16.145) that λ is the (average) part of an account's share the firm under study carries over from the previous period. The other systematic part is attributable to the current period's communication strength relative to that of competitors. In the absence of information on competitors' communication efforts (see also Chapter 15), δ is used as a constant (over time and across accounts) index. The model formulation forces the relationship between the criterion variable and the relative amount of communication to be concave. That is, holding competitors' efforts constant, which is implicit in the constancy of δ, increases in c_it produce diminishing returns. We note that δ cannot be estimated independently of the parameters in (16.146). Therefore, δ is arbitrarily set equal to one.
The objective of the model is to help an industrial marketing manager assess the
joint impact of advertising and personal selling effort on the share of account i 's po-
tential, i.e. the share of a given customer's business at the level of individual accounts.
Although the parameters in (16.145) and (16.146) do not show this, Gopalakrishna and Chatterjee did allow for heterogeneity in the parameters λ, α₁, α₂, and α₃.
Data were available for a subset of the accounts at a firm selling electric cables, and covered 138 accounts for seven quarters. Of the accounts, 15 were classified as large (potential ≥ $1 million), 68 as small (≤ $100,000), and 55 as medium. However, no statistical evidence was obtained to support parameter heterogeneity between these three segments. Due to the intrinsic nonlinearity of effects in the proposed structure, a nonlinear least squares estimation procedure (SYSNLIN; see the SAS/ETS User's Guide, version 5, 1984) was used for parameter estimation. The lagged variable in
(16.145) had an estimated effect of about 0.18 which suggests that the (average)
carryover is modest. Both main effects and the interaction effect for advertising and
personal selling were positive. Thus, there is a positive interaction between adverti-
sing and personal selling expenditures on the share of an account's business potential.
However, due to the concavity in the effect of c_it on acs_it, the response curves for personal selling and advertising actually converge as advertising increases (see Figure 1 on p. 193 in Gopalakrishna and Chatterjee, 1992).
The authors also determine optimal communication plans for the firm. They do
this separately for each segment (with segment-specific parameters, in the absence of
statistical evidence to support this heterogeneity). Their optimization results suggest
that profit can be increased by some 50 percent, if advertising is increased moderately
(about 20 percent) and personal selling is increased dramatically for large accounts,
strongly for medium accounts, and decreased to about zero for small accounts. In
the single-period optimization, advertising expenditures would increase from about
$140,000 to about $180,000. Sales force expenditures in total would increase from
about $80,000 to about $200,000.
In this case, the optimal relative expenditures are about the same for advertising and sales force, which is consistent with the parameter estimates, α̂₁ = 0.009 and α̂₂ = 1.28. When advertising is adjusted on a per-account basis, i.e. 0.009 × 138 accounts ≈ 1.28. We note that the optimal results are obtained under the unrealistic assumption that competitors will not modify their communication expenditures.
There are different ways to obtain estimators. In the preceding sections we concentrated on the least squares method. In this section we discuss other methods to obtain estimators, viz. maximum likelihood (ML) methods. After an introduction of ML estimation methods, we discuss the large sample properties of the estimators. We then give some examples and applications. We end by summarizing some well-known statistical tests based on likelihoods.
In the social sciences, data are used to test hypotheses and to aid in theory con-
struction. The principle of maximum likelihood, due to Fisher (1950), provides a
statistical framework for assessing the information available in the data.78 The principle of maximum likelihood is based on distributional assumptions about the data. Suppose that we have N random variables with observations y_i, (i = 1, …, N), such as purchase frequencies for a sample of N subjects. A probability density function for y_i is formulated, denoted here as f(y_i | θ), where θ is a parameter characterizing the distribution (we assume θ to be a scalar for convenience). This formulation holds for both discrete (taking a countable number of values, possibly infinite) and continuous random variables (taking on an infinite number of values on the real line). Discrete random variables are for example 0/1 choices, or purchase frequencies, while market shares and some ratings on scales can be considered continuous random variables. Important characteristics of these random variables are their expectations and variances. In the purchase frequency example usually a (discrete) Poisson distribution is assumed (see Section 12.1):
f(y_i | λ) = e^{−λ} λ^{y_i} / y_i!   (16.147)

The expectation of the Poisson variable in (16.147) can be shown to be E(y_i) = λ, and its variance is Var(y_i) = λ.
One of the well-known continuous distributions is the exponential distribution:

f(y_i | λ) = λ e^{−λ y_i},  y_i ≥ 0.   (16.148)

The mean and variance of the exponential random variable in (16.148) are E[y_i] = 1/λ and Var(y_i) = 1/λ². This distribution is frequently used for interpurchase times.79
The functions in (16.147) and (16.148) are known as the probability function (pf) and probability density function (pdf), respectively. Both the normal and the Poisson
distributions are members of the exponential family, which is a general family of
distributions that encompasses both discrete and continuous distributions. The expo-
nential family is a very useful class of distributions. The common properties of the
distributions in this class facilitate the simultaneous study of these distributions.
The expression for the likelihood is considerably simplified if its natural logarithm is taken, in which case the product in (16.149) is replaced by a sum:

l(θ) = Σ_{i=1}^{N} ln f(y_i | θ).   (16.150)
This expression is especially simple for members of the exponential family (Chapter 12). Since the natural logarithm is a monotone function, maximizing the log-likelihood (16.150) yields the same estimates as maximizing the likelihood (16.149).
In the Poisson example, the log-likelihood takes a simple form, the Poisson distribution being in the exponential family, and is:

l(λ) = Σ_{i=1}^{N} (−λ + y_i ln(λ) − ln(y_i!)).   (16.151)
The ML estimator of λ is obtained by setting the derivative of the log-likelihood equal to zero:

∂l(λ)/∂λ = ∂[Σ_{i=1}^{N} (−λ + y_i ln(λ) − ln(y_i!))]/∂λ = −N + Σ_{i=1}^{N} y_i/λ = 0.   (16.152)
Solving (16.152) provides the maximum likelihood estimator (MLE):

λ̂ = Σ_{i=1}^{N} y_i / N   (16.153)

which equals the sample mean.
Similarly, in the example of the exponential distribution, the log-likelihood is:

l(λ) = Σ_{i=1}^{N} (ln(λ) − λ y_i).   (16.154)
Table 16.6 (third column: household size x_i; last column: generated y_i)

 1   25    –     7
 2   22    1     7
 3   31    3    18
 4   21    2    12
 5   24    6     –
 6   26    1     –
 7   20    3    31
 8   27    2    15
 9   24    5    57
10   14    4    29
Taking the derivative of (16.154) with respect to λ yields the estimator:

λ̂ = N / Σ_{i=1}^{N} y_i   (16.155)

the inverse of the sample mean.
Gupta (1991) estimated the exponential model on interpurchase times from scanner data on regular coffee. In a sample of 100 households he obtained λ̂ = 0.376 (note that the situation is slightly more complicated because there are multiple interpurchase times for each subject, and N = 1526). A graph of the Poisson and exponential log-likelihoods against λ yields a concave function, with a unique maximum, indicating that the MLE is unique. This property holds for all members of the exponential family.
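The closed-form MLEs (16.153) and (16.155) are easy to check by simulation. The data below are synthetic; the rate 0.376 merely echoes the coffee example:

```python
import numpy as np

# Synthetic check: for Poisson data the MLE of lambda is the sample
# mean (16.153); for exponential data it is the inverse of the
# sample mean (16.155).
rng = np.random.default_rng(3)
y_pois = rng.poisson(lam=4.0, size=10_000)
lam_pois = y_pois.mean()                        # MLE (16.153)
y_exp = rng.exponential(scale=1 / 0.376, size=10_000)
lam_exp = 1 / y_exp.mean()                      # MLE (16.155)
```

With 10,000 draws both estimates land very close to the rates used to generate the data.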
16.6.2 EXAMPLE
For more complex models, the likelihood equations do not yield closed-form expressions. Assume we have an explanatory variable, household size, x_i, with values shown in the third column in Table 16.6. Again, we generate the dependent variable, the number of sales, through random draws from a Poisson distribution, with the mean parameterized as λ_i = exp(1.5 + 0.5 x_i), thus a Poisson regression model (see Section 12.1.5) with μ = 1.5 and β = 0.5. Here we have an example of two parameters instead of a scalar. The data generated in this way are given in the last column of
[Table 16.7: iteration r, μ, β, gradient g, and log-likelihood.]
Table 16.6. In this case, there are no closed-form solutions for the two parameters. We therefore apply a numerical search procedure: Newton's algorithm, which maximizes the log-likelihood numerically.80 Suppose we collect the parameters in the vector θ = (μ, β)′. Then, given a set of starting values, in iteration r the parameter vector is found by:

θ̂_r = θ̂_{r−1} − H(θ̂_{r−1})⁻¹ S(θ̂_{r−1})

where S(θ) is the vector of first derivatives of the log-likelihood (the score) and H(θ) the matrix of second derivatives. Both H(θ̂_{r−1}) and S(θ̂_{r−1}) are evaluated at the previous estimate θ̂_{r−1}. Table 16.7 shows the iteration process; as a starting guess of the parameters we take the zero value. The algorithm is said to converge if the first derivative of the log-likelihood (16.152) changes less than 10⁻⁵; here the algorithm converged in r = 6 iterations. Table 16.7 shows that the ML estimates are close to the true parameter values.
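Newton's algorithm for this Poisson regression can be sketched as follows. The score S(θ) = X′(y − λ) and the information X′ diag(λ) X follow from differentiating the Poisson log-likelihood; the starting value used here (log of the sample mean rather than zero) is a choice made for numerical robustness in this sketch:

```python
import numpy as np

def poisson_newton(y, x, theta0=None, tol=1e-5, max_iter=50):
    """Newton's algorithm for Poisson regression with
    lambda_i = exp(mu + beta * x_i). Returns the estimates and the
    asymptotic standard errors from the inverted information matrix."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    theta = (np.array([np.log(y.mean()), 0.0]) if theta0 is None
             else np.array(theta0, dtype=float))
    for _ in range(max_iter):
        lam = np.exp(X @ theta)
        S = X.T @ (y - lam)            # score: first derivatives
        H = X.T @ (lam[:, None] * X)   # information (minus the Hessian)
        theta = theta + np.linalg.solve(H, S)
        if np.max(np.abs(S)) < tol:    # converged: score near zero
            break
    ase = np.sqrt(np.diag(np.linalg.inv(H)))
    return theta, ase
```

Inverting the information matrix at the final iteration gives the asymptotic standard errors discussed in the next section.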
The maximum likelihood approach derives much of its attractiveness from its large
sample properties. Small sample properties of the MLE are usually not known, except
when the criterion variable y_i, given the predictor variables' values, is normally distributed. These asymptotic properties are obtained if the sample size tends to infinity: N → ∞. Under fairly general conditions, maximum likelihood estimators (MLE):
1. are consistent;
80. See Scales (1985), Eliason (1993)
This property states that the MLE tends to the true value in probability for large
samples.
Second, the ML approach yields asymptotically efficient estimators. An estimator
is efficient if it has the lowest possible variance among all estimators in a particular
class, and thus has the highest precision. The (asymptotic) variance of the estimator is
defined as follows. First define the (matrix of) second derivatives of the log-likelihood:

H(θ) = −∂²l(θ)/∂θ ∂θ′.   (16.158)
This is the Hessian, and is a measure of the amount of information supplied by the
data on the parameter(s). Taking the expectation of the Hessian yields the expected
information, the average information over all possible samples: I(θ) = E(H(θ)). This information measure is known as the (Fisher) information matrix. For the linear regression model with normally distributed errors, the expected and observed information matrices coincide and H(θ) = I(θ) = X′X/σ². Inverting the expected information yields the asymptotic variance of the estimator: AVar = I⁻¹(θ). However, one may prefer to use the inverse of the observed information, H⁻¹(θ), since that is closer to the observations actually made.81 The latter is particularly advantageous in smaller samples. Asymptotically the observed and expected values are the same. They are identical for some parameterizations of distributions in the exponential family.
r
In the Poisson example above, the asymptotic variance (AVar) of the estimator can be
computed by inverting the expectation of (16.158) and using (16.152):

AVar(λ̂) = I^{-1}(λ) = λ/N.    (16.159)

In this case the observed information is identical to (16.159), since

H^{-1}(λ̂) = (λ̂^{-2} Σ_{i=1}^{N} y_i)^{-1} = (λ̂^{-2} N λ̂)^{-1} = λ̂/N.    (16.160)
In the synthetic data example for the Poisson distribution provided in the previous
section, we obtain the following Hessian matrix:

H = [ 17.90    47.50 ]
    [ 47.50   134.93 ]    (16.161)
Inverting the Hessian, and taking the square root of the diagonal elements of the
resulting matrix gives us the asymptotic standard errors (ASE) of the estimates. These
are shown in Table 16.8. However, the conditions for the asymptotic approximations
are unlikely to be valid in this example, since it is based on only 10 observations. It
serves as an illustration of the computations.
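The step from the Hessian (16.161) to asymptotic standard errors is a 2×2 inversion followed by square roots of the diagonal elements; the arithmetic can be checked directly (the rounding below is ours):

```python
# Invert the 2x2 Hessian from (16.161) and take square roots of the diagonal
# of the inverse to obtain the asymptotic standard errors (ASE).
h00, h01, h11 = 17.90, 47.50, 134.93
det = h00 * h11 - h01 * h01                # determinant of H
ase_mu = (h11 / det) ** 0.5                # sqrt of first diagonal element of H^{-1}
ase_beta = (h00 / det) ** 0.5              # sqrt of second diagonal element
print(round(ase_mu, 3), round(ase_beta, 3))   # → 0.921 0.336
```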
Finally, the MLE is asymptotically normally distributed:

θ̂ ~ N(θ, AVar(θ̂)).    (16.162)
The above asymptotic properties of the likelihood hold under certain regularity condi-
tions (Lindsey, 1996, p. 187). Although a full discussion of these regularity conditions
is beyond the scope of this section, the following aspects may be useful in practice.
The log-likelihood function is said to be regular if in an open neighborhood of the
true parameter value, it can be approximated by a quadratic function. Such an ap-
proximation breaks down in situations where the true value lies on the boundary of
the parameter space so that the quadratic approximation is inappropriate, or when the
number of parameters to be estimated increases with the number of observations. The
latter situation occurs if in our Poisson example a parameter exists for each individual
i, i.e., λ_i for i = 1, …, N.
In this subsection we introduce some useful inferential tools for ML estimation. With
the asymptotic distribution of θ̂ defined in (16.162), we have asymptotically:

z = (θ̂ - θ) / √AVar(θ̂) → N(0, 1).    (16.163)
In Table 16.8 above, the t-values based on the asymptotic standard errors of the
estimates are provided for the synthetic data example. The t-values show that the
null hypotheses that the mean and regression parameters are zero are strongly rejected.
Again, given the small sample size the asymptotic approximations will not be accu-
rate in this example. However, since the data were actually generated with non-zero
parameter values, the results of the t-tests are consistent with our knowledge about
the structure of the data.
Another frequently used test is the Likelihood Ratio (LR) test. A more detailed dis-
cussion of this test is given in Section 18.4.2. The LR test is used to investigate two
models that are nested and chooses that model that has the highest likelihood of
occurrence, given the observed data. The null hypothesis is again H₀: θ = θ₀. Two
models are estimated that yield log-likelihood values of l(θ̂) and l(θ̂₀), respectively,
where we assume the latter model to be more restricted, for example because one or
more parameters are set to zero. Because the two models are nested, minus
twice the difference in their log-likelihood values is asymptotically distributed as χ²
under the null hypothesis:

LR = -2(l(θ̂₀) - l(θ̂)) ~ χ²(df)    (16.165)

where df is the difference in the number of parameters in θ and θ₀. The LR test
necessitates the estimation of the two models, and is thus computationally more intensive
than the Wald test. For the one-parameter Poisson distribution example provided
above, the LR test reduces to:

LR = -2N(λ̂ - λ₀) - 2 ln(λ₀/λ̂) Σ_{i=1}^{N} y_i.    (16.166)
If we re-estimate the model for the two-parameter Poisson regression synthetic data in
Table 16.6, but restricting β = 0, we obtain a log-likelihood of l(θ̂₀) = -44.68 (Newton's
algorithm converged in 6 iterations to a parameter value of μ̂ = 2.89). Thus in
this case the LR statistic for testing the models with and without (see Table 16.7) the
restriction equals LR = -2(-44.68 + 29.07) = 31.22, which is highly significant at
one degree of freedom (the difference in the number of parameter values for the two
models). This is of course expected since these synthetic data were generated on the
basis of a nonzero value of β. As a cautionary note we mention that the asymptotic
χ²-distributions for the Wald and LR tests are unlikely to hold given the small sample
size.
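The LR computation above takes only a few lines; 3.84 is the standard 5% critical value of the χ² distribution with one degree of freedom:

```python
# LR test for the restricted (beta = 0) versus unrestricted Poisson model,
# using the log-likelihood values reported in the text.
ll_restricted = -44.68                     # l(theta_0), model with beta = 0
ll_full = -29.07                           # l(theta), unrestricted model
lr = -2.0 * (ll_restricted - ll_full)
print(round(lr, 2))                        # → 31.22
print(lr > 3.84)                           # exceeds the 5% critical value → True
```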
If the models to be compared are not nested, the LR test does not apply. Information
criteria are then commonly used to identify the most appropriate model. These
criteria, such as Akaike's Information Criterion, AIC (Akaike, 1974), the Schwarz
Criterion (Schwarz, 1978), and the Consistent Akaike's Information Criterion, CAIC
(Bozdogan, 1987), are discussed in Section 18.4.3.
16.7.1 INTRODUCTION
The regression models discussed so far are parametric regression models. These
models impose specific functional forms through which the criterion variable is ex-
plained by the predictor variables. If the functional forms are correctly specified,
then the parameter estimates have desirable properties. A model builder is usually,
however, uncertain about the shape of the relations. In that case, it is useful to con-
sider more flexible approaches, such as nonparametric regression models and semi-
parametric regression models. Nonparametric regression models do not impose any
functional relation between the criterion variable and the predictor variables (i.e., they
lack parameters). Semiparametric regression models are partly parametric and partly
nonparametric.
To provide a perspective on these alternative models, we first summarize the ad-
vantages and disadvantages of parametric regression models. Next, we do the same
for nonparametric regression models, and we discuss two marketing applications.
Finally, we introduce semiparametric regression models, and we motivate and discuss
a marketing application of such a model as well.
We note that we discuss only non- and semiparametric regression models. This
means that the criterion variable has at least interval-scale measurement. Thus, we do
not present non- and semiparametric models with criterion variables that have binary
or nominal scales, such as is the case in brand choice models. 83
In the nonparametric regression approach, one relates the criterion variable to predictor
variables without reference to a specific functional form. We can represent nonparametric
regression models as:

y_t = m(x_t) + u_t,  t = 1, …, T    (16.167)
83. See, for example, the semiparametric brand choice model by Abe (1995) and the non- and semiparametric
brand choice models by Briesch, Chintagunta, and Matzkin ( 1997).
The function m(x_t) contains no parameters. It has the L-dimensional vector x_t of
predictors as its argument.
There are multiple nonparametric estimators for m(x_t) in (16.167). The three
major types are:84
1. kernel estimators;
2. k-nearest neighbor estimators; and
3. spline regression estimators.
Härdle (1990, p. 81) compares the three estimators in a simulation study and concludes
that the kernel estimator, the k-nearest neighbor estimator and the spline regression
estimator result in the same overall shape of the response function. We focus
on the kernel estimator because it is a widely used nonparametric regression estima-
tor. Before we discuss details of the kernel estimator, we provide a short introduction
to the other two nonparametric estimators.
The k-nearest neighbor estimator uses "nearest neighbor" observations to estimate
the criterion variable for given values of the predictor variables. Specifically,
it bases an estimate of the criterion variable for specified values of the predictor
variables (say, x₀) on the observations in the data set that are closest to x₀. Hence, the
method searches among the observations x₁, …, x_T and identifies the k observations
that have the shortest (Euclidean) distance to x₀. The value of the criterion variable
given x₀ is estimated by taking the unweighted average of the y-values for these k
observations.
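A minimal sketch of this estimator (the data, k, and the evaluation point are illustrative):

```python
# k-nearest-neighbor regression: estimate y at x0 by the unweighted average of
# the y-values of the k observations closest to x0. Illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]        # roughly y = 2x

def knn_estimate(x0, k):
    # sort the observations by distance to x0 and average y over the k nearest
    nearest = sorted(zip(x, y), key=lambda pair: abs(pair[0] - x0))[:k]
    return sum(yi for _, yi in nearest) / k

print(knn_estimate(3.5, 2))                # averages the y-values at x = 3 and x = 4
```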
The spline regression estimator represents m(x_t) by connecting multiple cubic
polynomials. The polynomials are connected at observation points x_i in such a way
that the first two derivatives of m are continuous (Härdle, 1990, p. 57). Kalyanam and
Shively (1998) use a stochastic spline regression, which is a special variant of the
spline regression estimator, for flexible estimation of the relation between sales and
price. They approximate this response function by a piecewise linear function whose
derivatives are obtained by draws from a normal distribution.
The intuition behind the kernel method (Nadaraya, 1964, and Watson, 1964) is
that it computes a local weighted average of the criterion variable y given the values
of the predictors x₀:

m̂(x₀) = Σ_{t=1}^{T} w_t(x₀) y_t    (16.168)

where w_t(x₀) represents the weight assigned to the t-th observation y_t in the estimation
of y for x₀. This weight depends on "the distance" of x_t from the point x₀, which
is described by

w_t(x₀) = K((x_t - x₀)/h) / ĝ(x₀)    (16.169)

where

ĝ(x₀) = Σ_{s=1}^{T} K((x_s - x₀)/h)    (16.170)

so that the weights sum to one; K(·) is the kernel function and h the bandwidth parameter.
(16.170)
To implement the kernel estimator, one has to choose the kernel function and the
bandwidth parameter. Generally, the kernel function is a symmetric function around
zero, it reaches its maximum at zero, and it integrates to one. A common choice for
the kernel is the normal (Gaussian) kernel:
K((x_t - x₀)/h) = (1/√(2π)) exp(-(x_t - x₀)²/(2h²)).    (16.171)

This kernel represents the density function of a normal distribution. The closer
(x_t - x₀)/h is to zero, the larger K(·) is, i.e., the larger the weight for observation y_t in the
computation of the estimate of y for x₀.
The bandwidth parameter selection is essential in kernel regression (and more
critical than the choice of the kernel function). This parameter controls the peakedness
of the kernel function. The smaller it is, the more peaked the kernel function is, and the
more weight is put on the nearest observations. To illustrate, the bandwidth parameter
in ( 16.171) can be interpreted as the standard deviation of a normal distribution. The
smaller the standard deviation of a normal distribution, the smaller the width of the
normal density function.
As the bandwidth decreases, the response curve is based on fewer observations at
a given point. As a result, the response curve potentially becomes squigglier. This
means that the bias in the shape of the curve is reduced at the cost of increased
variance. A bandwidth parameter of (almost) zero leads to a response curve that
connects the observations, resulting in zero bias but maximal variance. An infinite
bandwidth parameter leads to a horizontal response curve: maximal bias and zero
variance. Hence, the choice of the bandwidth parameter involves a trade-off between
bias and variance. Most bandwidth selection techniques try to minimize some mean
squared error criterion, i.e., the sum of squared bias and variance of the criterion
variable. 85
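A sketch of the kernel estimator (16.168) with a Gaussian kernel, also illustrating the bias-variance role of the bandwidth (data and bandwidth values are illustrative):

```python
import math

# Nadaraya-Watson kernel regression: a weighted average of y with Gaussian
# weights that decay in the distance of x_t from x0. Illustrative data.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0.1, 0.4, 1.1, 2.3, 3.9, 6.2, 9.1]    # roughly y = x^2

def gauss(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_estimate(x0, h):
    # weights proportional to K((x_t - x0)/h), normalized to sum to one
    k = [gauss((xt - x0) / h) for xt in x]
    return sum(ki * yi for ki, yi in zip(k, y)) / sum(k)

# A small bandwidth tracks the data locally; a very large bandwidth flattens
# the estimate toward the overall mean of y (maximal bias, zero variance).
print(round(nw_estimate(2.0, 0.3), 2))     # close to the local y-value 3.9
print(round(nw_estimate(2.0, 100.0), 2))   # close to mean(y) = 3.3
```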
The advantage of nonparametric regression models relative to their parametric
counterparts is their flexibility: a nonparametric approach does not project the observed
data into a Procrustean bed86 of a fixed parameterization (Härdle, 1990).
Nonparametric modeling imposes few restrictions on the form of the joint distribution
of the data, so that (functional form) misspecification is unlikely. Also, the consistency
of the estimator of the regression curve is established under much more general conditions
than for parametric modeling. Rust (1988) introduced nonparametric regression
models to marketing research. He emphasizes that nonlinearity, non-normal errors,
and heteroscedasticity are automatically accommodated as inherent features of the
method, without the use of special analyses requiring a high level of judgment and
knowledge. However, it is useful to remember that the primary substantive benefits
consist of the relaxation of functional form constraints and the allowance for flexible
interactions.
A disadvantage of the nonparametric approach is its convergence rate which
is usually slower than it is for parametric estimators (Powell, 1994). Thus, precise
estimation of the nonparametric multidimensional regression surface requires many
observations.87 In Table 16.9 we show the sample sizes suggested for given numbers
of predictor variables (dimensionality) in a nonparametric regression model (for
details see Silverman, 1986, p. 94)88 to illustrate "the curse of dimensionality". For
comparison purposes, we also show the sample sizes suggested for parametric regression
models, based on the rule of thumb: "5 observations per parameter". From
Table 16.9, we conclude that for a simple problem of a nonparametric brand sales
model with three marketing instruments for each of three brands as predictor variables
(i.e., a total of nine own- and cross-brand variables), we would need almost 200,000
observations.
It is useful to provide some discussion of the numbers in Table 16.9. For the
parametric model, the rule of thumb does not differentiate between main- and in-
teraction effects. Thus, for each additional parameter we would need another five
observations, no matter what kind of parameter it is. We know, however, that for
most types of historical data, the degree of multicollinearity tends to increase as the
number of predictors increases. As the multicollinearity increases we need a larger
number of additional observations than the five proposed in the rule of thumb. And if
the additional variables represent interaction effects, the increase in multicollinearity
will be especially severe.
Table 16.9 Suggested sample sizes for nonparametric and parametric regression models

Number of predictors   Nonparametric   Parametric
         1                        4            5
         2                       19           10
         3                       67           15
         4                      223           20
         5                      768           25
         6                     2790           30
         7                    10700           35
         8                    43700           40
         9                   187000           45
        10                   842000           50
By contrast, the nonparametric model has sample size "requirements" that explode
with the number of predictors. The reason is that in addition to accommodating
functional forms flexibly, the nonparametric approach also allows completely flexible
interaction effects. Thus, with two predictors we need to estimate a completely flex-
ible three-dimensional response surface (see Figure 16.10). Importantly, with each
additional predictor, the nature of possible interaction effects explodes and it is for
this reason that the sample sizes imply a "curse of dimensionality".
The sample sizes shown in Table 16.9 for the nonparametric model should only
be taken as suggestive. The critical determinants of the information required for
sufficiently precisely estimated curves are the nature of variation for each predic-
tor (ideally evenly distributed) and the nature of covariation between the predictors
(ideally zero). On the other hand, in parametric regression, when the functional form
is assumed to be known, the precision of parameter estimates depends on the amount
of variation in each predictor (the greater the better), while the desired condition for
covariation between the predictors is the same as for the nonparametric model.
Rust (1988) provides two nonparametric kernel regression applications. One is based
on a study of the behavior of Hispanic consumers, and focuses on how Hispanic
identification (x₁) and income (x₂) affect usage of Spanish media (y). A conventional
regression analysis obtains an R² value of 0.13, while the nonparametric regression
yields an R² of 0.36. Importantly, the relationship between the degree of Hispanic
ethnic identification and the usage of Hispanic media is found to be nonlinear in a
manner that is virtually impossible to anticipate, and difficult to specify parametri-
cally. We show in Figure 16.7 that the use of Hispanic media increases when the
Hispanic identification level goes from low to medium, then decreases a bit, after
which it increases again with increasing identification levels.

Figure 16.7 Nonparametric estimate of the usage of Spanish media (y) as a function of Hispanic identification (x).

We note that the estimated nonmonotonic relation needs further verification, since the unusual shape may
also result from sparse data and from missing variables.
Rust's second application is an investigation of how the profitability of a firm influences
the compensation of its top marketing executives. In particular, cash salary
(y) is taken to be a function of net profit (x). A parametric regression shows little
explanatory power (R² = 0.01), but a nonparametric regression obtains R² = 0.31.
The relation between executive salary and company profitability is nonlinear, with
high salaries going to executives in companies with either large losses or large profits
(Rust, 1988, Figure 10). Such unusual nonlinearities are difficult to anticipate. And
with additional predictors, the problem of anticipating shapes becomes extremely
complex given the infinite variety of possible interaction effects.
Semiparametric models are not as flexible as nonparametric models, nor are they as
efficient as parametric models. We show in Table 16.10 how semiparametric models
are positioned between the parametric and nonparametric ones.
A common specification is the semilinear model:

y_t = m(x_t^(1)) + x_t^(2)′ β + u_t,  t = 1, …, T.    (16.173)
Here the vector of predictor variables x_t is split into two parts, x_t^(1) and x_t^(2). The
effect of x_t^(1) is modeled nonparametrically, while the effect of x_t^(2) is modeled parametrically.
Since x_t^(1) contains fewer predictor variables than x_t, the nonparametric
function m(·) operates on a vector of lower dimensionality than the fully nonparametric
model (16.167) does. In this way, the semilinear model reduces the curse of
dimensionality. Robinson (1988) provides an estimation procedure for this model.
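Robinson's procedure can be sketched as a "double residual" estimator: kernel-smooth both y_t and the parametric predictor on x_t^(1), then regress the first residual on the second to estimate β. The simulation below is our own stylized illustration of this idea, not the authors' implementation:

```python
import math
import random

# Semilinear model y = m(x1) + beta*x2 + u with m nonlinear. Estimate beta by
# regressing kernel-smoothing residuals of y on those of x2 (double residuals).
random.seed(7)
n, beta_true, h = 400, 2.0, 0.15
x1 = [random.random() for _ in range(n)]
x2 = [a + 0.3 * random.gauss(0, 1) for a in x1]          # correlated with x1
y = [math.sin(2 * math.pi * a) + beta_true * b + 0.1 * random.gauss(0, 1)
     for a, b in zip(x1, x2)]

def smooth(at, zs):
    # Nadaraya-Watson estimate of E[z | x1 = at] with a Gaussian kernel
    w = [math.exp(-0.5 * ((xi - at) / h) ** 2) for xi in x1]
    return sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)

ey = [yi - smooth(a, y) for a, yi in zip(x1, y)]          # y minus E[y | x1]
ex = [b - smooth(a, x2) for a, b in zip(x1, x2)]          # x2 minus E[x2 | x1]
beta_hat = sum(u * v for u, v in zip(ey, ex)) / sum(v * v for v in ex)
print(round(beta_hat, 2))                  # close to the true beta of 2.0
```

The nonlinear part m(·) never needs to be specified; only a bandwidth choice is required, which is the point of the semiparametric approach.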
Van Heerde, Leeflang and Wittink (1999a) use a semilinear regression model for
the estimation of the deal effect curve. The deal effect curve represents the relation
between sales and price discounts. The marketing literature suggests several phenomena,
such as threshold and saturation effects, which may contribute to the shape of the
deal effect curve.89 These phenomena can produce severe nonlinearities in the curve,
which may be best captured in a flexible manner. Van Heerde, Leeflang and Wittink
model store-level sales as a nonparametric function of own- and cross-item price
discounts, and a parametric function of other predictors (all indicator variables). Thus,
flexible interaction effects between the price discount variables are accommodated as
well.
The criterion variable of their semiparametric model is log unit sales of a given
item or brand in a store-week. Taking log unit sales as the criterion variable, instead
of unit sales, is desirable because the parametric function is multiplicative. The multiplicative
function assumes that the stores have homogeneous proportional effects.
The discount variables are log price indices. The price index is the ratio of actual
price in a store-week to the regular price for an item in the store. Both actual and
regular prices are available in the ACNielsen data sets. ACNielsen uses an algorithm
to infer regular prices from actual prices and price promotion indicator variables.
Price indices less than one represent temporary price cuts (deals).
For item j, j = 1, …, n, the specification for the semiparametric model is:

ln q_kjt = m(ln(PI_k1t), ln(PI_k2t), …, ln(PI_knt)) + Σ_{l=1}^{3} Σ_{r=1}^{n} γ_lrj D_lkrt + δ_jt X_t + λ_kj Z_k + u_kjt,    (16.174)
t = 1, …, T and k = 1, …, K

where

q_kjt = unit sales (e.g., number of pounds) for item j in store k, in week t,
m(·) = a nonparametric function,
PI_krt = the price index (ratio of actual to regular price) of item r in store k in week t,
D_1krt = an indicator variable for feature advertising: 1 if item r is featured (but not displayed) by store k, in week t; 0 otherwise,
D_2krt = an indicator variable for display: 1 if item r is displayed (but not featured) by store k, in week t; 0 otherwise,
D_3krt = an indicator variable for the simultaneous use of feature and display: 1 if item r is featured and displayed by store k, in week t; 0 otherwise,
X_t = an indicator variable (proxy for missing variables
89. See, for example, Gupta and Cooper ( 1992).
Figure 16.8 Nonparametric own-brand deal effect curve for brand 1.
We show the own-item deal effect curve from the semiparametric model for brand
1 in Figure 16.8. The graph shows price index values on the x-axis and predicted
incremental sales volumes on the y-axis. For the x-axis, 1000 focal item price indices
were generated, equally spaced between the lowest and highest price indices observed
in the estimation sample. For this purpose, the other items' price indices were fixed
at one. For each of these 1000 price indices we estimated the sales nonparametrically
(for an average week and an average store). It is clear from this graph that there exists
a threshold before the price index generates incremental volume. The minimal price
discount that appears to be required before a sales increase occurs is about 10 percent
for this brand. There is, however, no evidence of a saturation effect.
The semiparametric model (16.174) also includes flexible cross-item price discount
effects. We show one example of a cross-item deal effect curve in Figure 16.9. The
curve shows the influence of price discounts for brand 3 on brand 1's sales. Inter-
estingly, this curve also shows a threshold effect at approximately 10 percent. Thus,
for discounts greater than 10 percent by brand 3, it appears that unit sales for brand
1 rapidly decrease. However, this graph does show a saturation effect. Beyond a 30
percent discount, brand 1's sales appear to be immune to further discounts by brand
3.
Figure 16.9 Nonparametric cross-brand deal effect curve for brand 3's price index on brand 1's sales.

The nonparametric part of equation (16.174) accommodates flexible interaction
effects between price discounts of different brands, since m(·) is a function of each
brand's price index. To illustrate, we use the price indices for brands 1 and 2 to show
a three-dimensional deal effect surface in Figure 16.10. Here, the vertical axis represents
the predicted sales volume of brand 1. The other two axes represent the price
indices of brand 1 and brand 2, respectively. The top part of the three-dimensional
surface (the curve A-B) is brand 1's own-item deal effect, also shown in Figure 16.8,
when the other two brands both have a price index of one. As one would expect, this
curve tends to decrease in magnitude with a decrease in brand 2's price index (as
is apparent when one moves forward from the curve A-B, along the axis of brand
2's price index). Importantly, the own-brand deal effect curve for brand 1 has a very
different shape when brand 2's price index is, for example, at 0.55 compared to the
index being at 1.0. Thus, the interaction effect is highly nonlinear, which would be
very difficult to model parametrically. Substantively, if brand 1 is promoted with a
deep discount, brand 2 can reduce brand 1's sales gain considerably if it has a price
discount of (at least) 25 percent.
Van Heerde, Leeflang and Wittink (1999a) also compare their semiparametric model
to a parametric benchmark model. The parametric benchmark they use is ACNielsen's
SCAN*PRO model for promotional effects (9.19). The comparison is performed by
(a) contrasting the deal effect curves for the semiparametric and parametric models,
(b) comparing fit statistics in the estimation sample, and (c) comparing predictive
results in the validation sample. As for (a), the authors conclude that "... the parametric
model overstates the effects of the smallest and largest discounts and understates the
effects of intermediate discounts". As for (b), they find that the flexible modeling
of deal effects provides considerable gains in fit, and for (c) there are notable improvements
in forecast accuracy. Decomposition of the prediction error shows that
both the increased flexibility in nonlinear effects and in interaction effects contribute
to improved performance for the semiparametric model. Separately, the proportional
Figure 16.10 Nonparametric deal effect surface for brand 1's sales with own-brand price index
effect and the cross-brand effect of brand 2's price index.
In this section we discuss a few applications of the statistical tests introduced in
Section 16.1. We also consider parameterization issues of models with behavioral detail.
We discuss estimation and testing of models with a substantial amount of behavioral
detail in Section 17.1.
To illustrate diagnostic test applications, we review studies of the influence of
tobacco advertising on smoking. This question has been the subject of official re-
views in the United States (US DHHS, 1989), New Zealand (NZDH, 1989) and the
United Kingdom (Smee, Parsonage, Anderson and Duckworth, 1992). The UK study,
completed by the Economic and Operational Research Division of the Department
of Health in the UK, considers the evidence with respect to the effect of tobacco
advertising and the effect of advertising bans on tobacco consumption in selected
countries. One study included in Smee et al. (1992) is about the effectiveness of
the Norwegian Tobacco Act. This Act was implemented in July 1975. Smee et al.
obtained the following numerical specification:
Equation (16.175) was estimated by OLS from annual data over the period 1964-
1989. The original model specification included additional predictors such as the
relative price of tobacco and real income, but the estimated effects for those variables
were not statistically significant (the estimated effects in (16.175) are highly significant).
Suitable data on advertising expenditures were not available. The indicator
variable, TA_t, was used to estimate the influence of the Tobacco Act upon tobacco
sales.
Diagnostic tests used by Smee et al. include the RESET test, which is an F-test
based on incremental R² (Section 16.1.5). With one additional predictor, Smee et
al. obtained F = 0.65, which is not significant. Thus, this specific diagnostic test
produced no evidence of misspecification. The power of this specific RESET test is
probably very small, however.
Smee et al. used the Lagrange multiplier (LM) test for the null hypothesis ρ₁ =
ρ₂ = ρ₃ = 0, where ρ_p, p = 1, 2, 3, are parameters of the autocorrelation processes
defined in (16.46). This test statistic has a χ²(3) distribution. Its calculated value is
1.41, which is not significant. Thus, there is no evidence of autocorrelated disturbances,
specifically of order 1, 2 or 3.
Smee et al. (1992) used the White test to examine the assumption of homoscedasticity
and to test the functional form. The χ²-values of the White test also provide
no evidence of heteroscedasticity or misspecification. Finally, Smee et al. used the
Jarque-Bera test to examine the normality assumption. This test statistic is also chi-square
distributed, and the null hypothesis of normality could not be rejected. The power
of this test is also minimal, however.
Smee et al. (1992) decided that since (16.175) withstood all tests for misspecification,
they can conclude that the advertising ban in Norway reduced tobacco
consumption by about 2 percent in the short run and by 2%/(1 - 0.862) ≈ 14.5 percent
in the long run.
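The long-run figure follows from the usual geometric-lag multiplier, the short-run effect divided by one minus the coefficient of the lagged variable:

```python
# Long-run effect of the ban implied by a model with a lagged criterion
# variable: short-run effect / (1 - lag coefficient), values from the text.
short_run = 0.02                           # about 2 percent in the short run
lag_coef = 0.862                           # coefficient of q_{t-1} in (16.175)
long_run = short_run / (1.0 - lag_coef)
print(round(100 * long_run, 1))            # → 14.5 (percent)
```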
Although (16.175) could not be rejected based on the diagnostic checks, it can be
criticized on the following grounds.90
90. See Leeflang and Reuyl (1995).
• Although suitable data on advertising were not available, the omission of advertising
as a predictor variable may bias the estimated effect of the ban.
• The parameters were estimated from annual data, which leads to a data interval
bias in the effects of lagged variables (see Section 6.1).
• The data cover a period of 25 years. It is implausible that the parameter of the
lagged endogenous variable (q_{t-1}) is constant over such a long period. Thus, the
use of the corresponding parameter estimate for the calculation of a long-term
effect of the advertising ban is questionable. This long-run effect is very sensitive
to the lagged variable parameter estimate.
• The effect of the indicator variable is estimated from a quasi-experimental de-
sign. One imagines that various forces in Norwegian society together produced
the legislative climate for the Tobacco Act. This environment must also have
produced other actions such as the introduction of health education programs,
health warnings on tobacco packaging and restrictions on the sale of tobacco to
young people. It is impossible to distinguish the effect of the Tobacco Act from
the effects of such other initiatives that occurred at the same time.
• The criterion variable is product category sales of manufactured and roll-your-
own cigarettes. Entity aggregation over the individual brands and over the differ-
ent product categories can severely bias the parameter estimates, especially if the
brands and categories compete against each other (see also Section 14.1.2).
We use this application to make several points. For example, it is important to test
the model specification against the data with diagnostic tests. However, a lack of
statistically significant results does not imply that the model is acceptable. In this
application all tests lack power, so we should rely on theoretical and conceptual
arguments to critically evaluate the model specification. In this example there are
serious shortcomings in the model specification, the variable definitions and the (aggregate)
data.
We next consider the inclusion of behavioral detail at the aggregate level (see Chapter
12 for treatments at the individual level). We distinguish three kinds of flow models
with behavioral detail at the aggregate level:
• models of intermediate market response;
• diffusion models;
• adoption models.
To estimate models of intermediate market response, we can use consumer panel
data to calculate transition probabilities. Define N_ij,t+1 as the number of consumers
in a sample who bought brand i in t and brand j in t + 1 (i, j = 1, …, n, t =
1, …, T). Independent of whether the panel members all purchase the same number
of units, we can compute switching from one brand to another on the basis of purchase
sequences. The (maximum likelihood) estimates of, for example, the fraction
of switching consumers (p_ij,t+1) can be obtained by the following expression:

p̂_ij,t+1 = N_ij,t+1 / N_it,  i, j = 1, …, n, t = 1, …, T    (16.176)
where
Nit = the number of consumers who bought brand i
in period t.
In terms of the market shares m_it, the transition probabilities satisfy:

m_j,t+1 = Σ_{i=1}^{n} p_ij m_it,  j = 1, …, n    (16.177)

Given n brands, and one equation per brand, there are n² unknown parameters. However,
since

Σ_{j=1}^{n} p_ij = 1,  for i = 1, 2, …, n    (16.178)

there are only n(n - 1) independent transition probabilities. There are nT observations,
but since

Σ_{j=1}^{n} m_jt = 1,  for all t    (16.179)
only (n - 1)T are independent. In order to obtain reliable transition probability estimates,
(n - 1)T should be substantially larger than n(n - 1), or T should sufficiently
exceed n. In addition, it is critical that the time period is closely linked to the purchase
interval. Also, if the transition probabilities are not linked to marketing variables, we
estimate probabilities from the aggregated data.
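The estimator (16.176) can be sketched on a toy panel of purchase sequences; for brevity the counts below are pooled across periods, whereas the text's estimator is computed per period:

```python
from collections import Counter

# Count brand-switching pairs (N_ij) and origin-brand purchases (N_i) from
# panel purchase sequences, then estimate p_ij = N_ij / N_i. Toy data.
panel = {                                  # consumer -> sequence of brands bought
    "c1": ["A", "A", "B"],
    "c2": ["A", "B", "B"],
    "c3": ["B", "B", "A"],
    "c4": ["A", "A", "A"],
}

pairs = Counter()                          # N_ij, pooled over t
counts = Counter()                         # N_i, pooled over t
for seq in panel.values():
    for i, j in zip(seq, seq[1:]):         # consecutive purchases: i in t, j in t+1
        pairs[(i, j)] += 1
        counts[i] += 1

p = {(i, j): pairs[(i, j)] / counts[i] for (i, j) in pairs}
print(p[("A", "A")], p[("A", "B")])        # → 0.6 0.4
```

Each row of the estimated transition matrix sums to one, consistent with (16.178).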
Telser (1963) shows how the p_{ij}-values can be estimated from the equations (16.177) after a disturbance term u_{j,t+1} has been added.91 In Section 12.3.3 we discuss a model in which the transition probabilities are related to decision variables. These probabilities and other parameters are estimated from aggregate data.92
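The estimator in (16.176) is simple enough to compute directly from raw purchase sequences. The following is a minimal sketch in Python; the panel data are hypothetical, and the counts are pooled over all periods for simplicity, whereas (16.176) keeps a separate estimate for each t:

```python
from collections import Counter

def transition_estimates(purchases):
    """Maximum-likelihood estimates p_ij = N_ij / N_i from per-household
    purchase sequences, pooled over periods."""
    pair_counts = Counter()    # N_ij: i bought, then j bought next
    origin_counts = Counter()  # N_i: i bought, with a next purchase observed
    for seq in purchases:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            origin_counts[prev] += 1
    return {pair: n / origin_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical panel: three households, two brands "A" and "B".
panel = [["A", "A", "B"], ["A", "B", "B"], ["B", "A", "A"]]
p = transition_estimates(panel)
```

By construction the estimated probabilities out of each brand sum to one, matching (16.178).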
The discussion of macro flow adoption models (Section 10.2) shows that these models
require a substantial amount of (individual) data. These data have to satisfy various
requirements, and as a consequence the data collection phase requires much effort.
The first piece of information needed is the definition and the size of the target
group for the new product, for example the number of potential product users. If
the new product is an acne remedy, the target group could be defined as the teenage
population. More narrowly, we could limit the target group to those teenagers who
91. For a more rigorous derivation of this stochastic relation see Leeflang (1974, pp. 123-124).
92. For an extensive discussion of these models see Lee, Judge and Zellner (1970).
actually have an acne problem. What matters most is that all measures are consistent.
For example, if the proportion trying the product in a given period is defined with respect to the total teenage population, then the target group should be defined likewise
(or an appropriate adjustment should be made).
The definition of the time units is usually the smallest interpurchase time interval,
e.g., a week. Many models are designed to estimate how many people will try the
product over time. That is, we want to predict the number of people who will purchase
the product the first week (assuming that one week is the time unit), the second week,
and so on. The predictions are extrapolations to, say, the national level based on test
market results. The selection of representative test markets is therefore crucial. But
the selection of an appropriate test market is not enough. The following kinds of
questions should be raised. Will product availability at the national level be the same
as in the test market? Will the level of advertising be comparable? If not, we must
adjust the trial rates accordingly. Such adjustments are not easy to make. For example,
how does product availability depend on the deal offered to retailers? Without relevant
data, an adjustment for changes is subjective; see equation (10.17).
As another example, suppose the intensity of advertising spending is very high
in a test market, and higher than the company can comparably support in, say, a
national introduction. How lower advertising spending affects the trial rates is, however, difficult to know if one has not considered the relation between the trial rate and advertising. It is therefore important to keep advertising spending in the test proportional to a national scale, in which case the relation between trial and advertising is
not needed. However, in case of uncertainty about the proper level, it is useful to do
some experimentation.
In new-product adoption models, the trial rate is a crucial variable. Some possible
measures are:
1. Make regular shelf counts and inventory counts in the stores carrying the product
to see how much is sold per unit of time. This is a superior measure (either in
the test market or in the "real" market) to counting shipments from the factory,
or from warehouses to retail stores, since the latter measures can distort sales
because of "pipeline fillings"93 and time lag effects. However, instead of regular counts in all stores, it may be acceptable to use a representative sample of stores. The trial rate for the test market is then estimated by the trial rate in the
sample stores.
2. The trial rate may be estimated from a panel of consumers. The adequacy of this
measure depends on the size of the panel, its representativeness, and the product
category purchase frequency. The use of scanner equipment in many stores makes
it possible to obtain trial rates unobtrusively from household panels using plastic
cards or wands (see Chapter 15).
3. A third method is to select a random sample of target market members and ask
the selected individuals if they purchased the product and, if so, in what quantity.
93. Wholesalers and retailers have incentives to establish inventories of goods as part of the pipeline (the
channel) from producers to final consumer (see also Section 15.3).
The manner in which repeat purchase information is obtained depends on the macro
flow model. In the "SPRINTER"-type models a purchase-frequency distribution is
defined for all consumers who "tried" the product. This distribution allows one to
distinguish between, say, heavy and light users. Some triers are ready to make a
new purchase the following period, others within two periods, yet others within three
periods, and so on. However, it is difficult to obtain a valid/reliable distribution since
by definition we have little information. Conceivably, this information is available
for related products, for example, existing brands within the same product class.
Alternatively, one could use a survey or study the purchase histories of consumers
who participate in scanner-based household panels.
In the ASSESSOR model the repeat rate is estimated as the equilibrium share
of a two-state Markov process: see equation (10.18). Here only one repeat rate was
considered, although one might segment the market and determine a repeat rate for
each segment.
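The equilibrium share of a two-state Markov process can be computed in closed form. A sketch under assumed notation (equation (10.18) itself is not reproduced here): let r be the probability that a current buyer repeats and w the probability that a current non-buyer switches in; the long-run share s then solves s = s·r + (1 − s)·w:

```python
def equilibrium_share(r, w):
    """Long-run share of a two-state Markov process with repeat rate r
    and switch-in rate w: s = s*r + (1 - s)*w  =>  s = w / (1 - r + w)."""
    return w / (1 - r + w)

# Hypothetical rates: 80% of buyers repeat, 10% of non-buyers switch in.
share = equilibrium_share(0.8, 0.1)
```

Segment-level repeat rates, as suggested in the text, would simply apply this computation per segment.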
In conclusion, we note that aggregate flow models pose more difficulties with
regard to data collection than do aggregate response models without behavioral detail.
Possible problem areas are: test market selection, control of marketing instruments,
sampling of stores or consumers, and distribution of interpurchase times. Data col-
lection has to be planned and carried out carefully, to ensure that the quality of
information is high and the data characteristics match the intended use.
Estimation of the parameters of macro flow models closely follows the data collection
efforts. The process appears to be easier than for models having no behavioral detail.
For the latter models, we have to find an appropriate method, the selection of which
depends on various diagnostic tests. In most aggregate flow models, initial parameter
estimates, based on panels, store data or surveys, are in a sense "first stage" estimates.
A comparison of actual (test) market results with predictions might then lead to
adjustments in the parameter estimates. Such a process of gradually improving the
fit by adapting parameter estimates is similar to the "updating"-step in the model-
building process (Section 5.1 ). Often the adjustments require a lot of judgment. For
more on this, see Section 18.6.
16.9.1 JUSTIFICATION
In the previous sections of this chapter we dealt with methods developed to extract
parameter estimates from objective data, data that represent observed or observable
quantities. If the estimated relations allow managers to be more effective decision
makers than they would be otherwise, the question we address now is how we can
generate similar quantifications in the absence of objective data.
To justify the use of alternative bases for the quantification of relations, we refer to the arguments we have made in favor of model building (see Section 3.2).
In the absence of models, decision makers (DM's) make judgments based on their
own experiences, the experience of colleagues or the habits and beliefs that are part of
an organizational culture. The judgments reflect implicit assumptions about response
parameters. Rarely, however, do the implicit parameter values remain constant across
conditions. It is especially for this reason that a "model of man" can outperform
"man". That is, a model of repeated judgments made by one person can better predict
the actual outcomes of those very judgments. There is a large body of research on
the success of "models of man". For example, a regression model of an admission
director's judgments of academic performance for MBA students (as a function of
their GMAT scores, undergraduate GPA's, undergraduate institution qualities, etc.)
predicts actual performances better than the very same judgments on which the model
is estimated. The reason for this result is simple: the model is consistent. It gives
exactly the same prediction today, tomorrow or any time given a set of values for
the predictors. The admission director, however, makes judgments that are subject to
noise (or to conditions that do not relate to the academic performance of the students).
If the admission director's task were to admit the applicants who are expected to have
the strongest academic performance, the model of the director's judgments will tend
to generate predicted values with greater accuracy than the judgments.
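The "model of man" argument can be illustrated with a small simulation. In this hypothetical sketch, a judge applies a linear rule to a score x but adds judgment noise; an OLS model of the judge's own judgments then predicts the actual outcomes better than the judgments themselves, because the model is consistent:

```python
import random

random.seed(1)

def judge(x):
    # The judge's implicit rule plus judgment noise.
    return 2.0 * x + random.gauss(0, 1.0)

xs = [random.uniform(0, 10) for _ in range(200)]
judgments = [judge(x) for x in xs]
outcomes = [2.0 * x + random.gauss(0, 0.5) for x in xs]  # true performance

# Simple OLS of judgments on x recovers the implicit rule without the noise.
n = len(xs)
mx, mj = sum(xs) / n, sum(judgments) / n
b = sum((x - mx) * (j - mj) for x, j in zip(xs, judgments)) / \
    sum((x - mx) ** 2 for x in xs)
a = mj - b * mx
model_preds = [a + b * x for x in xs]

mse_model = sum((o - p) ** 2 for o, p in zip(outcomes, model_preds)) / n
mse_judge = sum((o - j) ** 2 for o, j in zip(outcomes, judgments)) / n
```

The model's mean squared error is smaller than the judge's, since the judge's noise is eliminated while the shared systematic component is retained.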
Of course, we may argue that predictions from a model estimated with objective data
can do even better. This should be true as long as the MBA program content and
other aspects stay relatively constant over time. In that case data from students who
have graduated can be used to obtain the parameter estimates that best explain their
actual academic performance. The estimated model can then be used to predict the
performance of future applicants. This model would of course also give exactly the
same prediction any time it is used for a given set of values for the predictors. And
this "model of outcomes" will outperform the "model of man" as long as the bases
for actual performance remain the same.
We now want to restrict ourselves to cases in which there are no objective data or
those data are insufficient. In the MBA application, past data may be insufficient
if the students' characteristics change dramatically or the curriculum and/or the requirements for good performance are very different than in earlier times. In general,
insufficiency of the data may also be due to a lack of variability in one or more
predictors, excessive covariation between predictors or other severe problems. Also,
the competitive environment for marketing decisions can change dramatically, for
example, after a new brand enters the market. Thus, historical data sometimes do
not provide insight into the relations that pertain to the new environment. 94 Most
econometric models are "static" in the sense that both the structure and the parameters are fixed (we discuss an exception to this under varying parameter models in
Section 17.4). Thus, subjective estimation may not just be attractive in the absence
of objective data but also to overcome the limitations of historical data. Of course,
if decision makers can identify changes in the market environment that invalidate
an existing model, they should also be able to recognize the adaptation required in it.
Possible adaptations include a more flexible model structure through, for example, the
use of qualitative variables (such as indicator variables being "on" or "off", dependent upon the decision maker's assessment of market conditions) and varying parameters.
One benefit we propose for models estimated from subjective data is that they formalize the prediction process and allow the decision maker to diagnose prediction accuracy. In addition, such models force decision makers to be explicit about how they believe variables under their control affect certain performance measures. And, when multiple experts provide judgments, subjective estimation done separately for each expert shows the nature and the extent of differences. If such differences get resolved before a decision is made, then the prevailing perspective gets disseminated. When
experts cannot agree, future outcomes can serve as the basis for a determination of
relative accuracy of alternative approaches.
The experts who supply subjective data are called assessors. The ultimate decision
makers, internal and external consultants, and sales representatives are all potential
assessors. Members of the sales force can be especially helpful when clues about
future sales levels are gathered. The sales force members have contact with the customers, and this should allow them to provide relevant expertise. In addition, if sales
forecasts are used for the determination of sales quotas it is helpful to have sales force
members involved in the process. For example, their participation will increase their
confidence in the quotas being fair, and this will increase their motivation to achieve
the quotas. Of course, there is also the possibility that they will try to "game" the
system. 95
94. Marshall, Oliver (1995, p. 1), for example, suggest that decisions often depend on various subjective
judgments about the future.
95. Compare Lilien, Rangaswamy (1998, pp. 130-131).
Other potential assessors include distributors, suppliers, consultants, forecast experts, etc. Some or all of these stakeholders may be asked to constitute a jury of executive opinion. When representatives
of various groups of stakeholders get together, the purpose of the meeting is for the
group to come as close as possible to a single judgment. A variation on this is the
Delphi method in which experts write down their judgments in a first round. Each
expert receives summary information about the independent judgments made, and
this information can influence the expert's judgments in subsequent rounds.
Subjective data can be obtained in (at least) three forms:
a. point estimation;
b. response functions;
c. probability assessments.
The subjective data consist of opinions (judgments) and intentions (Armstrong, 1985, Chapter 6). Intentions are indications individuals provide about their planned behavior or about the decisions they plan to make or the outcomes of those decisions. Opinions refer to forecasts about events whose outcomes are outside the assessors' control. Intention surveys are frequently used to forecast the demand for a new product (Morrison, 1979, Kalwani, Silk, 1982, Jamieson, Bass, 1989, Morwitz, Schmittlein, 1992). Other applications involve estimation of the impact of a possible entrant on a market (Alsem, Leeflang, 1994); see Section 16.9.5.
POINT ESTIMATION
Figure 16.11 Point estimate (A) and two cumulative distribution functions.
We can provide some training, or we can change the descriptions, for example by
using "chances" or "odds" which may be more familiar terms. If we only ask for a
point estimate, we are often looking for a measure of central tendency. The answer
may provide an estimate of the mode, the median, or mean, as illustrated below:
1. "What is your estimate of the most likely level of sales?" gives an estimate of the
mode.
2. "What level of sales do you estimate you have an even chance of reaching?"
provides an estimate of the median.
3. "What level of sales do you expect?" results in an estimate of the mean.
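The three questions map directly onto summary statistics of the assessor's subjective distribution. A small sketch (the discrete distribution is hypothetical):

```python
def mode_median_mean(dist):
    """Mode, median and mean of a discrete subjective sales distribution
    given as {sales level: probability}."""
    mode = max(dist, key=dist.get)            # most likely level
    mean = sum(v * p for v, p in dist.items())  # expected sales
    cum, median = 0.0, None
    for v in sorted(dist):                    # smallest level with CDF >= 0.5
        cum += dist[v]
        if cum >= 0.5:
            median = v
            break
    return mode, median, mean

# Hypothetical elicited distribution over three sales levels.
d = {500: 0.2, 700: 0.5, 1200: 0.3}
mo, me, mn = mode_median_mean(d)
```

For a skewed distribution such as this one, the three questions give different answers (here the mean exceeds the mode and median), which is exactly why the phrasing of the question matters.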
RESPONSE FUNCTIONS
A logical extension is to obtain a set of point estimates, one for each of a number of
values of the marketing instruments, which generates a point estimate of a response
function. At the same time, the construction of a subjective response curve enables us
to estimate quantities that cannot normally be assessed directly, such as elasticities or
response parameters. We consider an example98 from the ADBUDG model:
98. This example is based on Little (1970); see also Lilien, Kotler (1983, pp. 131-134). Little's model is more complex than (16.180). For example, he considers long-run and short-run effects. In (16.180) we omit the brand index for convenience. ADBUDG's model structure is discussed in Section 5.3.3.
m_t = α + (β − α) · a_t^δ / (γ + a_t^δ)     (16.180)

where
m_t = market share of a brand in period t,
a_t = advertising expenditures of the brand in period t, and
α, β, γ, δ = the parameters.
Figure 16.12 Subjectively estimated market share (m_t) as a function of advertising expenditures (a_t).

The estimated values are approximately γ = 30 and δ = 0.25. Figure 16.12 shows the brand manager's implicit market share function.
One may object that the four parameters in (16.181)-(16.182) are estimated from
four observations. Hence the model fits the data perfectly. But there is no guarantee
that additional subjective judgments fit the market share function shown in Figure
16.12. Thus, we prefer to collect additional observations. For example, we may elicit
market share estimates for advertising expenditures equal to the current budget plus
20 percent, plus 40 percent, ... minus 20 percent, minus 40 percent, and so on, thus
providing a scatter of points through which a market share function can be fitted using
(non-linear) estimation methods. In that case, deviations of the subjective estimates
from the fitted curve allow us to check the consistency of a manager's estimates. In
case of systematic deviations, we can consider an alternative functional form, in close
cooperation with the manager.
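Fitting a response function to such a scatter of subjective points can be sketched as follows. The judgments below are hypothetical, and a crude grid search stands in for the (non-linear) estimation methods the text refers to:

```python
def adbudg(a, alpha, beta, gamma, delta):
    """ADBUDG market-share response, equation (16.180)."""
    return alpha + (beta - alpha) * a**delta / (gamma + a**delta)

def sse(params, points):
    """Sum of squared deviations of subjective judgments from the curve."""
    alpha, beta, gamma, delta = params
    return sum((m - adbudg(a, alpha, beta, gamma, delta)) ** 2
               for a, m in points)

# Hypothetical judgments: (advertising, market share) elicited around
# the current budget plus/minus 20 and 40 percent.
judgments = [(6, 0.42), (8, 0.45), (10, 0.47), (12, 0.49), (14, 0.50)]

# Coarse grid search over parameter candidates.
best = min(
    ((al, be, ga, de)
     for al in (0.2, 0.3, 0.4)
     for be in (0.6, 0.7, 0.8)
     for ga in (10, 30, 50)
     for de in (0.25, 0.5, 1.0)),
    key=lambda p: sse(p, judgments),
)
residuals = [m - adbudg(a, *best) for a, m in judgments]
```

The residuals are exactly the deviations of the subjective estimates from the fitted curve used to check the consistency of the manager's judgments; systematic patterns in them would suggest an alternative functional form.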
The disadvantage of asking additional judgments is that it requires more time and
effort from the manager. These judgments only capture the manager's expectations
about market share, given values for one marketing instrument. However, extensions
to multiple predictors are straightforward. Importantly, the judgments about market
share given advertising are conditional on, for example, the brand's price and the
marketing instruments for other brands.
To illustrate the approach for multiple predictors, we consider a market with two brands (j and c) which compete on price (p_jt, p_ct) and advertising (a_jt, a_ct). For simplicity, suppose there are two alternative price levels p_j^1, p_j^2 for brand j and two for brand c: p_c^1, p_c^2. Similarly, suppose a_j^1, a_j^2, a_c^1, a_c^2 are alternative levels of advertising expenditures. We now ask an assessor for a point estimate of brand j's
Table 16.11 Sixteen combinations of p_jt, p_ct, a_jt, a_ct and the corresponding market share estimates m_jt

p_jt    p_ct    a_jt    a_ct    m_jt
p_j^1   p_c^1   a_j^1   a_c^1   m_j^1
p_j^1   p_c^1   a_j^1   a_c^2   m_j^2
p_j^1   p_c^1   a_j^2   a_c^1   m_j^3
...
p_j^2   p_c^2   a_j^2   a_c^2   m_j^16
market share if p_jt = p_j^1, p_ct = p_c^1, a_jt = a_j^1, a_ct = a_c^1. Let this estimate be m_j^1. The assessor can then provide market share estimates for all sixteen combinations of p_jt, p_ct, a_jt, a_ct (see Table 16.11). These sixteen subjective observations could then be
used, for example, to estimate an MCI model:

(16.183)

where
u_jt = a disturbance term.
Of course, if objective data were available (and of sufficient quality), we could estimate (16.183) also with such data. One advantage of using both objective and subjective data is that we can compare the estimated parameters. Systematic deviations
could be attributable to misspecifications of the model as well as to the possible
difficulty the assessor has contemplating how the brand's market share depends on
the marketing variables. Thus, it is important to confront predictions from any model
with actual results, and to make modifications where appropriate.
Most of the models developed for parameterization with subjective data date from the seventies. Examples are ADBUDG (Little, 1970), CALLPLAN (Lodish, 1971) and BRANDAID (Little, 1975a, 1975b). For more detail about these "decision models" based on subjective data see Rangaswamy (1993).99 The judgmental assessment of market response to marketing variables is not without controversy. In laboratory research, Chakravarti, Mitchell and Staelin (1981) found that subjects who used the ADBUDG model made worse decisions than those who did not use it. Little and Lodish (1981) argue, however, that subjects in a laboratory experiment rarely have access to critical types of information, such as knowledge of the dynamics of a particular market. For example, in the laboratory studies in which judgments are compared with statistical models of those judgments, the information available to the DM is the same as that available to the model. Consequently, in the laboratory
the DM's cannot take advantage of any skills they may have with respect to other
information not incorporated in the model (Blattberg, Hoch, 1990). 100
PROBABILITY ASSESSMENT
100. See also McIntyre (1982), McIntyre, Currim (1982). Beach and Barnes (1987) review the vast literature on biases in judgment and decision making. For additional perspectives see Hogarth (1987), Philips (1987), Bunn and Wright (1991) and Gupta (1994).
defined, and relevant knowledge is evoked from the expert. Response assessment encompasses the steps in which numerical or verbal qualities are attached to the propositions. Belief assessment is dominated by reasoning, whereas response assessment is judgmental. The use of these subprocesses is relevant for the description of complex decision processes. Examples of models which can be parameterized in this way are the process models (Section 4.1) and the models that describe the decision process of an industrial buyer (Section 8.1). These models require skills of reasoning, making inferences and deriving conclusions from data through the application of arguments.
To help managers with the construction of beliefs, decision analysts have developed
structuring methods such as graphical representations and direct questions.
Graphical representations can be used to prompt relevant information from an individual decision maker and to structure that information in a visual form. Examples include hierarchical and non-hierarchical decision trees and knowledge maps. A knowledge map is an influence diagram without decision nodes, i.e. containing only chance nodes, or

"a graphical representation which may cause the decision maker to consider information in the form of mental imagery"
(Browne, Curley, Benson, 1997, p. 11)
Knowledge maps can be used to obtain probabilities for individual factors, and to derive the best estimate of the probability for a critical event. We show an example of a knowledge map in Figure 16.14. This knowledge map relates to the probability of success for a new automotive product, called Electroglide. The product is designed to reduce fuel consumption through the substitution of electric power when a car cruises. Individual subjects were asked to predict the success of the new product by first evoking all the primary influences deemed relevant, then evoking secondary influences, etc. This process continued until each subject was satisfied with the resulting knowledge map (representing the subject's beliefs). At the end, a probability qualifying their conclusion is elicited.
Directed questions are often used by decision analysts to elicit information based on checklists or questioning schemes. Although these methods can be useful for structuring knowledge, they have an important drawback. Subjects will consider aspects merely because of suggestions made by the decision analyst. This information, however, is not screened for relevance, as it would be in natural decision-making tasks. This drawback is similar to potentially artificial elements in relation to the judgmental assessment of response.
The measurement of subjective probabilities relates to the philosophical view of probability which does not require that the probability of an event has a uniquely determinable value. In this subjective approach, probability expresses the beliefs of an individual based on personal knowledge and information. In that sense the subjective approach makes no attempt to specify what assessments are correct. However, not all assessments are admissible. For example, subjective probabilities should be coherent in the sense that there are no contradictions among them. Under this constraint it has been shown that subjective probabilities have properties of conventional probabilities.101
Although experiments reported in the literature show that individuals can provide appropriate subjective probabilities, it has been found that our ability to assess probabilities depends on:
It is also important that subjective probabilities are in accordance with the assessor's
beliefs. Thus, the decision analyst may want to use methods that motivate to make
this correspondence strong. This leads to a third point:
c. the evaluation.
             Assessor 1                  Assessor 2
   Estimates   Probabilities    Estimates   Probabilities
And, assuming that the three values are the only possible outcomes, each assessor
also provides an indication of the probabilities of occurrence. We show the representatives' answers in Table 16.12. Assessor 1 is more optimistic about the new brand's sales than assessor 2, except for the highest estimate which is the same for the two assessors. Assessor 1 also has a much higher subjective probability for the most likely value of sales than assessor 2 does. As a result, the expected value for sales is higher while the variance is lower for assessor 1.
In the absence of knowledge about the shape of a subjective probability distribution, we can assess the uncertainty around the estimates by a variety of methods.102 One method focuses on the fractiles of the Cumulative Distribution Function (CDF).103 Typically, five fractiles, 0.01, 0.25, 0.50, 0.75, and 0.99, are assessed. The first assessment is the 0.50 fractile, Q_0.50, the sales level such that the probability that sales in period t fall below it is 0.50. Q_0.50 is obtained from the question: "Considering all possible levels of sales, what is the amount for which it is equally likely that sales will be more and that sales will be less than this value?" For assessor 2 in Table 16.12 the response would have been 7 million, hence Q_0.50 = 7 million. We can obtain the 0.25 and 0.75 fractiles by asking: "If sales are, in fact, less than 7 million, what amount of sales would divide the interval from 0 to 7 million units into equally likely parts?" The resulting value is the 0.25 fractile, denoted as Q_0.25. The 0.75 fractile can be obtained from the question: "How would you divide the interval of sales over 7 million units into equally likely parts?" Finally, the values for Q_0.01 and Q_0.99 can be obtained by asking: "What value of sales would you use such that the chance of sales being greater (less) than this value is only one in 100?" We can then construct a CDF-curve of sales by plotting the fractiles and corresponding sales values. We provide an example in Figure 16.15.
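The elicited fractiles define a rough CDF by interpolation. A sketch using assessor 2's values as read from Figure 16.15 (the piecewise-linear interpolation between fractiles is an assumption for illustration, not part of the text):

```python
def cdf_from_fractiles(fractiles):
    """Piecewise-linear CDF through elicited (probability, value) pairs."""
    probs, values = zip(*sorted(fractiles))
    def cdf(x):
        if x <= values[0]:
            return probs[0]
        if x >= values[-1]:
            return probs[-1]
        # Linear interpolation on the bracketing segment.
        for (p0, v0), (p1, v1) in zip(zip(probs, values),
                                      zip(probs[1:], values[1:])):
            if v0 <= x <= v1:
                return p0 + (p1 - p0) * (x - v0) / (v1 - v0)
    return cdf

# Assessor 2's fractiles (sales in millions of units).
fr = [(0.01, 5.75), (0.25, 6.5), (0.50, 7.0), (0.75, 7.75), (0.99, 9.5)]
F = cdf_from_fractiles(fr)
```

The resulting function F reproduces the elicited fractiles exactly and lets us read off probabilities for intermediate sales levels.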
102. See Winkler (1967a), Smith (1967) and Hampton, Moore and Thomas (1977).
103. A similar example appears in Lilien and Kotler (1983, p. 135).
Figure 16.15 A cumulative subjective probability function for sales (assessor 2). The plotted fractiles 0.01, 0.25, 0.50, 0.75 and 0.99 correspond to sales of 5.75, 6.5, 7, 7.75 and 9.5 million units.
If the distribution can be assumed to be normal, two assessments suffice to estimate its standard deviation:
1. "What is your estimate of sales such that there is a 2.5 percent chance that it will be higher?" (S_t^H).
2. "What is your estimate of sales such that there is a 2.5 percent chance that it will be below that level?" (S_t^L).
Since 95 percent of the possible values of a normally distributed random variable lie within ±1.96 standard deviations of the mean, the standard deviation is estimated by:

σ̂ = (S_t^H − S_t^L) / 3.92     (16.185)
For asymmetric distributions such as the beta distribution, low (S_t^L), high (S_t^H), and modal (S_t^M) estimates are necessary. The estimates of the mean and standard deviation are:

μ̂ = (S_t^L + 4S_t^M + S_t^H) / 6     (16.186)

σ̂ = (S_t^H − S_t^L) / 6     (16.187)

Figure 16.16 Optimistic (O), modal (M) and pessimistic (P) market share-advertising response functions (constant uncertainty).
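These estimates are direct to compute. A sketch of equations (16.185)-(16.187); note that the mean formula used here for (16.186) is the familiar PERT-style approximation, assumed to be the one intended:

```python
def normal_sd(s_low, s_high):
    """Equation (16.185): sd from the 2.5th/97.5th percentile estimates,
    assuming a normal distribution (the interval spans 3.92 sd's)."""
    return (s_high - s_low) / 3.92

def beta_moments(s_low, s_mode, s_high):
    """Equations (16.186)-(16.187): approximate mean and sd from low,
    modal and high estimates of an asymmetric (beta-like) distribution."""
    mean = (s_low + 4 * s_mode + s_high) / 6
    sd = (s_high - s_low) / 6
    return mean, sd

# Hypothetical sales estimates (millions of units).
sd_normal = normal_sd(5.0, 12.84)
mean_beta, sd_beta = beta_moments(4.0, 7.0, 10.0)
```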
An assessor may find it useful to look at the shapes of alternative classical probability distributions, as suggested, for example, by Grayson (1967). Distributions are shown, for example, in the gallery of shapes investigated by Raiffa and Schlaifer (1961). If the assessor finds a shape in the gallery that comes close to his thoughts, he can use the corresponding probability distribution.
So far we have considered distributions of the values a random variable may take.
This idea can also be used for distributions of individual points on a response function.
For example, we could construct an optimistic, a pessimistic and a modal response
function. Figure 16.16 shows an example of a market share-advertising response
function for which the degree of uncertainty stays approximately constant over the
whole range of values of advertising. This is not the case in Figure 16.17, where the
uncertainty is modest for advertising expenditures within the interval R_A, but large outside that interval. This could result if the firm's advertising expenditure levels have normally fallen within R_A. Managers will be less confident about points on the
response curve that are outside their experience.
Figure 16.17 Optimistic (O), modal (M) and pessimistic (P) market share-advertising response functions (varying uncertainty).
b. The assessor
Winkler (1967a) showed that persons with previous training in quantitative methods,
in particular in probability and statistics, are better at translating their judgments into
probabilities than persons without such training. The elicitation of quality judgments
also depends on managers' attitudes toward uncertainty and toward the process of
measurement. Some people find it difficult to tolerate uncertainty. Others resist the
quantification of their uncertainty, because they are skeptical about measurement of
subjective states of mind.
c. Evaluation
An important question is whether the assessments correspond to the respondent's
beliefs. Once the outcome is known it is possible to determine how close the expected
value was to the actual value. Since we cannot evaluate the validity of subjective
probabilities, we can only help the expert to make sure the probabilities are in accordance with her judgment.104 The application of rules developed to make responses and beliefs correspond will, in fact, also contribute to the learning process. That is, these rules have the dual purpose of evaluating assessors and enabling them to become better experts, as shown by Winkler (1967c) in an experiment concerning the
assessments of probabilities for outcomes of football games.
The two primary evaluation techniques are actual bets and scoring rules. Involving people in actual bets, with monetary rewards or penalties resulting from the quality of their responses, is one way to motivate people to report their personal assessments as accurately as possible. Monetary rewards, however, may not be sufficient to motivate the assessor to take the procedure seriously.
Scoring rules are used to determine a respondent's accuracy. The scores may be used to determine the weights in combined subjective estimates. For examples of scoring rules we refer to De Finetti (1965).
Other feedback methods are available if there are multiple assessors. We discuss
the combination of judgments next.
If there are multiple experts who can provide subjective estimates, the challenge is how to combine the diversity of assessments. For example, lack of relevant information drives a DM to obtain expert opinions (Winkler, 1986) and it is likely that experts differ in the nature of their beliefs due to differences in their expertise. An average opinion can be obtained by a mathematical approach (see below). Alternatively, one may use a behavioral approach that aims to arrive at a consensus opinion.
The combination of subjective estimates generally leads to better results than one can obtain from a single assessor, as demonstrated by Lawrence, Edmundson, O'Connor (1986), Mahmoud (1987) and Maines (1996). The accuracy of estimates improves because the random errors in individual estimates are "diversified away" (Ashton, Ashton, 1985). In this sense group estimates have the same advantage as the "model of man" which also removes the noise in subjective judgments.
Winkler (1968) presents mathematical approaches, which are (weighted) average
methods. Let f_i(θ) be the subjective probability density function (continuous case)
or the probability mass function (discrete case) of assessor i, i = 1, ..., I. The combined
or pooled function f(θ) can then be written as:

    f(θ) = Σ_{i=1}^{I} w_i f_i(θ).    (16.188)
1. Equal weights
If the decision maker has no reason to believe that one assessor is better than another,
each assessor should have equal weight:

    w_i = 1/I,  for all i.    (16.189)
ESTIMATION AND TESTING 429
2. Weights proportional to ranking
If the decision maker believes that the assessors can be ranked with respect to expertise,
she may assign weights on the basis of these ranks. Each expert is given a rank
from 1 (the worst) to I (the best). Given a rank ra_i, a simple way to assign the weight
w_i is:

    w_i = ra_i / Σ_{i=1}^{I} ra_i    (16.190)

or, since the ranks 1, ..., I sum to I(I + 1)/2,

    w_i = ra_i / [I(I + 1)/2].    (16.191)
The drawback of this scheme is that the ratio of weights for the best and worst expert
expands as the number of assessors increases. For example, with two assessors, the
ratio of weights is 2 to 1, but with six assessors the ratio of weights for the best
and worst assessors is 6 to 1. The dependency of this ratio on the number of assessors
appears to be a severe drawback of this weighting scheme.
3. Weights proportional to self-ratings
Alternatively, the decision maker may ask each assessor to rate his or her own
expertise, resulting in a rating c_i. The weights are then:

    w_i = c_i / Σ_{i=1}^{I} c_i.    (16.192)
Although now the ratio of weights for, say, the best and worst assessors does not
depend on the number of assessors, the weighting scheme does assume that the ratings
have ratio-scaled properties. For example, an assessor rating his expertise equal to 10
would have twice the weight of an assessor with a rating of 5. And the assumption is
also that the assessors have comparable interpretations of the scale values, a property
that is rarely if ever applicable to rating scales.
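To make the weighting schemes concrete, the following sketch applies (16.188)-(16.192) to point estimates of a market share from I = 4 assessors. All numbers are hypothetical, not taken from the text.

```python
# Sketch of the weighting schemes (16.189)-(16.192) and the pooled
# estimate (16.188), for hypothetical point estimates from I = 4 assessors.

def equal_weights(I):
    # (16.189): w_i = 1/I for all i.
    return [1.0 / I] * I

def rank_weights(ranks):
    # (16.190)-(16.191): w_i = ra_i / sum of ranks, where the ranks
    # 1, ..., I sum to I(I + 1)/2.
    total = sum(ranks)
    return [r / total for r in ranks]

def rating_weights(ratings):
    # (16.192): w_i = c_i / sum(c_i); assumes ratio-scaled self-ratings.
    total = sum(ratings)
    return [c / total for c in ratings]

def pooled_estimate(estimates, weights):
    # (16.188) applied to point estimates instead of full distributions.
    return sum(w * e for w, e in zip(weights, estimates))

estimates = [0.10, 0.15, 0.18, 0.22]   # hypothetical assessor estimates
ranks = [1, 4, 2, 3]                   # 1 = worst expert, I = best
ratings = [5, 10, 6, 9]                # hypothetical self-ratings

print(pooled_estimate(estimates, equal_weights(4)))    # 0.1625
print(pooled_estimate(estimates, rank_weights(ranks)))
print(pooled_estimate(estimates, rating_weights(ratings)))
```

With these ranks the best expert receives four times the weight of the worst, illustrating the dependency on the number of assessors noted above.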
Winkler (1967a) has found by experimentation that applying scoring rules to such
pooled estimates, which he calls consensus scores, produces better results than simply
averaging individual subject scores. An argument in favor of a scoring rule is that the
score is based on performance accuracy. Nevertheless, one can also check whether
the decision itself is sensitive to which weighting scheme is applied.
A disadvantage of these methods is that the resulting f(θ) may be multimodal
even if the constituent f_i(θ) are all unimodal. Winkler (1968) therefore proposed a
430 CHAPTER 16
more complex pooling procedure based on the concept of the (natural) conjugate
distribution, originally developed to simplify Bayesian analysis. 105 This method has the
advantage of providing a unimodal f(θ) if the underlying f_i(θ) are unimodal. A
disadvantage, however, is that the method requires that each f_i(θ) be a member of
the conjugate family, for example a beta distribution, which is reasonable only if
each assessor feels that the particular distribution accurately reflects his judgments.
We now consider the pooling of response curves, with a hypothetical example of a set
of market share-advertising response curves. Each curve is constructed by connecting
a set of point estimates. The point estimates correspond to four levels of advertising
expenditures: $0, $250,000, $500,000, and $1,000,000. Assume that four assessors
have given the point estimates shown in Table 16.13, from which the response curves
in Figure 16.18 are derived. The methods discussed above can be applied to the point
estimates corresponding to each of the four levels of advertising. 106 For example, with
equal weights, pooled market share estimates of 0.1625, 0.2525, 0.3275, and 0.4875
are obtained. This pooled response curve is shown as P in Figure 16.18.
An obvious drawback of these individual response curves is that each is close to
linear. As a result, the pooled curve also appears to be close to a linear one. Also,
with zero advertising it is unlikely that market share can be much above zero in the
long run.
Instead of averaging the individual responses to produce a pooled curve, one can
consider the sixteen estimates in Table 16.13 as observations, and apply econometric
analysis to arrive at a pooled response curve. This requires the specification of a
functional relationship (compare equation (16.183)).
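Both routes to a pooled curve can be sketched as follows. Since Table 16.13 is not reproduced here, the individual point estimates below are hypothetical values chosen only so that their equal-weight column averages reproduce the pooled estimates 0.1625, 0.2525, 0.3275 and 0.4875 quoted above, and the square-root functional form is an assumption, not the specification in (16.183).

```python
import numpy as np

# Hypothetical stand-in for Table 16.13: rows are assessors 1-4, columns are
# the advertising levels $0, $250,000, $500,000, $1,000,000. The equal-weight
# column averages reproduce the pooled values reported in the text.
shares = np.array([
    [0.15, 0.24, 0.30, 0.60],   # assessor 1: share doubles from $500K to $1M
    [0.18, 0.26, 0.32, 0.35],   # assessor 2: only three share points added
    [0.16, 0.24, 0.34, 0.50],
    [0.16, 0.27, 0.35, 0.50],
])
adv = np.array([0.0, 2.5, 5.0, 10.0])   # advertising in $100,000

# Equal-weight pooling (curve P):
pooled = shares.mean(axis=0)
print(pooled)   # 0.1625, 0.2525, 0.3275, 0.4875

# Econometric alternative: treat the 16 estimates as observations and fit a
# concave curve m = b0 + b1*sqrt(adv) by ordinary least squares (the
# functional form here is an illustrative assumption).
x = np.tile(adv, 4)
y = shares.ravel()
X = np.column_stack([np.ones_like(x), np.sqrt(x)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted pooled-curve parameters:", b)
```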
The methods proposed above to arrive at a pooled response curve are purely
mechanical. We can also use the data in Table 16.13 and Figure 16.18 and find out
why assessor 1 believes that market share can be doubled by doubling advertising
expenditures from $500,000 to $1,000,000, and why assessor 2 believes that this
105. A detailed treatment of the (natural) conjugate distribution is given by Raiffa and Schlaifer (1961). See also
Iversen (1984, pp. 64-66), Greene (1997, pp. 317-319), Lee (1997).
106. Combining natural conjugate distributions is not possible since the assessors have only been asked to
supply point estimates and not entire distributions.
Figure 16.18 Subjectively estimated response curves of assessors 1-4 and the pooled curve P
(market share, 0.0 to 0.7, versus advertising expenditures in $100,000, from 0 to 10).
increase would only add three share points (from 0.32 to 0.35). Trying to find out why
individuals have widely varying subjective estimates requires a behavioral approach.
The evaluation techniques, actual bets and scoring rules (Section 16.9.2), are
known as outcome feedback evaluation methods. Another type of feedback is cognitive
feedback. 107 Cognitive feedback provides assessors with information that allows
them to compare relations between the individual judgments and the combined judgment.
Some research indicates that outcome feedback is often ineffective (Balzer et
al., 1992) while cognitive feedback has been found to be beneficial (Balzer et al.,
1992, Gupta, 1994). We discuss below two approaches based on cognitive feedback:
"group reassessment" and "feedback and reassessment".
Group reassessment
After experts have made individual assessments, they meet as a group to discuss each
other's assessment and to arrive at a consensus point estimate (if each individual
expert had made a point estimate), or a consensus distribution (if each expert had
provided a subjectively estimated probability distribution). Such group discussions
may not produce a true consensus, however. Once individuals have established their
107. See, for example, Balzer, Sulsky, Hammer and Sumner (1992), and the discussion in Gupta (1994).
own subjective judgments for others to see, they tend to defend their positions. At
the same time, some individuals have a predisposition to be swayed by persuasively
stated opinions of others. Also, dominating individuals and people with authority can
have unusually strong influences. In addition, there is group pressure on an individ-
ual whose views are widely divergent from those held by many others to conform
(Winkler, 1987). The impact of such psychological factors may be reduced by asking
experts to reassess their own individual estimates, after a group result is obtained.
The potential pressures that may allow influence to dominate expertise in group discussions
may be avoided with the use of recently developed group reassessment techniques:
Nominal Group Techniques (NGTs) and Estimation-Talk-Estimation (ETE)
meetings. An NGT-meeting starts by having individual experts write ideas and formulate
their estimates independently. During the meeting, each individual in turn briefly
presents one idea. All ideas are recorded on a flip chart, but no discussion takes place
until all ideas are recorded. Following a discussion of all ideas, each expert records
his or her evaluation of the ideas, by rank ordering or rating them. The scores are then
mathematically aggregated to yield a group "consensus". This technique has been
shown to be superior to the Delphi technique discussed below (Lock, 1987).
ETE-meetings are very similar to NGT. ETE requires "group estimation" before
and after group discussions of the ideas. Rank ordering or rating does not necessarily
take place (Parente, Anderson-Parente, 1987).
108. See, for example, Dalkey and Helmer (1962), Helmer (1966), Jain (1993, pp. 331-337).
109. Brown, Helmer (1964).
considerations suggested by one or more experts. Based on the feedback, experts are
asked to update their assessments, if necessary.
The results of the second round will, in general, have as many point estimates or
distributions as assessors, but the spread is expected to be smaller. Feedback from the
second round of estimates is also given to the assessors, who are once more asked
to revise their assessments. It is expected that the individual estimates will show a
tendency to converge as the collection of updated estimates continues. The process
may continue until a sufficient degree of agreement is achieved, or until individual
estimates remain unchanged between iterations. The rate of convergence tends to
depend on the degree of divergence in the original estimates, on the reasons for
differences, and on the evaluation of each expert's expertise relative to that of other
experts.
Several studies suggest that the Delphi technique produces good results. An interesting
application is by Larreche and Montgomery (1977), who asked for evaluations
of marketing models with respect to their likelihood of acceptance by management.
Lilien and Rangaswamy (1998, p. 131) discuss a Delphi study with 300 auto industry
executives to forecast the sales of electric vehicles 10 years into the future. 110
The consensus favors hybrid (electric-combustion) engines, and the estimates are for
roughly 150,000 of these vehicles to be on the road in the US in 2003. 111
In practice, objective data may be available for some variables in a model, but not
for all. Sometimes, subjective data can be generated to substitute for missing vari-
ables. However, we must then deal with the consequences for parameter estimation
(for example, errors in the measurement of predictor variables cause ordinary least
squares estimates to be biased and inconsistent). At other times, objective data may
be available but not useful for analysis. For example, as we discussed in Section
8.3.1, historical price data may show insufficient variation to permit reliable estimation
of price elasticity by econometric techniques. The econometric analysis can then be
complemented by subjective methods.
Blattberg and Hoch (1990) suggest that model-based systems for decision making
should be a hybrid system of 50 percent judgment (manager) and 50 percent model
(customer). They find that this mix does better than either model or manager alone. A
hybrid system may correct for the human's shortcomings in information processing
while the human picks up on cues, patterns, and information not incorporated in the
model (Bucklin, Lehmann, Little, 1998, p. 6).
110. This research was completed by the University of Michigan's Office for the Study of Automotive
Transportation.
111. For an extensive review of the application of Delphi see Gupta and Clarke (1996). A comparison of
judgments obtained by a group consensus through a Delphi process and averaging individual judgments is
made by Larreche and Moinpour (1983).
Combining objective and subjective data and estimates can be accomplished by either
formal or informal analysis, as we discuss below.
Formal analysis
Combining subjective and objective information in a formal way is achieved by Bayesian
analysis. We show a general framework for Bayesian analysis in Figure 16.19. Suppose
a firm wants to estimate the trial rate (θ) for a new product. Based on experience
from launching other products in the same category, the trial rate is subjectively
assessed. This subjective information is called prior information and may be available
in the form of a prior distribution f(θ). The firm may also offer the product in a test
market, thus obtaining sample information z. Let this sample evidence or objective
information be available in the form of a sampling distribution ℓ(z | θ). A decision
to launch or not to launch would then be based on the posterior distribution f(θ | z)
obtained from Bayes' Theorem:

    f(θ | z) = f(θ) ℓ(z | θ) / ∫ f(θ) ℓ(z | θ) dθ.    (16.193)
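As a minimal numerical sketch of (16.193), assume a conjugate beta prior for the trial rate and binomial test-market evidence; all numbers below are hypothetical.

```python
# Sketch of (16.193) for the trial-rate example, assuming a conjugate
# Beta(a, b) prior f(theta) and binomial sample evidence (z triers among
# n test-market households). With this conjugate pair the posterior is
# again a beta distribution, so the integral in (16.193) is available in
# closed form.
a, b = 2.0, 8.0          # prior: mean trial rate a/(a + b) = 0.20
n, z = 200, 56           # hypothetical test market: 56 of 200 households tried

a_post, b_post = a + z, b + (n - z)
posterior_mean = a_post / (a_post + b_post)
print(posterior_mean)    # prior mean 0.20 pulled toward the sample rate 0.28
```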
Combining objective and prior information also applies to the estimation of response
curves. Assume the following model:

    y = Xβ + u    (16.194)

    E(u) = 0,  E(uu') = σ²I.    (16.195)
Suppose now that prior information is available on L' of the L parameters, i.e., an L' × 1
vector of prior estimates β̂_p. Prior information can be subjective, but also objective,
such as estimates obtained from a meta-analysis (Section 3.3). So, for example, the
prior value of a price elasticity can be set equal to −1.76, i.e. the average value found
in a meta-analysis by Tellis (1988b). Assuming these prior estimates to be unbiased,
we have:

    β̂_pl = β_l + μ_l,  l = 1, ..., L'    (16.196)

where μ_l is the error term for the lth estimate. Let the covariance matrix of the error
term of the prior estimates be Φ, i.e. E(μμ') = Φ.
The prior estimates are related to the full parameter vector through the L' × L selection
matrix

    A = (I  0)    (16.200)

where I is an L' × L' identity matrix, and 0 is an L' × (L − L') matrix of zeros. 112
Combining (16.198) and (16.199) we obtain:

    ( y   )   ( X )       ( u )
    ( β̂_p ) = ( A ) β  +  ( μ )    (16.201)

or

    Y = Zβ + v

with

    E(vv') = ( σ²I  0 )
             ( 0    Φ ).    (16.202)
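The stacked system (16.201)-(16.202) can be estimated by generalized least squares. The sketch below uses hypothetical data, treats σ² as known for simplicity, and lets the selection matrix pick out the price elasticity, with the prior value −1.76 from the Tellis meta-analysis quoted above.

```python
import numpy as np

# Sketch of mixed estimation via (16.201)-(16.202): sample information
# y = X beta + u is augmented with a prior estimate for the price
# elasticity, and the stacked system is estimated by GLS. All data are
# hypothetical; sigma^2 is treated as known for simplicity.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(0.0, 0.05, n)])  # little price variation
beta_true = np.array([2.0, -1.8])
y = X @ beta_true + rng.normal(0.0, 0.2, n)

beta_p = np.array([-1.76])        # prior estimate (Tellis meta-analysis value)
A = np.array([[0.0, 1.0]])        # selection matrix: here it picks the elasticity
Phi = np.array([[0.30**2]])       # prior uncertainty, E(mu mu') = Phi
sigma2 = 0.2**2                   # disturbance variance, assumed known

Y = np.concatenate([y, beta_p])   # stacked system (16.201)
Z = np.vstack([X, A])
V = np.block([[sigma2 * np.eye(n), np.zeros((n, 1))],
              [np.zeros((1, n)), Phi]])        # (16.202)

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(Z.T @ Vinv @ Z, Z.T @ Vinv @ Y)
print(beta_gls)   # with little price variation, the elasticity leans on the prior
```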
Informal analysis
Informal analyses can be used to adjust or update empirically determined coefficients.
For example, parameter estimates from sample data can be adjusted by multiplying
them by subjectively estimated indices whose reference value is one 114 (i.e. if the
reference value is used no updating takes place). Alternatively, we can start with
112. We assume that prior information is available on the first L' estimators. The variables can be arranged
such that this is the case.
113. For detailed discussions on Bayesian inference in econometrics see Schlaifer (1969), Zellner (1971),
Iversen (1984), Lee (1997).
114. For an early application see Lambin (1972b).
judgmental parameters. Little (1975b) does this based on the idea that people tend to
overinterpret patterns in historical data. Whatever form is used, we emphasize that
combining subjective and objective elements generally leads to better parameteriza-
tion, as found in many studies. 115
We note that the empirical approach based on sample information is at the core
of the analysis. Judgmental information is often not incorporated, and if it is, an ad
hoc adjustment is more common than a formal one.
16.9.5 ILLUSTRATION
At the time the survey was held (August-September 1989; before the introduction
of private broadcasting), two important aspects relevant to the future development of
the broadcast media were uncertain:
Table 16.14 Estimated advertising expenditures for 1989 and 1990 based on a sample of
advertisers and advertising agencies a,b.

Scenario/situation        1      2          3          4          5
Penetration private       -      40%        80%        80%        -
  broadcasting
Conditions public         -      Unchanged  Unchanged  Unchanged  Same as private
  broadcasting                                                    broadcasting
Public broadcasting       450    476        474        347        400
Private broadcasting      -      57         244        391        343
Using the intentions of the advertisers and their agencies, predictions for the entire
market are calculated as follows:

    AE~_{j,t+1} = [AE_{jt} / AE^_{jt}] · AE^_{j,t+1}    (16.203)

where

    AE~_{j,t+1} = predicted advertising expenditures in medium j
                  for the entire market in period t + 1 (1989),
    AE_{jt}     = advertising expenditures of the entire market in
                  medium j in t (1988),
    AE^_{jt}    = advertising expenditures of the sample in
                  medium j in t (1988),
    AE^_{j,t+1} = intended advertising expenditures of the sample in
                  medium j in t + 1 (1989).

Since intentions were also obtained for 1990, AE~_{j,t+2} can also be calculated:

    AE~_{j,t+2} = [AE_{jt} / AE^_{jt}] · AE^_{j,t+2}.    (16.204)
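A minimal sketch of (16.203)-(16.204): the sample's intended expenditures are scaled up by the ratio of market to sample expenditures in the base year. The figures below are hypothetical amounts (in millions) for one medium j.

```python
# Sketch of the intentions-based forecasts (16.203)-(16.204) for one
# medium j; all expenditure figures are hypothetical.
AE_market_t = 450.0      # entire market, medium j, 1988
AE_sample_t = 90.0       # sample of advertisers, medium j, 1988
AE_sample_t1 = 95.0      # sample intentions for 1989
AE_sample_t2 = 99.0      # sample intentions for 1990

scale = AE_market_t / AE_sample_t       # market-to-sample ratio in 1988
AE_market_t1 = scale * AE_sample_t1     # (16.203): market forecast for 1989
AE_market_t2 = scale * AE_sample_t2     # (16.204): market forecast for 1990
print(AE_market_t1, AE_market_t2)       # 475.0 495.0
```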
In this chapter we first discuss the specification and estimation of models with ob-
served and unobserved variables based on LISREL. LISREL models are vehicles
for bringing latent constructs and observational variables together. They encompass
a wide variety of other models, such as simultaneous equation models (see also
Section 16.4), seemingly unrelated regression models (see Section 16.3) and errors-
in-variables models.
In Section 17.2 we discuss mixture regression models. Mixture regression models
are models that assume the existence of heterogeneous groups of individuals in a
population.
Next we focus on time-series models. Recall that in Section 4.4 we distinguish
time-series (models) and causal models. Time-series models are uniquely suited to
capture the time dependence of the criterion variable. In Section 17.3 we discuss the
specification and estimation of ARMA and ARIMA models. We discuss the inclu-
sion of seasonal effects and the inclusion of independent and explicitly formulated
predictor variables through transfer functions.
There are reasons why it is important for models to accommodate changes in
model structure and/or parameters. In Section 17.4 we discuss the specification and
estimation of varying parameter models. In these models, the model parameters are
allowed to vary.
The foundations of structural equation models with latent variables come from Joreskog
(1973, 1978). He developed LISREL, a computer program which has become
almost synonymous with the models. 1 Structural equation models are models for
the analysis of relationships among observed and unobserved variables. We use an
1. Many excellent texts on structural equation models have appeared; see, for example, Long (1983), Bollen
(1989). For a comprehensive introduction to LISREL for marketing research, see Bagozzi (1994b), Diamantopoulos
(1994), Hair, Anderson, Tatham and Black (1995) and Sharma (1996).
442 CHAPTER 17
example of the effect of consumer attitudes and intentions on coupon usage, taken
from Bagozzi (1994b, pp. 365-368), and introduced in Section 10.3. We partly repeat
the discussion here. Figure 17.1 presents the path diagram (Figure 17.1 is identical
to Figure 10.7). A path diagram is a graphical representation of a system of relations
among observed and unobserved variables. The path diagram in the figure is based on
the theory of reasoned action. It postulates that intentions influence behavior, while
intentions are themselves affected by attitudes, subjective norms and past behavior. In
addition, behavior is assumed to be directly affected by past behavior and attitudes.
In the path diagram, circles indicate latent variables, in this application attitudes,
subjective norms, intentions and (past) behavior. Square boxes indicate observed or
"manifest" variables. Thus, in Figure 17.1 attitudes are measured by three items.
Subjective norms and intentions are each measured by two, and behavior and past be-
havior each by one. The arrows between the circles indicate hypothesized (assumed)
relations between latent variables, while the arrows between the circles and squares
indicate measurement relations. Arrows pointing to the squares indicate measurement
errors. Two types of observed variables are distinguished: indicators of exogenous and
indicators of endogenous latent variables. The former are denoted by x, the latter by
y. Indicators for endogenous variables in the example are measures of intentions (y1
and y2) and behavior (y3), while exogenous indicators are the measures of attitudes
(x1, x2 and x3), subjective norms (x4 and x5) and past behavior (x6).
The relationships among the observed (or manifest) variables are reflected in the
covariances among them. Those covariances form the basis for the estimation of a
SPECIAL TOPICS 443
Table 17.1 Covariance matrix for the variables in the coupon behavior study.

                      y1    y2    y3    x1    x2    x3    x4    x5    x6
Intentions       y1   4.39
Intentions       y2   3.79  4.41
Behavior         y3   1.94  1.86  2.39
Attitudes        x1   1.45  1.45  0.99  1.91
Attitudes        x2   1.09  1.31  0.84  0.96  1.48
Attitudes        x3   1.62  1.70  1.18  1.28  1.22  1.97
Subjective norms x4   0.57  0.90  0.37  0.80  0.80  0.95  1.96
Subjective norms x5   1.06  1.26  0.77  0.93  1.13  1.19  1.09  1.72
Past behavior    x6   2.21  2.50  1.48  1.20  1.01  1.42  0.90  1.13  2.50
structural equation model that describes the relations among variables according to a
set of hypotheses. Importantly, structural equation methods focus on describing the
covariances between the variables. 2 The population covariances among the variables
are described in the matrix Σ. We show the sample covariance matrix for the application
of attitudes and behavior for coupons, from a sample of 85 women, in Table
17.1. As expected, all covariances are positive in this study.
The structural equation model attempts to reproduce the covariances among the variables
as accurately as possible with a set of parameters, θ, where the fundamental
hypothesis is: Σ = Σ(θ). Σ(θ) is called the implied covariance matrix.
The structural equation model comprises two submodels. The first is called the
measurement model, the second the structural model. The measurement model re-
lates the observed indicators to a set of unobserved, or latent variables. This part of
the model is also called a confirmatory factor model, if it is considered in isolation.
The measurement models for the endogenous and exogenous indicator variables are
formulated in a general form respectively as:

    y = Λ_y η + ε    (17.1)
    x = Λ_x ξ + δ    (17.2)

where

    y   = a (p × 1) vector of manifest endogenous variables,
    η   = an (m × 1) vector containing the latent endogenous variables
          (i.e. the variables that are explained within the model),
    Λ_y = the (p × m) matrix of loadings, showing which manifest
2. More precisely, they constitute the sufficient statistics for the model.
    ( y1 )   ( λ11^y    0    )            ( ε1 )
    ( y2 ) = ( λ21^y    0    ) ( η1 )  +  ( ε2 )    (17.3)
    ( y3 )   (  0     λ32^y  ) ( η2 )     ( ε3 )
A similar set of equations determines the measurement model for the three latent
exogenous variables, attitudes (ξ1), subjective norms (ξ2), and past behavior (ξ3),
from their indicators (respectively x1 to x3, x4 to x5, and x6):

    ( x1 )   ( λ11^x    0      0    )            ( δ1 )
    ( x2 )   ( λ21^x    0      0    )            ( δ2 )
    ( x3 ) = ( λ31^x    0      0    ) ( ξ1 )  +  ( δ3 )    (17.4)
    ( x4 )   (  0     λ42^x    0    ) ( ξ2 )     ( δ4 )
    ( x5 )   (  0     λ52^x    0    ) ( ξ3 )     ( δ5 )
    ( x6 )   (  0       0    λ63^x  )            ( δ6 )
The error terms in both measurement models may be correlated. The respective covariance
matrices of the vectors of error terms ε and δ are denoted by Θ_ε and Θ_δ.
Φ denotes the covariance matrix of the latent exogenous variables ξ. Covariances of
the endogenous variables are part of the structural component of the model described
below. In order to make the measurement model identified, the latent variables must
be assigned a scale. One way of doing that is to arbitrarily fix one of the loadings
for each latent variable to one, so that the latent variable has the same scale as that
indicator, for example λ11^y = 1 and λ32^y = 1 in (17.3). Another way to resolve that
identification problem is to standardize the latent variables, which is accomplished
by setting their variances equal to 1, so that Φ becomes a correlation matrix with
diagonal values equal to one. See the structural equation modeling literature for
details.
The structural part of the model captures the relationships between exogenous
and endogenous (latent) variables:

    η = Bη + Γξ + ζ.    (17.5)
Figure 17.2 Path diagram with parameter estimates (standard errors in parentheses).
For the coupon example, (17.5) takes the form:

    ( η1 )   (  0    0 ) ( η1 )   ( γ11  γ12  γ13 ) ( ξ1 )   ( ζ1 )
    ( η2 ) = ( β21   0 ) ( η2 ) + ( γ21   0   γ23 ) ( ξ2 ) + ( ζ2 ).    (17.6)
                                                    ( ξ3 )
The covariance matrices Θ_ε, Θ_δ, Φ and Ψ are specified to be diagonal. The estimation
results are presented in Table 17.2 and in Figure 17.2. Attitudes and past behavior
are found to significantly affect intentions while behavior is influenced by intentions
only. The variance explained is 65 percent for intentions and 44 percent for behavior.
The significance of the overall model fit can be assessed by a χ²-test. One can
also use a number of goodness-of-fit indices. 3
3. See, for example, Sharma (1996, pp. 157-162).
    H0: Σ = Σ(θ).    (17.7)
In this statistic the sample covariance matrix S (see, for example, Table 17.1) is used
as an estimate of Σ, and Σ(θ̂) = Σ̂ is the estimate of the covariance matrix Σ(θ)
obtained from the parameter estimates. Under the null hypothesis, we expect S = Σ̂,
or (S − Σ̂) = 0. In this case, failure to reject the null hypothesis is desired, since
this leads to the conclusion that statistically the hypothesized model fits the data. For
example, a χ²-value of zero results if S − Σ̂ = 0. Here χ² = 41.21. The relevant
degrees of freedom (df) are 1/2(p + q)(p + q + 1) − L, where L is the number
of parameters estimated. In this example, p + q is the number of manifest variables
(= 9), and L = 15. Hence df = 30. With 30 degrees of freedom this is not significant
at the 10-percent level, consistent with the hypothesis that the model fits the data.
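The degrees-of-freedom calculation can be checked directly:

```python
# Degrees of freedom for the chi-square fit statistic: the number of
# distinct elements of the (p + q) x (p + q) covariance matrix,
# (p + q)(p + q + 1)/2, minus the number of estimated parameters L.
def sem_df(p_plus_q, L):
    return p_plus_q * (p_plus_q + 1) // 2 - L

print(sem_df(9, 15))   # 30, as in the coupon example
```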
The χ²-value is sensitive to sample size. For a large sample size, even small
differences in (S − Σ̂) will be statistically significant although the differences may
not be practically meaningful. And with a small sample size it is difficult to reject a
null hypothesis that is false. Hence other methods are frequently used to evaluate the
fit of the model to the data. 4 Most of the fit indices are designed to provide a summary
measure of the residual matrix RES = S − Σ̂. Three well-known measures are the
Goodness-of-Fit Index (GFI), the GFI Adjusted for degrees of freedom (AGFI) and
the Root Mean Squared Error (RMSE). 5
4. See, for example, Marsh, Balla and McDonald (1988) and Bentler (1990) for a review of these statistics.
5. See also Section 18.5.
    GFI = 1 − tr[(Σ̂⁻¹S − I)²] / tr[(Σ̂⁻¹S)²]    (17.8)

where

    tr = trace of the matrix,
    I  = the identity matrix.

GFI represents the amount of variances and covariances in S that are predicted by
the model. It is analogous in interpretation to R² in multiple regression. 6 If S = Σ̂,
GFI = 1. A rule of thumb is that GFI for good-fitting models should be greater than
0.90. In our example GFI = 0.95. Like R², GFI is affected by degrees of freedom.
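A sketch of (17.8) in code; the two-variable S below is taken from Table 17.1, and setting Σ̂ = S illustrates the perfect-fit case.

```python
import numpy as np

# Sketch of the GFI in (17.8). Sigma_hat is the implied covariance matrix;
# when it reproduces the sample matrix S exactly, GFI = 1.
def gfi(S, Sigma_hat):
    k = S.shape[0]
    M = np.linalg.inv(Sigma_hat) @ S          # Sigma_hat^{-1} S
    R = M - np.eye(k)
    return 1.0 - np.trace(R @ R) / np.trace(M @ M)

S = np.array([[4.39, 3.79],
              [3.79, 4.41]])                  # y1, y2 block of Table 17.1
print(gfi(S, S))                              # 1.0: a perfect fit
print(gfi(S, np.diag(np.diag(S))))            # < 1: ignoring the covariance
```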
The AGFI, analogous to R̄² (equation (16.57)), is the GFI adjusted for degrees of
freedom:

    AGFI = 1 − [(p + q)(p + q + 1) / (2 df)] (1 − GFI).    (17.9)

Researchers have used a value of AGFI = 0.80 as the cutoff value for good-fitting
models. In our example, AGFI = 0.91.
The root mean squared error is the square root of the average of the squared
residuals:

    RMSE = { Σ_{i=1}^{k} Σ_{j=1}^{i} (s_ij − σ̂_ij)² / [k(k + 1)/2] }^{1/2},  k = p + q.    (17.10)
Table 17.3 Commonalities of the indicators.

    Intentions        y1 = 0.80
                      y2 = 0.93
    Behavior          y3 = 1.00
    Attitudes         x1 = 0.53
                      x2 = 0.68
                      x3 = 0.77
    Subjective norms  x4 = 0.43
                      x5 = 0.83
    Past behavior     x6 = 1.00
The commonality of an indicator is its squared multiple correlation. A rule of thumb
is that commonalities should be at least 0.5, i.e. that an indicator has at least 50
percent of its variance in common with its construct. LISREL provides the
commonalities for both endogenous and exogenous indicators (y- and
x-variables): see Table 17.3. Note that x4 fails to satisfy the desired commonality
value. The coefficient of determination for the y- and x-variables indicates how well
the manifest variables as a group serve as measures for the latent endogenous and
exogenous variables. Commonalities and coefficients of determination indicate the fit
of the measurement model.
Shifting attention to the structural model, the squared multiple correlations for
structural equations indicate the amount of variance in each endogenous latent vari-
able accounted for by the exogenous variables in the relevant structural equation.
The total coefficient of determination shows the strength of the relationships for all
structural relationships taken together. 7
The LISREL formulation presented by equations (17.1), (17.2) and (17.5) is a very
general one, and encompasses a wide variety of specific models. For example, simultaneous
equation models arise as a special case when there are no latent variables,
and each manifest variable is set identical to a corresponding latent variable (Section
17.1.4). Other well-known models that arise as special cases are seemingly unrelated
regression models (Section 17.1.2), errors-in-variables models (Section 17.1.3), and
confirmatory factor analysis (Section 17.1.5).
Structural equation models are frequently used in marketing. Including the early
work (Bagozzi, 1977, 1980, Fornell, Larcker, 1981), structural equation models have
been used, for example:
7. See for these and other statistics Diamantopoulos (1994) and Sharma (1996, Chapter 6).
17.1.2 SEEMINGLY UNRELATED REGRESSIONS
In estimating sales or market share models, it is possible that the errors of the equations
for the different brands are correlated. For example, the model may overestimate
the market share of some brands and underestimate the market shares of other brands.
As a result the residuals tend to be negatively correlated across brands. These systems
of relations are known as seemingly unrelated regressions. We introduced this
situation in Section 16.3, see equation (16.110a), and we showed how models with
contemporaneous error-term correlations can be estimated by GLS. Seemingly unrelated
regressions can also be conveniently specified and estimated in a LISREL
framework. For this purpose we can specify the structural equations as:

    y = η    (17.11)
    x = ξ    (17.12)
    η = Γξ + ζ    (17.13)

with a non-diagonal covariance matrix Ψ of the errors ζ.
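The direct route referred to in Section 16.3 can be sketched as a two-step (feasible GLS) estimator for two seemingly unrelated equations; the LISREL specification above is an alternative to this approach. All data below are hypothetical.

```python
import numpy as np

# Sketch of feasible GLS for two seemingly unrelated share equations with
# contemporaneously (negatively) correlated errors. Step 1: OLS per
# equation; step 2: GLS on the stacked system using the estimated
# residual covariance.
rng = np.random.default_rng(7)
T = 100
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
Sig = np.array([[1.0, -0.6], [-0.6, 1.0]])    # negatively correlated errors
U = rng.multivariate_normal([0.0, 0.0], Sig, size=T)
y1 = X1 @ np.array([0.3, 0.5]) + U[:, 0]
y2 = X2 @ np.array([0.4, -0.2]) + U[:, 1]

# Step 1: equation-by-equation OLS and residual covariance.
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
S = E.T @ E / T

# Step 2: stacked GLS, with error covariance S kron I_T.
Z = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
W = np.kron(np.linalg.inv(S), np.eye(T))
beta_fgls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
print(beta_fgls)   # estimates of (0.3, 0.5, 0.4, -0.2)
```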
17.1.3 ERRORS-IN-VARIABLES MODELS
It is well known that measurement errors in the predictor variables in regression equa-
tions can affect the parameter estimates. However, the consequences of such errors
are hard to predict: parameter estimates may stay the same, increase or decrease in
value. 8 LISREL provides a convenient framework for dealing with errors in variables.
Suppose we have measured a criterion variable y, and a set of predictor variables
x which are potentially measured with error. Then the equations are:

    y = η    (17.14)
    x = Λ_x ξ + δ    (17.15)
    η = Γξ + ζ.    (17.16)
Covariance in the measurement errors can be captured by specifying a non-diagonal
covariance matrix Θ_δ of δ. However, this model is in general not identified for
Λ_x = I, i.e. the situation where there is one indicator per latent variable (or, a multivariate
regression model with errors in the predictors). One solution is to have
multiple measures for the same latent variable ξ; 9 other solutions arise in certain
overidentified models. 10
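A small simulation illustrates the problem for the one-predictor case, where the effect of measurement error is predictable (attenuation toward zero); with several predictors the direction is not, as noted above. All numbers are hypothetical.

```python
import numpy as np

# Simulation of errors in a single predictor: regressing y on the noisy
# measurement x instead of the latent xi multiplies the OLS slope by the
# reliability Var(xi)/Var(x), here 1/(1 + 1) = 0.5.
rng = np.random.default_rng(3)
n = 20000
xi = rng.normal(size=n)                    # latent (true) predictor
y = 1.0 + 2.0 * xi + rng.normal(0.0, 0.5, n)
x = xi + rng.normal(0.0, 1.0, n)           # observed with measurement error

slope_true_x = np.polyfit(xi, y, 1)[0]     # close to the true slope 2.0
slope_noisy_x = np.polyfit(x, y, 1)[0]     # attenuated, close to 2.0 * 0.5 = 1.0
print(slope_true_x, slope_noisy_x)
```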
8. See, for example, Vanden Abeele (1975), Ketellapper (1981, 1982), Amemiya (1985, Section 6.6).
9. See, for an application in marketing, Plat (1988, Chapter 3).
10. The reader is referred to Bollen (1989) for more details.
17.1.4 SIMULTANEOUS EQUATION MODELS
The structural part of the model is:

    η = Bη + Γξ + ζ    (17.17)

where the interpretation of the parameter matrices B and Γ is as above. The measurement
model for simultaneous equations with observed variables is: y = η and x = ξ,
i.e. each latent variable corresponds to one indicator, so that (17.17) reduces to:

    η = By + Γx + ζ.    (17.18)
Again, the error terms may have a covariance matrix denoted by Ψ, which allows
them to be correlated. Thus, simultaneous equations can be seen as a special case
which can be estimated in the LISREL framework. For example, consider a model
with two-way causality between sales (y1) and advertising (y2), both assumed to
be measured without error. Price (x1), in addition, influences sales. The structural
equations (16.120)-(16.123) can be rewritten in the framework (17.18) as:

    (17.19)
can be assessed by examining the fit of model (17.2) with n = 1 latent construct. The
reliability of a latent construct can be assessed as:

    ρ = (Σ_i λ_i)² / [(Σ_i λ_i)² + Σ_i θ_ii]    (17.20)

which reduces to the Cronbach coefficient α if all loadings λ_i and error variances θ_ii
are equal.
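The reduction to coefficient α can be checked numerically. The reliability formula below is the standard composite-reliability expression in terms of standardized loadings λ_i and error variances θ_ii; the loadings and error variances used are hypothetical.

```python
# Construct reliability rho = (sum lambda)^2 / ((sum lambda)^2 + sum theta),
# and a numerical check that it coincides with Cronbach's alpha when all
# loadings and error variances are equal, as claimed in the text.
def construct_reliability(loadings, error_vars):
    s = sum(loadings)
    return s * s / (s * s + sum(error_vars))

def cronbach_alpha(cov):
    # cov: item covariance matrix as a list of lists
    k = len(cov)
    total = sum(sum(row) for row in cov)
    diag = sum(cov[i][i] for i in range(k))
    return (k / (k - 1)) * (1.0 - diag / total)

lam, theta, k = 0.8, 0.36, 3   # equal (hypothetical) loadings and error variances
rho = construct_reliability([lam] * k, [theta] * k)

# Implied item covariance matrix for x_i = lam * xi + delta_i with Var(xi) = 1:
# diagonal lam^2 + theta, off-diagonal lam^2.
cov = [[lam * lam + (theta if i == j else 0.0) for j in range(k)] for i in range(k)]
print(rho, cronbach_alpha(cov))   # identical in this special case
```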
17.2.1 INTRODUCTION
Heterogeneity has become a very important topic in the marketing literature. Mar-
keting researchers have recognized the importance of the differences in behavior
among consumers, and have developed models to accommodate such heterogeneity in
marketing models. Marketing decisions depend critically on a correct understanding
of heterogeneity. This heterogeneity also needs to be considered even if the model
user cares only about average behavior. For example, models of aggregate data may
be sensitive to a variety of aggregation biases (see Section 14.1.2). Important insights
into heterogeneity are obtained from the application of mixture models. Mixture mod-
els are models that assume the existence of a number of (unobserved) heterogeneous
groups of individuals in a population. The theory of mixture models connects ele-
gantly to market segmentation theory and it presents a statistical approach to a wide
variety of segmentation problems.
In this section we review the application of mixture models to market segmenta-
tion problems. A major advance in segmentation methodology is due to the develop-
ment of mixture models and mixture regression models (Wedel and Kamakura, 1998).
The development of mixture models dates back to the nineteenth century (Newcomb,
1886). In finite mixture models, it is assumed that the observations of a sample arise
from two or more unobserved segments, of unknown proportions, that are mixed. The
purpose is to "unmix" the sample and to identify the underlying segments. Contrary
to most of the traditional clustering procedures which merely present convenient
heuristics for deriving data-based segments, mixture distributions present a model-
based approach to segmentation. They allow for hypothesis testing and estimation
within the framework of standard statistical theory.
The mixture model approach to segmentation presents an extremely flexible class
of clustering algorithms that can be tailored to a very wide range of substantive
marketing problems. Mixture models are statistical models which involve a specific
form of the distribution function of the observations in each of the underlying populations (which is to be specified). The distribution function is used to describe the observations within each of the underlying segments.
452 CHAPTER 17
In order to formulate the finite mixture model, assume that a sample of N subjects
is drawn. For each subject, J variables y_i = (y_ij), i = 1, ..., N, j = 1, ..., J, are
measured. The subjects are assumed to arise from a population which is a mixture
of S unobserved segments, in (unknown) proportions π_1, ..., π_S. It is not known in
advance to which segment a particular subject belongs. The probabilities π_s satisfy
the following constraints:

Σ_{s=1}^{S} π_s = 1,  π_s ≥ 0,  s = 1, ..., S.    (17.21)

Given that y_ij comes from class s, the distribution function of the vector of measurements y_i is represented by the general form f_s(y_i | θ_s). Here θ_s denotes the vector
of all unknown parameters for class s. For example, in the case that the y_ij within
each segment are independent normally distributed, θ_s contains the means, μ_js, and
variances, σ²_js, of the normal distribution within each of the S segments. The basic
idea behind mixture distributions is that the unconditional distribution is obtained
from the conditional distributions as:

f(y_i; φ) = Σ_{s=1}^{S} π_s f_s(y_i | θ_s)    (17.22)
12. This discussion is based on Titterington, Smith, and Makov (1985) and McLachlan and Basford (1988).
where φ = (π, θ) denotes all parameters of the model. This can easily be derived
from the basic principles of probability theory: the unconditional probability is equal
to the product of the conditional probability given s, times the probability of s, and
this expression summed over all values of s.
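The summation in (17.22) is easy to evaluate numerically. A minimal sketch (not from the text; all parameter values are hypothetical) for a two-segment univariate normal mixture:

```python
import math

# Hypothetical two-segment univariate normal mixture: segment proportions
# pi_s and per-segment parameters theta_s = (mu_s, sigma_s).
pi = [0.4, 0.6]
mu = [1.0, 5.0]
sigma = [1.0, 2.0]

def normal_pdf(y, m, s):
    """Density of N(m, s^2) evaluated at y."""
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def mixture_pdf(y):
    """Unconditional density (17.22): sum over segments of pi_s * f_s(y | theta_s)."""
    return sum(p * normal_pdf(y, m, s) for p, m, s in zip(pi, mu, sigma))

f = mixture_pdf(2.0)  # density of one observation under the mixture
```

The mixture density is simply the proportion-weighted sum of the component densities, exactly as in (17.22).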
The conditional density function, f_s(y_i | θ_s), can take many forms, including the
normal, Poisson and binomial distribution functions, as well as other well-known
distribution functions such as the negative binomial, exponential, gamma, and inverse
Gaussian, that can be used to describe market phenomena. All of these more commonly used distributions are specific members of the so-called exponential family
of distributions. This family is a general family of distributions that encompasses both
discrete and continuous distributions. The exponential family is a very useful class of
distributions. The common properties of the distributions in this class enable them
to be studied simultaneously, rather than as a collection of unrelated cases. These
distributions are characterized by their means, μ_js, and possibly so-called dispersion
parameters, λ_js. In mixtures these parameters are typically assumed to be constant
over observations within each segment s.
Often, the J repeated measurements on each subject are assumed to be independent. This occurs for example if a single subject provides evaluations of several (J)
products or brands j = 1, ..., J. This implies that the joint distribution function for
the J observations factors into the product of the marginal distributions:

f_s(y_i | θ_s) = Π_{j=1}^{J} f_s(y_ij | θ_sj).    (17.23)
If, given the knowledge of the segments, the observations cannot be assumed to be
independent, then one of the members of the multivariate exponential family may be
appropriate. The two most important and most frequently used distributions in this
family are the multinomial distribution and the multivariate normal distribution. In
the latter case, the distribution of y_i takes the well-known multivariate normal form,
with μ_s the (J × 1) vector of expectations, and Σ_s the (J × J) covariance matrix of
the vector y_i, given segment s.
Generalized linear models (Nelder and Wedderburn, 1972) are regression models
in which the criterion variable is assumed to be distributed according to one of the
members of the exponential family. Generalized linear models deal with continuous
variables that can be specified to follow a normal, gamma, or exponential distribu-
tion; for discrete variables the binomial, multinomial, Poisson or negative binomial
distributions can be utilized. The expectation of the criterion variable is modeled as
a function of a set of explanatory variables as in standard multiple regression models
(which are a special case of generalized linear models). However, the estimation
of a single aggregate regression equation across all consumers in a sample may be
inadequate and potentially misleading if the consumers belong to a number of un-
known classes (segments) in which the regression parameters differ. If the behavior
of different consumers is studied, it is not difficult to find reasons for the existence
of such heterogeneous classes. It is therefore no surprise that the application of the
mixture regression approach has proven to be of great use in marketing.
We discuss the mixture regression framework by extending the unconditional
mixture approach described above. We assume that the vector of observations (on
the criterion variable) of subject i, y_i, arises from a population which is a mixture
of S unknown segments in proportions π_1, ..., π_S. The distribution of y_i, given that
y_i comes from segment s, f_s(y_i | θ_s), is assumed to be one of the distributions in the
exponential family, or the multivariate exponential family. In addition to the criterion
variable, a set of L non-stochastic explanatory variables X_1, ..., X_L, with X_ℓ = (x_ijℓ),
j = 1, ..., J, ℓ = 1, ..., L, is assumed to be available. A major difference from the
mixture models discussed above is that the means of the observations in each class
are to be predicted from a set of predictor variables. To this end, the mean of the
distribution, μ_isj, is written as:

g(μ_isj) = η_isj    (17.24)
where g(.) is some function, called a link function, and η_isj is called the linear
predictor. Convenient link functions, called canonical links, are the identity, log, logit,
inverse and squared inverse functions for the normal, Poisson, binomial, gamma and
inverse Gaussian distributions, respectively. The linear predictor for
individual i in segment s is a linear combination of the L explanatory variables:

η_isj = Σ_{ℓ=1}^{L} x_ℓij β_ℓs    (17.25)

where
β_ℓs = regression parameters to be estimated for each segment.
Thus for each segment a generalized linear model is formulated, consisting of a specification of the distribution of the criterion variable within the exponential family,
a linear predictor η_isj, and a function, g(.), which links the linear predictor to the
expectation of the distribution. For example, for the normal distribution the canonical
link is the identity link: η_isj = μ_isj, so that by combining (17.24) and (17.25) a
simple linear regression model for each segment arises. Note that the unconditional
mixture of the previous section arises as a special case: the matrix X consists of one
column with ones, so that only an intercept β_s is estimated for each segment.
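As an illustration of (17.24)-(17.25) with the identity link, the density of one observation under a two-segment mixture of linear regressions can be sketched as follows (a minimal sketch; all parameter values are hypothetical, not from the text):

```python
import math

# Hypothetical two-segment mixture of linear regressions (normal distribution,
# identity link, one explanatory variable). All numbers are illustrative.
pi = [0.5, 0.5]                    # segment proportions pi_s
beta = [[0.0, 2.0], [4.0, -1.0]]   # [intercept, slope] per segment
sigma = [1.0, 1.0]                 # within-segment error standard deviations

def normal_pdf(y, m, s):
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def mixture_density(y, x):
    """Density of one observation: sum_s pi_s * N(y | eta_s, sigma_s^2),
    where eta_s is the segment-specific linear predictor (17.25) and the
    identity link sets mu = eta (17.24)."""
    total = 0.0
    for s in range(len(pi)):
        eta = beta[s][0] + beta[s][1] * x   # linear predictor for segment s
        total += pi[s] * normal_pdf(y, eta, sigma[s])
    return total
```

The unconditional mixture of the previous section is recovered by dropping the slope terms, leaving only the segment intercepts.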
17.2.4 APPLICATION
Before presenting the results of the mixture regression analysis, we first report the
results of a standard regression analysis. Table 17.4 presents the results of the OLS
regression of overall performance on the eight performance factors. The table reveals
that the factors "identifying new prospects" and "new product testing" were signifi-
cantly related to overall trade show performance. The aggregate regression explained
37 percent of the variance in overall performance.
A mixture regression model applied to the same data reveals there exist two classes.
The effects of the performance factors in the two classes are markedly different from
those at the aggregate level, as shown in Table 17.4. Managers in segment 1 primarily
evaluate trade shows in terms of non-selling factors, including "servicing current
customers" and "enhancing corporate morale". Managers in segment 2 evaluate trade
shows primarily on selling factors, including "identifying new prospects", "introducing new products", "selling at the shows", and "new product testing". Neither of the
two segments considers "gathering competitive information" important. The percent-
ages of explained variance in overall trade show performance in segments 1 and 2
were respectively 73 and 76 percent, a substantial improvement over the OLS results.
The analysis shows that neglecting existing segments may lead to low explanatory
power of models, to biased parameter estimates and incorrect managerial action.
In mixture models and mixture regression models, the segments identified are often
described by background variables or concomitant variables (such as demographics)
to obtain insights into the composition of the segments. Such profiling of segments
is typically performed on the basis of a-posteriori class membership probabilities.
These a-posteriori memberships, π_is, provide the probability that a particular subject i belongs to a certain class s (and have the properties that 0 ≤ π_is ≤ 1, and
Σ_{s=1}^{S} π_is = 1). The classes are frequently profiled in a second step of the analyses:
a logit transformation of the posterior membership probabilities, log(π_is / (1 − π_is)),
is regressed on designated external variables. The coefficient of a specific concomitant variable for a certain class represents the effect of that variable on the relative
probabilities of subjects belonging to that class. However, the two-step procedure has
several disadvantages. First, the logit regression is performed independently from the
estimation of the mixture model, and optimizes a different criterion: the sum of
squared errors in the posterior probabilities rather than the likelihood. As a result, the
classes derived in the first stage do not possess an "optimal" structure with respect to
their profile on the concomitant variables. Secondly, this procedure does not take into
account the estimation error of the posterior probabilities. Therefore, several authors
have proposed models that simultaneously profile the derived segments with descriptor variables (Kamakura, Wedel, Agrawal, 1994, Wedel and Kamakura, 1998). These
models are based on an earlier concomitant variable latent class model proposed by
Dayton and MacReady (1988).
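The a-posteriori memberships follow from Bayes' rule: π_is = π_s f_s(y_i | θ_s) / Σ_u π_u f_u(y_i | θ_u). A small sketch with hypothetical parameters (two normal segments):

```python
import math

# Posterior (a-posteriori) membership probabilities via Bayes' rule:
# pi_is = pi_s * f_s(y_i) / sum_u pi_u * f_u(y_i).
# All parameter values below are hypothetical.
pi = [0.3, 0.7]
mu = [0.0, 4.0]
sigma = [1.0, 1.0]

def normal_pdf(y, m, s):
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def posterior(y):
    """Posterior segment-membership probabilities for one observation y."""
    num = [p * normal_pdf(y, m, s) for p, m, s in zip(pi, mu, sigma)]
    tot = sum(num)
    return [n / tot for n in num]

post = posterior(0.5)  # an observation close to segment 1's mean
```

By construction the posteriors are nonnegative and sum to one, the properties stated in the text.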
To develop the concomitant variable mixture regression model, let ℓ = 1, ..., L
index the concomitant variables, and z_iℓ be the value of the ℓ-th concomitant variable for
subject i, where the matrix Z = ((z_iℓ)). We again assume that the y_i are distributed
according to some member of the exponential family, conditional upon unobserved
classes as above. The starting point for the development of the model is the general
mixture regression model defined by (17.22). The unconditional distribution for the
concomitant variable mixture is formulated as:

f(y_i; φ) = Σ_{s=1}^{S} π_{s|Z} f_s(y_i | β_s, λ_s).    (17.26)
Note that equation (17.26) is similar to equation (17.22), but the prior probabilities
of class membership, π_s, have been replaced by π_{s|Z}. This is the core of the concomitant variable model approach: the prior probabilities of segment membership are
explicitly reparameterized as functions of the concomitant variables. For this purpose
the logistic formulation is often used:

π_{s|Z} = exp(Σ_{ℓ=1}^{L} γ_ℓs z_iℓ) / Σ_{u=1}^{S} exp(Σ_{ℓ=1}^{L} γ_ℓu z_iℓ)    (17.27)

where the γ_ℓs are segment-specific parameters to be estimated.
For longitudinal data, the assumptions underlying the mixture regression models de-
scribed above may not be valid. In particular, the behavior of a specific subject and/or
its membership to underlying segments may change over time. Mixture regression
models applied to longitudinal data should potentially reflect such dynamics. There
are several ways of dealing with time trends in mixture regression models. One ap-
proach, latent Markov mixture models, is described in some detail. For more details
we refer to Wedel and Kamakura (1998, Chapter 7).
For ease of exposition, two time periods t = 1 and t = 2 are assumed. The segments at time t = 1 are denoted by s = 1, ..., S, and at time t = 2 by u = 1, ..., U.
The criterion variable for subject i is y_i = ((y_ijt)), and the predictor variables X_i =
((x_ijt)) are defined accordingly. The extension to more than two periods is straightforward.
The expected values of the criterion variable within classes are assumed to be
provided by equations (17.24) and (17.25), i.e. we start from the mixture regression
model. In the latent Markov model, subjects are assigned a simultaneous prior probability of belonging to segment s at time t = 1 and to segment u at time t = 2: π_su.
This simultaneous probability of segments s and u is specified as the product of the
marginal probability of being in segment s at time t = 1, and the conditional probability
of being in segment u at time t = 2, given segment s at time t = 1:

π_su = π_s π_{u|s}.    (17.28)
Equation (17.28) presents the submodel of the latent Markov mixture regression
model. The unconditional distribution of the data, analogous to equations (17.22)
and (17.26) is:
f(y_i; φ) = Σ_{s=1}^{S} Σ_{u=1}^{U} π_s f_s(y_i1 | β_s, λ_s) π_{u|s} f_u(y_i2 | β_u, λ_u).    (17.29)
The latent Markov mixture regression model can be used to investigate the effects
of a set of predictor variables X on a criterion variable y in a number of unobserved
classes, and at the same time the transitions of subjects among these segments over
time.
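The construction of the joint prior probabilities π_su = π_s π_{u|s} in (17.28) can be sketched directly (all probabilities below are hypothetical):

```python
# Joint prior segment probabilities in the latent Markov model (17.28):
# pi_su = pi_s * pi_{u|s}. All probabilities below are hypothetical.
pi_s = [0.6, 0.4]             # marginal segment probabilities at t = 1
pi_u_given_s = [[0.8, 0.2],   # row s: conditional probabilities of u given s
                [0.3, 0.7]]

pi_su = [[pi_s[s] * pi_u_given_s[s][u] for u in range(2)] for s in range(2)]

# The joint probabilities sum to one over all (s, u) combinations.
total = sum(sum(row) for row in pi_su)
```

The off-diagonal entries of pi_su describe the expected switching of subjects between segments over time.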
17.3.1 INTRODUCTION
Marketing data for a brand often include measures of, for example, sales, repeated
over time (typically at equally spaced intervals). In Chapter 6 on marketing dynamics
we dealt with lagged effects for predictor variables and distributed lags on the criterion variable, y, to capture such market dynamics. Time-series or ARIMA models are
rion variable, y, to capture such market dynamics. Time-series or ARIMA models are
uniquely suited to capture the time dependence of a criterion variable. These models
can describe patterns in data on a criterion variable as a function of its own past.
Although the approach lacks foundations in marketing theory, it is attractive in the
sense that the models can separate short-term from long-term marketing effects (when
marketing variables are included). 14
The shape of the market response function in time-series analyses is determined
by a number of aspects, including prior theoretical knowledge, the objective of the
model, and empirical analysis of the data. Most often, ARIMA models are identified
from the data. Important statistics in building ARIMA models are the autocorrelation
function (ACF) and partial autocorrelation function (PACF). To illustrate, let y_t be the
14. Comprehensive treatments of time series models are Pankratz (1991), Hanssens, Parsons and Schultz (1990,
Chapter 4), Hanssens, Parsons (1993), Box, Jenkins and Reinsel (1994), Franses (1996, 1998).
sales of a brand in period t. The ACF at lag k is simply the correlation ρ(y_t, y_{t−k}).
The PACF at lag k is the coefficient of y_{t−k} in the regression of y_t on all lagged values
of y up to y_{t−k}. In general, the type of time series model to be fitted to the data is
identified from plots of the ACF and PACF against the lag k. Specific types of models
correspond to specific autocorrelation functions, as explained below.
Let y_t be the sales of a brand in period t. A common and fairly simple way to describe
fluctuations in sales is with a first-order autoregressive process. In this process it is
assumed that sales at t − 1 affect sales at t:

y_t = ξ + φ_1y_{t−1} + ε_t.    (17.30)
This model states that sales in period t are determined by sales in the previous period,
t − 1. Depending on the value of φ_1 we distinguish three situations. If |φ_1| < 1, the
effect of past sales diminishes. If |φ_1| = 1, sales in period t − 1 have a permanent
effect on sales: sales will not revert to a historical level but will evolve. In the
case where |φ_1| > 1, past sales become increasingly important, which appears to be
unrealistic in marketing (Dekimpe, Hanssens, 1995a, p. 5). The process is stationary
if |φ_1| < 1. In a stationary process the mean, variance and autocorrelation are constant in time. This model is indicated as AR(1). It can be identified from data using
the AutoCorrelation Function (ACF) and Partial AutoCorrelation Function (PACF)
calculated from sample data. The ACF and PACF are statistics that can be used to
identify the type of time series for a given data set. The ACF and PACF of an AR(1)
process are shown in Figure 17.3. Typically, the ACF decays exponentially and the
PACF shows a positive "spike" at lag 1 and equals zero thereafter, if φ_1 is positive.
The ACF shows a damped wavelike pattern and the PACF a negative spike at lag 1, if φ_1
is negative.
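The exponential decay of the AR(1) autocorrelation function can be checked by simulation; a sketch (the value φ_1 = 0.7 is an arbitrary, hypothetical choice):

```python
import random

# Simulate a stationary AR(1) process y_t = phi * y_{t-1} + eps_t and inspect
# the sample ACF; phi = 0.7 is hypothetical.
random.seed(1)
phi = 0.7
y = [0.0]
for _ in range(5000):
    y.append(phi * y[-1] + random.gauss(0.0, 1.0))
y = y[1000:]  # drop burn-in so the process is effectively stationary

def acf(series, k):
    """Sample autocorrelation at lag k."""
    n = len(series)
    m = sum(series) / n
    c0 = sum((v - m) ** 2 for v in series) / n
    ck = sum((series[t] - m) * (series[t - k] - m) for t in range(k, n)) / n
    return ck / c0

# For an AR(1) process the theoretical ACF at lag k is phi ** k.
r1, r2 = acf(y, 1), acf(y, 2)
```

With φ_1 = 0.7 the sample ACF values at lags 1 and 2 should be close to 0.7 and 0.49, illustrating the exponential decay shown in Figure 17.3.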
Figure 17.3 ACF and PACF of an AR(1) process
The order (p) of an AR(p) process is the highest lag of y_t that appears in the model.
The general p-order AR process is written as:
φ_p(B)y_t = ξ + ε_t    (17.31)

where
φ_p(B) = (1 − φ_1B − φ_2B² − ... − φ_pB^p),
B = the backshift operator defined by B^k y_t = y_{t−k}.

For an AR(2) process φ_2(B) = (1 − φ_1B − φ_2B²), so that (1 − φ_1B − φ_2B²)y_t = ξ + ε_t,
which leads to:

y_t = ξ + φ_1y_{t−1} + φ_2y_{t−2} + ε_t.    (17.32)
Figure 17.4 ACF and PACF of an MA(1) process

In a moving average process, sales are a function of current and past random shocks.
The first-order moving average process is:

y_t = ξ + ε_t − θ_1ε_{t−1}.    (17.33)
This model is indicated as MA(1). Note that the past random shock does not come
from y_{t−1}, as in the AR(1) model, but stems from the random component of y_{t−1}.
The ACF and PACF for the MA(1) model are depicted in Figure 17.4. Here, the ACF
shows a spike at lag 1, which is negative if θ_1 > 0 and positive if θ_1 < 0, while the
PACF shows exponential decay in the former case, or a damped wavelike pattern in
the latter.
The order (q) of an MA(q) process is the highest lag of the random shock that
appears in the model. Where |θ_1| < 1 the impact of past shocks diminishes, whereas
|θ_1| = 1 implies that each random shock has a permanent effect on sales. The general
q-order MA process is written as:

y_t = ξ + θ_q(B)ε_t    (17.35)

where θ_q(B) = (1 − θ_1B − θ_2B² − ... − θ_qB^q).
The AR and MA processes can be combined into a single model to reflect the idea
that both past sales and past random shocks affect y_t. For example, the ARMA(1,1)
process is:

y_t = ξ + φ_1y_{t−1} + ε_t − θ_1ε_{t−1}.    (17.36)
The ACF and PACF for an ARMA(1,1) model are depicted in Figure 17.5. Here,
both the ACF and the PACF show exponential decay, or a damped wavelike pattern.
The identification of mixed ARMA models from the ACF and PACF is not
always straightforward; in such cases an Extended ACF (EACF) is more useful
in identifying the orders of mixed ARMA models.15
The mixed processes need to satisfy both stationarity and invertibility conditions.
The orders (p, q) of an ARMA process are the highest lags of y_t and ε_t that appear in
the model. For example, for an ARMA(1,1) process, p = 1 and q = 1. The general
ARMA(p, q) process is formulated as follows:

φ_p(B)y_t = ξ + θ_q(B)ε_t    (17.37)

with φ_p(B) and θ_q(B)ε_t as defined above. As an example, for an ARMA(2,2) process,
φ_2(B) = (1 − φ_1B − φ_2B²) and θ_2(B) = (1 − θ_1B − θ_2B²), so that:

y_t = ξ + φ_1y_{t−1} + φ_2y_{t−2} + ε_t − θ_1ε_{t−1} − θ_2ε_{t−2}.    (17.38)
Figure 17.5 ACF and PACF of an ARMA(1,1) process
Researchers have published results from such models for a wide variety of durable (trucks, airplanes, furniture), nondurable food (cereal, catsup, beverages) and nonfood (detergents, toothpaste, cleaning aids) product categories and services (holidays, passenger
transit, advertising).
We now define stationarity and invertibility formally. Stationarity requires that the
roots of what is called the characteristic equation, φ_p(B) = 0, "lie outside the unit
circle". Similarly, invertibility requires that the roots of θ_q(B) = 0 "lie outside the unit
circle". For the AR(2) process this implies solving (1 − φ_1B − φ_2B²) = 0 and for the
MA(2) process solving (1 − θ_1B − θ_2B²) = 0. In practice, a numerical routine can
be used to solve these characteristic equations (if p > 2 or q > 2).
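The stationarity condition can be checked numerically; a sketch for a hypothetical AR(2) process, solving the characteristic equation with the quadratic formula:

```python
import cmath

# Stationarity check for a hypothetical AR(2) process: the roots of the
# characteristic equation 1 - phi1*B - phi2*B^2 = 0 must lie outside the
# unit circle. The coefficients are illustrative.
phi1, phi2 = 0.5, -0.3

# Quadratic formula applied to  -phi2*B^2 - phi1*B + 1 = 0.
a, b, c = -phi2, -phi1, 1.0
disc = cmath.sqrt(b * b - 4.0 * a * c)
roots = [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]

# The process is stationary if every root has modulus greater than one.
stationary = all(abs(r) > 1.0 for r in roots)
```

For higher orders (p > 2 or q > 2) a general polynomial root finder would replace the quadratic formula, as the text notes.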
For example, for an AR(1) process p = 1. Then (1 − φ_1B) = 0 is the characteristic equation. The root equals 1/φ_1, which is greater than one in absolute value
if |φ_1| < 1. Thus, the null hypothesis for testing nonstationarity is that φ_1 = 1.16 This
test is called a unit root test. We redefine the AR(1) model (17.30) as follows:

Δy_t = ξ + γy_{t−1} + ε_t,  where γ = φ_1 − 1 and Δy_t = y_t − y_{t−1}.    (17.39)
The reparameterized model (17.39) applied to data yields a t-statistic for γ, which can
be used to test H0: γ = 0 (i.e., φ_1 = 1). This test is known as the Dickey-Fuller test. However, the t-statistic
16. Since most marketing series are positively correlated, we do not test φ_1 = −1.
that is obtained cannot be evaluated with the regular tables of the t-distribution. Instead, special tables need to be used. The generalization of the Dickey-Fuller test to
an AR(p) process yields the Augmented Dickey-Fuller test. This test is based on a
reformulation of the AR(p) process as:

Δy_t = μ + γy_{t−1} + Σ_{j=1}^{p−1} δ_j Δy_{t−j} + ε_t.    (17.40)

The Augmented Dickey-Fuller (ADF) test can be used to test the null hypothesis γ =
0. A large number of lagged first differences should be included in the ADF regression
to ensure that the error is approximately white noise. In addition, depending on the
to ensure that the error is approximately white noise. In addition, depending on the
assumptions of the underlying process, the test may be performed with or without
μ in the model. Again, special tables are needed to assess the significance of the
t-statistic.
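The Dickey-Fuller regression can be sketched with ordinary least squares. The series below is simulated from a stationary AR(1) with φ_1 = 0.5 (a hypothetical choice), so the estimate of γ should be near −0.5 and the t-statistic large and negative. As the text stresses, this t-statistic must be compared with special Dickey-Fuller tables, which are not reproduced here:

```python
import random

# Sketch of the Dickey-Fuller regression: regress delta y_t on y_{t-1} and
# obtain the t-statistic for gamma = phi_1 - 1 (H0: gamma = 0, a unit root).
random.seed(2)
phi = 0.5                       # hypothetical: a stationary AR(1)
y = [0.0]
for _ in range(500):
    y.append(phi * y[-1] + random.gauss(0.0, 1.0))

dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # delta y_t
ylag = y[:-1]                                       # y_{t-1}

# OLS of dy on ylag (with intercept), via centered normal equations.
n = len(dy)
mx, md = sum(ylag) / n, sum(dy) / n
sxx = sum((x - mx) ** 2 for x in ylag)
sxd = sum((x - mx) * (d - md) for x, d in zip(ylag, dy))
gamma_hat = sxd / sxx
resid = [d - md - gamma_hat * (x - mx) for x, d in zip(ylag, dy)]
s2 = sum(e * e for e in resid) / (n - 2)
t_stat = gamma_hat / (s2 / sxx) ** 0.5
```

Under a true unit root γ = 0, and the t-statistic would hover near the Dickey-Fuller critical values instead of being strongly negative.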
An important application of unit root tests in marketing is provided by Dekimpe
and Hanssens (1995a). They use the tests to determine whether sales are trending over
time. If this is the case, they examine whether the trend can be related to marketing
activity which implies multivariate persistence. Multivariate persistence is especially
relevant for the design of marketing strategies that deliver sustainable, long-term
marketing advantage.
In a second study, Dekimpe and Hanssens (1995b) apply unit root tests and
other time-series methods to identify empirical generalizations about market evol-
ution. Based on data from 400 published studies they conclude that evolution is
the dominant characteristic in marketing series in general as well as in series with
market performance measures as criterion variables. Although in many cases the
necessary condition exists for making a long-run impact, market performance does
not evolve over time in about 40 percent of all cases. The meta-analysis for perfor-
mance measures is given in Table 17.5. The models are classified into sales models
and market share models. The results show that evolution occurs in a majority of the
brand sales models, and that a vast majority of the market share time-series models
is stationary. This is consistent with arguments of Bass and Pilon (1980) and Ehrenberg
(1994) that many markets are in a long-run equilibrium. The relative position of the
brands is only temporarily affected by marketing activities. 17
17. Other studies that assess the long-term impact of marketing instruments are, for example, Mela, Gupta,
Lehmann (1997), Mela, Gupta, Jedidi (1998), Jedidi, Mela, Gupta (1999).
The ARMA processes described above are processes for series with a stationary
mean. In these processes the mean value of y1 does not change over time. However,
in marketing we often see that sales variables evolve (see Table 17.5) which implies
nonstationarity. Nonstationary series can be formulated as ARMA processes for differenced series, z_t = y_t − y_{t−1}. The differencing operation removes a trend from the
data. The corresponding model is called an ARIMA (Integrated ARMA) model.
As an example, consider an ARIMA(1,1,1) model:

z_t = ξ + φ_1z_{t−1} + ε_t − θ_1ε_{t−1},  where z_t = y_t − y_{t−1}.    (17.42)
Many series of sales data in marketing display seasonal patterns, resulting from varia-
tion in weather and other factors. As a result sales fluctuate systematically around the
mean level, such that some observations are expected to have values above and other
observations below the mean. For example, sales of ice cream in Europe tend to be
highest in the spring and summer and lowest in winter periods. The seasonal effects
can be of the AR, the MA or the Integrated types, depending on whether sales levels
or random shocks affect future sales, and on whether nonstationary seasonal patterns
exist. Therefore a seasonal model may apply, with orders P, D, and Q respectively
for the AR, I and MA components, denoted by ARIMA(P, D, Q)_s, with s the lag of
the seasonal terms.
To illustrate, suppose there exists a seasonal pattern in monthly data, such that
any month's value contains a component that resembles the previous year's value in
the same month. Then a purely seasonal ARIMA(1,1,1)_12 model is written as:

(1 − Φ_1B^12)(1 − B^12)y_t = (1 − Θ_1B^12)ε_t.
Consider, as an example, the model y_t = μ + v_0x_t + v_1x_{t−1} + ε_t. In this model sales
in each period are affected by advertising in that period (x_t) and by advertising in the
previous period (x_{t−1}). The general dynamic regression model formulation for one
variable is:
y_t = μ + v_k(B)x_t + ε_t    (17.50)

where
μ = a constant,
v_k(B) = v_0 + v_1B + v_2B² + ... + v_kB^k, the transfer function,
B = the backshift operator, and
k = the order of the transfer function, which is to be determined.
The transfer function is also called the impulse response function, and the v-coefficients are called the impulse response weights. In the example, if sales do not
react to advertising in period t, but only to lagged advertising, v_0 = 0. In that case,
the model is said to have a "dead time" of one. In general, the dead time is the number
of consecutive v's equal to zero, starting with v_0.
Before we discuss how the order of the impulse response function can be determined, we consider a well-known special case: the Koyck model. In this model,
the impulse response weights are defined by v_i = αv_{i−1}, for i = 1, ..., ∞. Thus, the
response is a constant fraction of the response in the previous time period, and the
weights decay exponentially. It can be shown that the Koyck model is equivalent to:

y_t = αy_{t−1} + v_0x_t + ε_t    (17.51)
or an AR(1) model with one explanatory variable of lag zero. Now, bringing the term
involving the lagged criterion variable to the left-hand side of (17.51) results in:

(1 − αB)y_t = v_0x_t + ε_t    (17.52)

which can be rewritten as:

y_t = (1 / (1 − αB))(v_0x_t + ε_t).    (17.53)
This formulation is called the (rational) polynomial form of the Koyck model (see
Section 6.1). In general, rational polynomial distributed lag models comprise a family
of models that are represented by substituting the following equation in (17.50):

v_{k,ℓ}(B) = ω_k(B)B^d / α_ℓ(B)    (17.54)

where
ω_k(B) = ω_0 + ω_1B + ω_2B² + ... + ω_kB^k, which contains
the direct effects of changes in x on y over time,
α_ℓ(B) = α_0 + α_1B + α_2B² + ... + α_ℓB^ℓ, which shows
the gradual adjustment of y to x over time, and
B^d = the dead time (i.e., d = 0 corresponds
to a dead time of zero, since B^0 = 1).
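The geometric decay of the Koyck impulse response weights, and the implied long-run effect v_0/(1 − α) of a unit change in x, can be sketched as follows (v_0 and α are hypothetical):

```python
# Koyck impulse response weights: v_i = alpha * v_{i-1}, i.e. v_i = v0 * alpha**i.
# The values of v0 and alpha below are hypothetical.
v0, alpha = 2.0, 0.6
weights = [v0 * alpha ** i for i in range(20)]

# The cumulative (long-run) effect of a unit change in x is the geometric
# series sum v0 / (1 - alpha).
long_run = v0 / (1.0 - alpha)
partial_sum = sum(weights)
```

After 20 lags the partial sum is already within a fraction of a percent of the long-run effect, illustrating how quickly the exponentially decaying weights die out.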
For the identification of dynamic regression models from data, two problems arise.
The first is to find a parsimonious expression for the polynomial v_{k,ℓ}(B), and the
second is to find an expression for the time structure of the error term, in the form
of an ARIMA model. An important tool in the identification of transfer functions
is the cross-correlation function (CCF), which is the correlation between x and y
at lag k: ρ(y_t, x_{t−k}). The CCF extends the ACF to the situation of two or more
series (those of x and of y), with similar interpretation: spikes denote MA parameters
(in the numerator of (17.54)), and decaying patterns indicate AR parameters (in the
denominator of (17.54)).
We note that the simultaneous identification of the transfer function and the ARIMA structure of the error in (17.50) is much more complex than it is for a single
ARIMA process. An example that occurs frequently in marketing is one where the
original sales series shows a seasonal pattern, for which an ARIMA(1,0,1)(1,0,0)_12
model is indicated. However, if temperature is included as an explanatory variable
in the model, an ARIMA(1,1,0) may suffice. Thus, the identification of the ARIMA
error structure in (17.50) depends on the exogenous variables included in the model.
Procedures that have been proposed for that purpose are the LTF (linear transfer
function) method and the double prewhitening method. The core of these methods
involves fitting univariate time series to the individual series, after which the estimated white noise residuals are used for multivariate analyses. This is called the
prewhitening of variables.18
An alternative strategy that is useful especially with several input variables is the
ARMAX procedure, which is an ARMA model for an endogenous variable with
multiple exogenous variables. Franses (1991) applied this procedure to an analysis of
18. We refer to Hanssens, Parsons and Schultz (1990), Pankratz (1991) or Box, Jenkins and Reinsel (1994) for
a description of these methods. Applications can be found in Bass and Pilon (1980), Leone (1983), and Doyle
and Saunders (1985).
the primary demand for beer in the Netherlands. Based on 42 bimonthly observations
from 1978 to 1984, using ACF's and model tests, Franses obtained the following
model:
In this model, δ_1 to δ_6 are bimonthly seasonal dummies, p_t is the price, Δ_1 is a first-order differencing operator, and p_{t+1} is a price expectation variable that assumes
perfect foresight. The model contains a lagged endogenous variable, a moving average component, seasonal effects (modeled through dummies rather than through
differencing) and current price and future price effects. Substantively, Franses concluded that tax changes may be effective if one wants to change the primary demand
for beer, given the strong price effect. The positive effect of future prices suggests
some forward buying by consumers. Advertising expenditures did not significantly
influence the primary demand for beer.
In the transfer functions we consider the relations between criterion (y) and predictor
variables (x).19 The variables y and x may both be integrated, of order d_y and d_x
respectively, where d_y ≠ 0, d_x ≠ 0. We discuss the relevance of this below.
In the regression model:

y_t = βx_t + ε_t    (17.56)

there is a presumption that the ε_t are a white noise series. However, this is unlikely to
be true if d_y ≠ 0 and/or d_x ≠ 0. Generally, if two series are integrated of different
order, i.e. d_y ≠ d_x, linear combinations of them are integrated to the higher of the
two orders (max(d_x, d_y)). If y_t and x_t are integrated to the same order (d_y = d_x) then
it is possible that there is a β such that:

ε_t = y_t − βx_t    (17.57)

is I(0) (i.e., integrated of order zero: white noise).
requirement are said to be cointegrated. Cointegration is a critical requirement for
the modeling of time-series data to be meaningful in the sense that long-term effects
can be predicted and long-term relations can be determined. In Section 17.3.5 we
discussed the studies of Dekimpe and Hanssens (1995a, 1995b) in which long-term
relations were established.
A useful tool to investigate whether the x and y series are cointegrated is the
Engle-Granger cointegration test. A cointegration test involves the estimation of a
cointegration regression between x and y and the application of a unit root test to
the residuals from that regression. For further details see Engle and Granger (1987).
Franses (1994) applied cointegration analysis to the analysis of new car sales, formulated in the framework of a Gompertz model for the growth process of sales.
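The Engle-Granger procedure can be sketched on simulated data: below, x_t is a random walk (I(1)) and y_t = 2x_t plus noise (β = 2 is hypothetical), so the two series are cointegrated and the cointegration-regression residuals should look stationary. Only step (1), the OLS cointegration regression, is carried out; step (2) would apply a unit root test to the residuals:

```python
import random

# Sketch of Engle-Granger step (1): OLS estimation of the cointegration
# regression y_t = beta * x_t + e_t on simulated cointegrated series.
random.seed(3)
x = [0.0]
for _ in range(400):
    x.append(x[-1] + random.gauss(0.0, 1.0))        # x is a random walk: I(1)
y = [2.0 * v + random.gauss(0.0, 1.0) for v in x]   # y cointegrated with x

# OLS without an intercept, for brevity.
beta_hat = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
resid = [b - beta_hat * a for a, b in zip(x, y)]

# Step (2) would apply a unit root test to resid; as a rough indication we
# compute the first-order residual autocorrelation, which should be far
# below one for cointegrated series.
m = sum(resid) / len(resid)
c0 = sum((e - m) ** 2 for e in resid)
c1 = sum((resid[t] - m) * (resid[t - 1] - m) for t in range(1, len(resid)))
rho1 = c1 / c0
```

If the two series were not cointegrated, the residuals would themselves be I(1) and their autocorrelation would stay close to one.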
19. See also, for example, Greene (1997, p. 852).
Apart from traditional marketing variables such as price and advertising, we can
accommodate discrete events in models of sales. Examples include a new government
regulation, the introduction of a competitive brand, a catastrophic event such as
poisoning or disease relevant to food products, and so on. Intervention analysis extends
the transfer function approach described above to the estimation of the impact of
such events.²⁰ Intervention analysis in fact extends dummy-variable regression to a
dynamic context. The interventions may have two different effects: a pulse effect,
which is temporary and disappears (gradually), or a step effect, which is permanent
once it has occurred. A strike could have a pulse effect on the production or
distribution of a product. The introduction of a new brand may permanently change
sales of an existing brand. The dummy variables for these two types of effects can be
represented as follows:
• pulse effect: x_t^p = 1 in the time period of the intervention (t = t'), and x_t^p = 0
in all other periods (t ≠ t');
• step effect: x_t^s = 1 in the time period in which the event occurs and all sub-
sequent time periods (t ≥ t'), and x_t^s = 0 in all time periods before the event
(t < t').
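The two dummy types are easy to construct in code; a minimal sketch (names are ours, using a zero-based time index with the intervention period given as `t_star`):

```python
import numpy as np

def pulse_dummy(T, t_star):
    """x_t^p: 1 only in the intervention period t', 0 elsewhere."""
    x = np.zeros(T)
    x[t_star] = 1.0
    return x

def step_dummy(T, t_star):
    """x_t^s: 1 from the intervention period t' onward, 0 before it."""
    x = np.zeros(T)
    x[t_star:] = 1.0
    return x

# For T = 6 periods and an intervention at t' = 3:
# pulse_dummy(6, 3) -> [0, 0, 0, 1, 0, 0]
# step_dummy(6, 3)  -> [0, 0, 0, 1, 1, 1]
```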
For intervention analysis the model defined by (17.50) and (17.54) applies, with the
x-variable defined as above. The form of (17.54) determines the nature of the pulse
interventions. If v_p(B)x_t^p = ω₀x_t^p then y_t shows a single spike at time t'. In Figure
17.6 we show a number of pulse interventions.²¹ In Figure 17.6a, y_t has a stationary
mean except for the effect of the pulse intervention at t = t'. During t = t', y_t shifts
upward. Immediately after t = t', the series returns to its previous level. For a series
with a nonstationary mean, a pulse intervention might appear as shown in Figure
17.6b. After t = t', in which there is a downward shift, the series returns to the level
determined by its nonstationary character. The transfer function part is again ω₀x_t^p as
in Figure 17.6a.
Figure 17.6c shows a pulse intervention at t = t' and t = t' + 1, i.e., a multiperiod
temporary response. The transfer function part in t = t' is ω₀x_t^p and in t = t' + 1 it is
ω₁Bx_t^p. Hence v(B)x_t^p = (ω₀ + ω₁B)x_t^p. Notice that these effects are not cumulative;
the response of ω₀x_t^p after t = t' is zero, and the "secondary" response of ω₁Bx_t^p is
zero after t = t' + 1. If v(B)x_t^p = (ω₀/(1 − δ₁B))x_t^p, the effect is a spike at t = t', which decays
subsequently (if |δ₁| < 1). The series in Figure 17.6d shows a continuing dynamic
response following period t'. Each response after t = t' is a constant fraction δ₁ of the
response during the previous period. The specification of v(B)x_t^p is a Koyck model;
ω₀x_t^p is the initial response at t = t' and δ₁ is the retention rate (see equation (6.11)).
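The decaying pulse response of the Koyck case can be traced recursively as y_t = δ₁y_{t−1} + ω₀x_t^p; a small sketch (names are ours):

```python
import numpy as np

def koyck_response(T, t_star, omega0, delta1):
    """Response path implied by v(B)x_t^p = (omega0 / (1 - delta1*B)) x_t^p
    for a single pulse at t': an initial response omega0 that decays by the
    retention rate delta1 in each subsequent period (|delta1| < 1)."""
    x = np.zeros(T)
    x[t_star] = 1.0
    y = np.zeros(T)
    for t in range(T):
        y[t] = delta1 * (y[t - 1] if t > 0 else 0.0) + omega0 * x[t]
    return y

r = koyck_response(10, 3, omega0=2.0, delta1=0.5)
# r[3] = 2.0, r[4] = 1.0, r[5] = 0.5, ... (geometric decay after the pulse)
```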
[Figure 17.6: Pulse interventions; horizontal axes: Time, with the pulse at t' (and t' + 1 in panel (c)). Panels: (a) v(B)x_t^p = ω₀x_t^p, stationary mean; (b) v(B)x_t^p = ω₀x_t^p, nonstationary mean; (c) v(B)x_t^p = (ω₀ + ω₁B)x_t^p; (d) v(B)x_t^p = (ω₀/(1 − δ₁B))x_t^p, |δ₁| < 1.]
In Figure 17.7 we show some step interventions, i.e. permanent changes in y_t. Figure
17.7a shows a permanent effect that is immediate. Here v(B)x_t^s = ω₀x_t^s, with x_t^s = 0 for
t < t' and x_t^s = 1 for t ≥ t'. Figure 17.7b shows that a step intervention can be
semi-permanent. Here x_t^s = 1 for t = t', t' + 1, t' + 2. Figure 17.7c illustrates a step
intervention for a series with a nonstationary mean. A step intervention with dynamic
effects is shown in Figure 17.7d. The transfer function of 17.7d is (ω₀ + ω₁B)x_t^s,
where ω₀x_t^s captures the step between t' and t' + 1.
[Figure 17.7: Step interventions; horizontal axes: Time, with the step at t' (and t' + 1 where dynamic effects apply).]
research agency and are shown in periods of four weeks. The total length of the series
is 52 (four-weekly) periods (4 years).
For this problem, no data on marketing variables were available, and the model was
calibrated on market shares (y_t) only. The interventions occurred in periods 28 and
29. First, an ARIMA model was identified on the series up to the first intervention
(t ≤ 27). The ACF and PACF suggested an AR(1) model. There is no evidence of a
seasonal pattern. The Dickey-Fuller unit root test indicated that the series is stationary.
The estimates of the AR(1) model are presented below:
In this model, the AR(1) coefficient is highly significant. Subsequently, the interven-
tions are modeled. Step functions were assumed for the two interventions based on
an inspection of the data. Because the two interventions occur in two subsequent time
periods, only a simple transfer function was estimated. The two intervention dummies
are x₁t = 0 for t = 1, ..., 27 and x₁t = 1 for t > 27, and x₂t = 0 for t = 1, ..., 28
and x₂t = 1 for t > 28, respectively. The estimated intervention model is:
The R² of this model is 92.4 percent. The estimated standard error of each of the
intervention effects is about 1.5, indicating that both effects are highly significant.
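To make the set-up concrete, here is a simulated analogue in Python: an AR(1) share series with two step interventions, estimated by OLS on [1, y_{t−1}, x₁t, x₂t]. All coefficients and the noise level are invented for illustration; the book's actual estimates did not survive in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 52
t = np.arange(1, T + 1)
x1 = (t > 27).astype(float)    # step dummy: 0 for t = 1..27, 1 for t > 27
x2 = (t > 28).astype(float)    # step dummy: 0 for t = 1..28, 1 for t > 28

# Simulate y_t = 10 + 0.5*y_{t-1} + 4*x1_t + 3*x2_t + noise,
# starting at the pre-intervention steady state 10 / (1 - 0.5) = 20.
y = np.zeros(T)
prev = 20.0
for i in range(T):
    y[i] = 10.0 + 0.5 * prev + 4.0 * x1[i] + 3.0 * x2[i] + rng.normal(scale=0.1)
    prev = y[i]

# OLS on [1, y_{t-1}, x1_t, x2_t]
X = np.column_stack([np.ones(T - 1), y[:-1], x1[1:], x2[1:]])
b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
# b recovers approximately [10, 0.5, 4, 3]
```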
SPECIAL TOPICS 473
[Figure: Market share (vertical axis, 10-35) over time; the interventions occur in the period between t = 28 and t = 29.]
The estimated standard error of the AR term is 0.1, so that this term is significant
(as in (17.58)). The model appears to fit the data quite well. The residuals do not
show any remaining autocorrelation and are approximately normal. The step function
specification of the interventions thus seems appropriate. The analysis suggests that
both interventions have affected the share of the brand permanently over the time
period considered in the study, and it quantifies the magnitudes of the effects.
In Section 7.2.4 we discussed why it is useful to adapt models. One possible adap-
tation is through the parameters. Model parameters can vary, even if the structure of
the model stays the same. We refer to varying parameter models if the form of the
model and the inclusion of variables are held constant but parameters vary over time
and/or over cross-sections. In this section we briefly review models and techniques
for parameter variation,²⁴ and discuss a specific application.
A simple approach is to separate time-series data into several periods and to estimate
fixed parameters for a given model (piecewise regression²⁵) for each period. This ap-
proach is attractive because of its simplicity. However, it is difficult to justify discrete
changes in parameters. For example, if a new set of parameters is estimated each
23. This section is based on Foekens, Leeflang, Wittink (1999).
24. See also Judge, Griffiths, Hill, Lütkepohl and Lee (1985, Chapter 19).
25. See, e.g. McGee, Carleton (1970), Parsons (1975) and, for an application, Leeflang, Plat (1984).
calendar year, what is the argument in favor of fixed changes on January 1 and
no changes at all during the rest of the year? It may also be impossible to
anticipate in which direction the parameters will change prior to the beginning of a
new calendar year. This would decrease the model's usefulness for decision-making
purposes.
A moving window regression²⁶ allows the parameters to change slowly with each
new data point. For example, a simple approach is to re-estimate all parameters by
deleting the oldest observation and adding the newest to the sample. This method
has the advantage that parameters change continuously, rather than discretely at ar-
bitrary times. However, estimation is more time consuming, although it can be done
efficiently through recursive estimation (e.g. Reinmuth and Wittink, 1974). And sys-
tematic patterns may emerge which can suggest the nature of the true dynamics.
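The moving window estimator described above can be sketched as follows (an illustrative implementation; the function name is ours):

```python
import numpy as np

def moving_window_ols(y, X, window):
    """Refit OLS on each window of `window` consecutive observations:
    the oldest observation is dropped and the newest added, so the
    coefficient path can reveal slowly changing parameters."""
    betas = []
    for start in range(len(y) - window + 1):
        b, *_ = np.linalg.lstsq(X[start:start + window],
                                y[start:start + window], rcond=None)
        betas.append(b)
    return np.array(betas)   # one coefficient vector per window position

# Demo: with a perfectly linear series, every window recovers the same line.
X = np.column_stack([np.ones(30), np.arange(30.0)])
y = 1.0 + 2.0 * np.arange(30.0)
paths = moving_window_ols(y, X, window=10)
```

Plotting the rows of `paths` against time is one way to look for the systematic patterns mentioned above.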
Explicit approaches include those that model the parameters as a function of time²⁷
or model the parameters in a stochastic fashion.²⁸ An example is a two-stage sales-
forecasting procedure developed by Smith, McIntyre and Achabal (1994). This pro-
cedure uses regression analysis in the first stage with data pooled across (similar)
products. The second stage of the procedure updates selected parameters for each
product on the basis of newly observed sales data.
Of greater substantive use are models in which marketing variables are allowed to
interact. Gatignon and Hanssens (1987) define a class of models in which some vari-
ables explain variation in parameters associated with other variables. They demon-
strate that these models are appropriate for accommodating interactions and are more
useful than models in which the parameter variation is purely stochastic or depends
only on time. Consider the following structure for a marketing interaction model:
26. See Cleveland (1979), Mahajan, Bretschneider, Bradford (1980), Leeflang, Plat (1984), Eubank (1988) and
Hastie, Tibshirani (1990).
27. See Parsons (1975).
28. See Wildt, Winer (1983), Blattberg, Kim, Ye (1994), Papalia, Krishnamurthi (1996).
Although the demand function (17.60) is usually estimated with time-series data,
the parameter function (17.61) has been estimated both over time (e.g. Parsons and
Vanden Abeele, 1981, Mela, Gupta, Lehmann, 1997) and over cross-sections (e.g.
Wittink, 1977).
29. See for a discussion Blattberg, Neslin (1990), Neslin, Schneider Stone (1996), Ailawadi, Neslin (1998),
van Heerde, Leeflang, Wittink (1999c).
The arguments in favor of dynamics are as follows. Promotion frequency and depth
of price discounts may influence consumers' price expectations. In turn, expectations
of price and non-price promotional activities explain some consumer brand choice
behavior (Kalwani, Yim, 1992). Thus consumers' stockpiling behavior of a brand (or
product category) may also depend on the size and recency of temporary price dis-
counts. If there exists substantial variation in these promotional activity components
over time, we should find dynamic effects in models of promotion effects.
The β_rj are the price elasticities: the own-brand price elasticity if r = j or a cross-
brand elasticity if r ≠ j. The γ_ℓrj are the multipliers for promotion ℓ: own effects if
r = j or cross-effects if r ≠ j. The λ_kj's are the store intercepts for brand j in
store k, and the δ_jt are the seasonal multipliers for week t for brand j. In (17.62), the
own-brand price parameter β_rj, r = j, reflects the increase in brand j's sales in the
presence of a temporary price cut for the brand. Reasons for this sales increase may
include brand switching, consumption increase, stockpiling and purchase accelera-
tion. With larger past discounts and with more recent price discounts, for both own-
and other brands, this price parameter should move toward zero. Own-brand, feature-
and display parameters may similarly decline with more recent promotions.
To create a dynamic version of (17.62), equations were specified for the price
parameters, the promotion multipliers and the store intercepts (λ_kj). To illustrate, we
show the price parameter process function (17.63) for the own-brand price elasticity.
In (17.63), the own-price elasticity depends on the magnitude of and the time since the
previous price discount for own- and other brands. The expected signs are shown
in parentheses below the equations.
where

Dsum_kjt = Σ_{s=1}^{w} η^{s−1} · discount_{kj,t−s}   for brand j in store k,

η = decline rate (0 < η ≤ 1),
w = number of weeks of history used for the explanatory
variables in (17.63),
parameters to vary across stores and over time as a function of brand- and store-
specific promotional activities.
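The geometrically decaying discount stock Dsum can be computed directly; a sketch (function name ours), where `discounts` holds the discounts for weeks t − w, ..., t − 1 in chronological order:

```python
import numpy as np

def dsum(discounts, eta, w):
    """Dsum_t = sum_{s=1}^{w} eta**(s-1) * discount_{t-s}: a geometrically
    decaying stock of the last w weekly price discounts (0 < eta <= 1),
    with the most recent discount (s = 1) weighted eta**0 = 1."""
    d = np.asarray(discounts[-w:], dtype=float)   # last w weeks, oldest first
    weights = eta ** np.arange(w)                  # s = 1..w -> eta**(s-1)
    return float(weights @ d[::-1])                # most recent first

dsum([0.1, 0.2, 0.3], eta=0.5, w=3)   # 0.3 + 0.5*0.2 + 0.25*0.1 = 0.425
```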
Foekens et al. (1999) obtained various dynamic effects. In particular, baseline
sales were found to be dynamic, consistent with expectations: an increase in sales due
to a temporary price cut tends to reduce baseline sales (sales under nonpromoted con-
ditions) in periods following the promotion. By allowing the store intercepts to vary
based on the magnitude of a temporary price cut and on the time since a promotion,
Foekens et al. (1999) also accommodated the elusive post-promotion dip.³⁰ They also
obtained dynamic effects in the own-discount elasticity, specifically showing that the
magnitude of and time since a previous discount change this elasticity.³¹
30. This can also be accomplished by a model with lead and lagged effects. See van Heerde, Leeflang, Wittink
(1999c).
31. This topic is also discussed by, for example, Krishna (1991) and Raju (1992).
CHAPTER 18
Validation
Two critical elements of model building are model specification and parameterization.
Model specification depends on intended use, but should also be based on theoretical
and experiential knowledge. Parameterization includes data collection, model esti-
mation and model testing. In this chapter we consider an additional part of model
building: validation¹ (also called verification or evaluation).
In its broadest sense validation is an assessment of the quality of the model results.
Validation criteria for model building can relate to:
We briefly review the different criteria in Section 18.1 and the relevance of statistical
tests in Section 18.2. In Section 18.3 we discuss face validity criteria which are used to
determine whether model results are in accordance with theoretical and/or common-
sense expectations. (Literally, face validity refers to the extent to which one's face
becomes red if one is questioned about the logic of the empirical results.) We intro-
duce criteria for model selection in Section 18.4. The idea of model selection is that
we often have alternative model specifications, and we use data to distinguish between
the alternatives. The superiority of one model over another may depend on the product
category and on competitive conditions but also on the quality of data. Even though
theoretical arguments should inform the model specification, in marketing we want
the empirical results to be not only consistent with what sound arguments dictate but
also with how the marketplace behaves subsequent to model testing. With new data,
the question is whether extant models apply, and with new models the question is
whether the proposed specification outperforms prevailing benchmarks.
1. The meaning of validation varies across the sciences as well as within a discipline. For an in-depth discussion
of this concept in a marketing context, see for example, Heeler, Ray (1972), Zaltman, Pinson, Angelmar (1973)
and the special issue of the Journal of Marketing Research, vol. 16, February 1979.
We have argued that the success of a model depends on the likelihood of model
acceptance. The aspects that contribute to the likelihood a model is implemented are:
• model-related dimensions;
• organization-related dimensions;
• implementation strategy dimensions.
This portion of the validation exercise is called specification error analysis or "di-
agnostics".² If there is no evidence that the assumptions about the error term are
violated, statistical tests can be applied to the model results. For example, we can test
whether the model as a whole is statistically significant (with regard to its explanatory
power), and whether individual parameter estimates are statistically significant.
Econometric analysis consists essentially of trying to explain variation in the
criterion variable by fluctuations in the predictor variables. If a predictor does not
vary in the sample data, it cannot explain variation in the criterion variable. Indeed,
the amount of variation in, and covariation between, predictor variables influences
the nature of statistical analysis. If the data are not amenable, other methods such
as experimentation or subjective estimation should be considered. Thus the model
builder needs to know what the data allow for.
The model-building process should end with "an acceptable, final model" as we dis-
cuss in Section 5.1. However, to arrive at a "final model" several models may have to
be calibrated and tested. In marketing, the choice between alternative model specifi-
cations is often not straightforward, in part because relative performance depends on
the measure(s). Over the years various criteria, search processes, empirical rules, and
testing mechanisms have been proposed to guide model selection. With regard to sta-
tistical tests based on model fit, one approach is to specify a supermodel which "nests"
all alternative models. Such a supermodel may not be of interest in itself but it can be
useful to determine the adequacy of each alternative model. We discuss comparisons
of nested models in Section 18.4. We list information criteria which can be used
in model selection, and we mention how a choice can be made between competing
nonnested models. We also introduce so-called causality tests which are used to dis-
We have argued, in Chapter 16, that the error-term assumptions in econometrics are
needed for statistical inference. Of particular interest is the assumption that E(u;) =
0 or E(u; I X;)= 0 for all i, i = 1, ... ,I. If the predictor variables are "fixed",
this first assumption about the error term is required for a claim that the parameter
estimates are unbiased. To the extent possible, the data should be used to test this
assumption. But a model builder who is interested in getting a model implemented
may not have an incentive to complete an exhaustive search of all ways in which this
most critical assumption can be falsified. For example, if we have exhausted the data
for the measurement of relevant predictor variables, but nevertheless use statistical
tests for omitted variables by adding polynomial expressions of the included predictor
variables, we risk having to reject the model altogether as being invalid. Of course,
in the long run, an incomplete model will be rejected. However, the model may have
many attractive characteristics, and the model builder may want to believe that a long
time will pass before it is found to be deficient.
3. Conditional forecasting is the term used to describe situations in which the model is used to obtain forecasts
conditional upon certain decisions, such as a specific marketing mix (see Section 8.2). If the empirical results
lack face validity, it is unlikely that one can obtain acceptable conditional forecast accuracy.
This dilemma suggests that if the model builder's rewards depend on model im-
plementation, there is an implicit incentive to make only a partial attempt to falsify
empirical results. The criteria for implementation by the model user should then in-
clude convincing results from the latest tests with the highest power to reject incorrect
models.⁴
Statistical tests for the error term assumptions also have other limitations. Con-
sider the third assumption, viz. Cov(u_i, u_j) = 0, i ≠ j. For time-series data, econo-
metric textbooks offer the following discussion when this assumption is violated (for
convenience, we use a linear model with one predictor variable):

y_t = α + βx_t + ε_t,   for t = 1, ..., T    (18.1)

where

ε_t = ρε_{t−1} + u_t and E(u_t) = 0.    (18.2)
With these qualifications, it is possible to show that the OLS estimator is unbiased.
Thus, the existence of first-order autocorrelation in the error term does not have
to detract from the validity of model specification (18.1). Unfortunately, the text-
books rarely discuss the source of a (first-order) autocorrelation, and the reader is
not encouraged to be critical of the joint plausibility of (18.1) and (18.2). Instead,
the common recommendation is to estimate the first-order autocorrelation, and to use
Generalized Least Squares (GLS). With an estimated value for ρ in (18.2), the claim
is made by many, including ourselves, that GLS provides asymptotically unbiased
parameter estimates that are (asymptotically) more efficient than OLS.
The fact that it is possible to show analytically that autocorrelation does not have
to lead to biases in the OLS estimates does, however, not provide support for the
applicability of this particular perspective in practice. Since empirical model building
is subject to severe limitations (e.g., incomplete model structure, missing variables),
we expect that the existence of autocorrelated disturbances is virtually always due
to misspecification. Therefore, we suggest that the same remedy that applies in case of a
violation of the first assumption (E(u_i) = 0) be applied for violations of the third as-
sumption (Cov(u_i, u_j) = 0). The test for the presence of autocorrelated disturbances
in time-series data is then merely a convenient way to explore the plausibility of the
model. However, this recommendation is not much appreciated in empirical research.
It is, after all, quite straightforward to use GLS with an estimated autocorrelation
coefficient (this is the indicated solution if something equivalent to (18.1) and (18.2)
is assumed). On the other hand, the search for a missing predictor variable can be
interminable.
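The "convenient" GLS route referred to above looks like this in practice: a Cochrane-Orcutt-style sketch with simulated data (the function name and data are ours). The point of the passage stands: the ease of this computation is not evidence that (18.1)-(18.2) is the correct specification.

```python
import numpy as np

def cochrane_orcutt(y, X, iterations=10):
    # Start from OLS; then alternate: estimate rho from the residuals,
    # quasi-difference the data (y_t - rho*y_{t-1}, X_t - rho*X_{t-1}),
    # and re-estimate the coefficients by OLS on the transformed data.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rho = 0.0
    for _ in range(iterations):
        e = y - X @ b
        rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
        ys = y[1:] - rho * y[:-1]
        Xs = X[1:] - rho * X[:-1]
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return b, rho

# Simulated data: y = 1 + 2*x + e with e_t = 0.7*e_{t-1} + u_t.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e
b, rho = cochrane_orcutt(y, np.column_stack([np.ones(n), x]))
# b is close to [1, 2]; rho is close to 0.7
```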
We believe that, in general, the proposed model is invalid if either one of these
assumptions fails. As we mention in Chapter 16, in that case statistical inference
cannot proceed, and substantive use of the results is not recommended.⁵ Assumption
4. By the same token, a researcher who wants a paper published in an academic journal may also lack the
incentive to falsify a proposed model. Thus, reviewers of papers have to insist on the author's use of tests with
the highest power to reject incorrect models.
5. Although we do not have specifics on the following argument, we can imagine that having a model that fails a
critical assumption but still provides guidance to managers is better than the alternative of not having a model.
2 (Var(u_i) = σ²), however, can be violated even when the model specification is correct.
Especially with cross-sectional data it is common for the values of the criterion vari-
able to vary in size, and for the error to be proportional to size. Thus, it can be quite
logical to specify:
where
Now one can use Weighted Least Squares (WLS), which is a specific form of GLS
(see Chapter 16), dividing both sides of (18.3) by x_i. This will generate unbiased and
efficient parameter estimates. In this sense, a finding that the second assumption is
violated does not invalidate the model specification.
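Equations (18.3)-(18.4) themselves did not survive in this excerpt, so the following sketch assumes the illustrative form y_i = βx_i + x_i·u_i (error standard deviation proportional to x_i). Dividing both sides by x_i gives y_i/x_i = β + u_i with a constant error variance, so the WLS estimate of β is simply the mean of the ratios:

```python
import numpy as np

def wls_beta(y, x):
    """WLS estimate of beta in y_i = beta * x_i + x_i * u_i:
    after dividing through by x_i the error is homoscedastic,
    and beta-hat is the mean of y_i / x_i."""
    return float(np.mean(np.asarray(y) / np.asarray(x)))

# Error-free check: y is exactly 3 * x.
wls_beta([3.0, 6.0, 9.0], [1.0, 2.0, 3.0])   # -> 3.0
```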
The fourth assumption, normally distributed errors, we believe is also likely to be
violated if the model specification is incorrect. If the error term captures the effects
of a large number of omitted variables, each of which by itself has only a minor
influence on the criterion variable, then it can be argued that this error term should be
approximately normally distributed. A violation of the normality assumption could
then indicate that there is an omitted variable with more than a minor impact. It is
in this sense that the tests of three assumptions about the error term can invalidate
a model specification. And, if there is no evidence that any of the assumptions is
violated, based on rigorous testing, we can move on to other types of validation.
Face validity relates to the believability of a model's structure and its output, or the
validity at face value. Face validity is based on theoretical and common-sense expec-
tations, and on broadly accepted previous empirical results. This prior knowledge can
be put to work in various ways as indicated by Massy (1971, p. 6): in structuring the
model, in selecting appropriate estimation methods and in benchmarking the results
from new data.
The use of face validity as a criterion depends, of course, on the model builder's
prior knowledge with regard to the phenomena under study. This knowledge should
guide model specification (e.g. relevant variables, operationalization of measures,
functional forms). The model structure that is ultimately subjected to estimation can
therefore also be evaluated in terms of face validity. The bases for knowledge about
marketing phenomena include theories developed in relevant disciplines, such as
economics and psychology. Additional theoretical knowledge is generated within the
field of marketing. And a large body of empirical results, much of it generated within
the marketing field, can be used to determine the face validity of new empirical re-
sults. Of course, prior knowledge should not stifle the acceptance of surprising results.
Current thinking about (marketing) problems is incomplete and sometimes incorrect.
VALIDATION 485
One purpose of model building is to test the prevailing theories. Face validity, how-
ever, has a place when we believe our understanding of the nature of relationships is
quite good.
Consider a model of (unit) sales for a brand, as a function of predictor variables
such as own-brand price and own-brand advertising. Except for unusual cases, for
example if price is interpreted by consumers as an indicator of quality, we expect
price to have a negative effect on sales. Advertising, on the other hand, is expected to
have a positive effect on sales. Thus, ordinarily we are suspicious of empirical results
that show "wrong" signs for the (partial) slope coefficients.⁶
Given a relatively large number of empirical studies in which price and adverti-
sing effects have been reported, and the meta-analyses completed on estimated effects
(see Section 3.3), the argument has been made that today newly developed results
should be tested against the average effects computed from published studies. The
focus then shifts to a statistical test of the difference from the prevailing average
effect, and the face validity of a significant difference in one direction. In this sense,
the basis for face validity considerations about advertising effects can be the average
advertising elasticity of 0.10 reported by Lambin, based on 38 cases (1976, p. 93),
and also 0.10 based on 40 estimates analyzed by Leone and Schultz (1980). How-
ever, Assmus, Farley and Lehmann (1984) report an average short-term advertising
elasticity estimate of 0.22, from 128 econometric model results. Assmus et al. (1984)
also report an average carryover coefficient of 0.47, which implies a long-term sales
impact of 1/(1 − 0.47) ≈ 1.89 times the short-term impact. Separately, Lodish et
al. (1995b) report average long-term television advertising effects from 55 tests for a
cross-section of established brands of consumer packaged goods. One of their findings
is that the first-year sales impact of (successful) advertising campaigns is approxi-
mately doubled when the lagged sales impact over the next two years is added. This
long-term impact of about twice the short-term impact is more or less in line with the
Assmus et al. result.
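The long-term multiplier used above follows from summing the geometric carryover series: for a carryover coefficient c, the total impact is 1 + c + c² + ... = 1/(1 − c). As a minimal check:

```python
def long_run_multiplier(carryover):
    """Total long-term impact per unit of short-term impact under geometric
    carryover c: 1 + c + c**2 + ... = 1 / (1 - c), for 0 <= c < 1."""
    return 1.0 / (1.0 - carryover)

long_run_multiplier(0.47)   # -> about 1.89 (the Assmus et al. carryover)
```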
For price, Tellis (1988b) reports an average elasticity, based on 367 econometric
studies, of −1.76. More recent estimates of price elasticity can be compared against
this benchmark, and reasons for significant deviations in either direction should be
explored. In this manner, the meta-analysis results can enrich the process of "face
validation".
To illustrate the importance of allowing for surprises, we briefly review a study with
strong interaction effects between advertising and price. Eskin (1975) reports on a
field experiment for a new, inexpensive food snack, in which price and advertising
are manipulated. If the experiment had not been conducted, management would have
chosen either a combination of low price and low advertising or high price and high
advertising. This is based on the idea that a low price generates a low margin which
would not provide enough money for a large advertising campaign. Thus, a per-unit
6. The criticism will be muted if a coefficient with a sign opposite to expectations is not statistically signif-
icant. If the sign is "wrong" and the coefficient is significantly different from zero, we would suspect model
misspecification.
[Figure 18.1: Relation between sales and price, separately for high- and for low-advertising
support. Vertical axis: unit sales (relative to store size); horizontal axis: price; two curves: high advertising, low advertising.]
cost argument would have been the basis for the joint advertising-price decisions. An
alternative demand-based argument is that a higher amount of advertising enhances
the strength of preference for the advertised product. Stronger preferences reduce con-
sumers' price sensitivities, and this would also favor low-price with low-advertising
and high-price with high-advertising combinations.
One advantage of experimental studies is that, with appropriate controls and ran-
domization, there is less worry about omitted variables and other misspecification
issues. In this case the model included certain control variables to minimize the
confounds that occur in field experiments. Another advantage of experimentation is
that the use of all possible combinations of alternative price levels and advertising
amounts facilitates the estimation of interaction effects. In Eskin's model, the inter-
actions were counter to expectations. At high levels of advertising, the aggregate price
sensitivity was much higher than at low levels of advertising (see Figure 18.1). This
suggests (in practice, one would determine the profit-maximizing combination) that
the optimal mix would be either high advertising and low price, or low advertising
and high price. Either of these demand-based combinations was inconsistent both
with the supply-side argument and with the theory about how advertising for a brand
affects an individual consumer's price sensitivity. Since an experimental-study result
has more credibility than one based on non-experimental data, Eskin's explanation
for the surprising result was easily accepted. The explanation was that higher amounts
of advertising for a new brand attract consumers who would ignore the product at
lower advertising levels. These additional consumers have a lower marginal utility for
the product, and purchase it only if the price is low.
18.4.1 INTRODUCTION
The face validation criteria discussed in the previous section may allow a model
builder to reject some models. However, it is common for more than one model
specification to produce plausible results. Thus, face validity considerations may not
suffice when the desired result is to identify one "best" model. If neither theoretical
arguments pertaining to model structure, nor the violation of error-term assumptions,
nor face validity considerations allow the model builder to reject all but one of the
alternative models, we use explicit model comparisons.
There are at least two ways in which explicit comparisons can be made between
alternative models. In this section we discuss statistical methods that allow the model
builder to test hypotheses regarding relative model performance in the estimation
sample. We introduce nested model comparisons and applicable statistical tests in
Section 18.4.2. In Section 18.4.3 we discuss nonnested model comparisons, which
are especially relevant for tests of alternative functional forms.
Introducing the backward shift operator L, where L(m_t) = m_{t−1}, and substituting
these operators in (18.5) we get:

m_t − λL(m_t) = β₁ds_t + β₂as_t + ε_t / (1 − ρL)    (18.6)

or

m_t = (β₁ds_t + β₂as_t) / (1 − λL) + ε_t / ((1 − λL)(1 − ρL)).    (18.7)
We show the relations between the various models (CE, CEA, PA and PAA) in a
nesting scheme in Figure 18.2.

[Figure 18.2: Nesting scheme, starting from equation (18.5): PAA = partial adjustment with autocorrelation; PA = partial adjustment; CEA = current effects with autocorrelation; CE = current effects.]
We show in Table 18.1 the regression coefficients, t-statistics (in parentheses),
the coefficients of determination (R²), the Durbin-Watson statistics and Durbin's h-
statistics. The coefficients in all four models are statistically significantly different
from zero. The much stronger explanatory power (R² value) and the relevance of the
lagged endogenous variable suggest that the partial adjustment (PA and PAA) models
are better than the current effects models (CE and CEA). The estimated adjustment
rates λ̂ in the partial adjustment models suggest the presence of delayed response
effects.
It is interesting that between the two current effects models, the specification with
autocorrelated errors (CEA) shows a substantial degree of first-order autocorrelation.
Both models also have significant D.W. statistics. On the other hand, when the lagged
endogenous variable is incorporated, the autocorrelation coefficient is very close to
zero. Interestingly, the estimated current effects of both detailing and advertising are
much smaller in the PA and PAA models, and the reduction in the detailing effect is
especially large (both in an absolute and in a relative sense). However, the substantive
conclusions differ greatly between, for example, the CEA and PA models.
We also note several other interesting aspects in Table 18.1. We have argued that
the presence of first-order autocorrelation in the disturbance should rarely be treated
through the addition of an autocorrelation parameter. If this were an appropriate
remedy, then the Durbin-Watson (D.W.) statistic should be close to two in the CEA
model. Since the D.W. statistic is not much different from its value in the CE model,
it appears that the systematic pattern in the disturbances is not well captured by a
first-order autocorrelated error structure.
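The Durbin-Watson diagnostics discussed above can be computed directly from a model's residuals. A minimal sketch in Python; the residual series used below is made up for illustration:

```python
def durbin_watson(e):
    """Durbin-Watson statistic: sum of squared successive residual
    differences divided by the residual sum of squares. Values near 2
    suggest no first-order autocorrelation; values near 0 (or 4) suggest
    positive (or negative) first-order autocorrelation."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

# Alternating residuals show strong negative autocorrelation (D.W. near 4).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```

With a lagged endogenous variable the D.W. statistic is biased toward two, which is why Table 18.1 also reports Durbin's h-statistics for the PA and PAA models.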
The other aspect worth mentioning is that the R² value for CEA is lower than the
490 CHAPTER 18
value for CE. This may seem surprising given that the CEA model includes an additional parameter (for the autocorrelation). The reason for this "anomaly" is that OLS maximizes R². Thus, even if there are statistical arguments that favor the use of GLS over OLS, the fit of the model, expressed in terms of the original criterion variable, cannot improve.
Although the results in Table 18.1 clearly favor the PA model (the autocorrelation
coefficient is irrelevant) we now formalize the nested model comparisons. Each pair
of nested models can be tested with a likelihood ratio test (statistic): 7
η = [E0/E1]^(-T/2)    (18.8)

where
E0 = the residual sum of squares of a restricted model associated with the null hypothesis (H0),
E1 = the residual sum of squares of a less restricted model associated with the alternative hypothesis (H1),
T = the number of observations.

The test statistic can be written as:

-2 ln η = T ln E0 - T ln E1    (18.9)

which is asymptotically chi-squared distributed with (p1 - p0) degrees of freedom (Greene, 1997, p. 161) and where:
p0 = number of parameters in the restricted model, and
p1 = number of parameters in the "unrestricted" model.
7. This test was also used by Weiss and Windal (1980). See also Section 16.6.
VALIDATION 491
Under classical hypothesis testing a more complex (less restricted) model is not
chosen unless there is statistically significant evidence in its favor. 8 The first pair of
models in Table 18.2 has the current-effects (CE) model as the restricted model (H0) and the current-effects autocorrelation model (CEA) as the less restricted alternative (H1). The associated chi-squared statistic is significant at the 1%-level, implying
that the autoregressive current effects model is superior to the current effects model.
By the same logic the second test result suggests that the autoregressive partial adjustment model (PAA) is superior to the autoregressive current-effects (CEA) model. However, the null hypothesis cannot be rejected based on the test of the partial adjustment model (PA) against its autoregressive counterpart (PAA). The partial adjustment model therefore emerges as the favored specification.
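The nested comparisons of Table 18.2 can be reproduced from the residual sums of squares alone, using equation (18.9). A sketch in Python; the sums of squares and sample size below are hypothetical, and the 5% chi-squared critical value for one degree of freedom (3.841) is hard-coded:

```python
import math

def lr_statistic(E0, E1, T):
    """Likelihood ratio statistic of eq. (18.9): -2 ln(eta) = T ln E0 - T ln E1,
    where E0 (E1) is the residual sum of squares of the restricted
    (less restricted) model; asymptotically chi-squared with p1 - p0 df."""
    return T * math.log(E0) - T * math.log(E1)

# Hypothetical CE (restricted) vs. CEA (one extra parameter): df = 1.
stat = lr_statistic(E0=52.0, E1=44.0, T=60)
print(round(stat, 2), stat > 3.841)  # statistic, reject H0 at the 5% level?
```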
When models are non-nested (for example, if two specifications have the same predictor variables but differ in functional form), other selection criteria are needed; we discuss these criteria next.
Non-nested models may include the same predictor variables, or they may involve
some variables that are unique to each model. Suppose we consider the following
multiplicative model:
Models (18.5) and (18.10) use the same variables 11 but have different functional forms. It is impossible to express model (18.10) as a constrained version of model (18.5), or vice versa, which renders the nested model tests inapplicable. For non-nested model comparisons and for models estimated with the method of maximum likelihood (Section 16.6), we can use information criteria such as the Akaike Information Criterion (AIC), the Schwarz criterion, Consistent AIC (CAIC), and a criterion developed by Allenby (1990). These criteria are full sample criteria (implying that there is no data splitting required as is often done for predictive validation as we discuss in Section 18.5). The information criteria seek to incorporate, in model selection, the divergent considerations of accuracy of estimation and the "best" approximation to reality. The statistics incorporate a measure of the precision of the estimate and a measure of parsimony in the parameterization of a model.
Akaike (1974) proposed a simple model comparison criterion, based on an information theoretic rationale. The precision of the estimated model is represented by the (natural) logarithm of the maximum likelihood (ln L). A term that penalizes ln L for lack of parsimony (the number of parameters) is subtracted from ln L, and the criterion is based on minus twice this quantity:

AIC = -2 ln L + 2 · (number of parameters).    (18.12)

Schwarz (1978) proposed a criterion that penalizes the number of parameters more heavily as the sample size grows:

SC = -2 ln L + (number of parameters) · ln T    (18.13)

where
SC = Schwarz Criterion,
T = the number of observations.

Allenby (1990) argued that both AIC and SC overemphasize parsimony, leading him to propose his own criterion (ALC). Bozdogan (1987) proposed a criterion which penalizes overparameterization more strongly than AIC. This Consistent AIC (CAIC) is computed as:

CAIC = -2 ln L + (number of parameters) · (ln T + 1)    (18.15)

where
T = number of observations.
Rust, Simester, Brodie and Nilikant (1995) examined which of these and other model selection criteria perform best in selecting the correct model based on simulated data. In their study the Schwarz criterion was the single best selection criterion. Due to the high correlation between the results for alternative model selection criteria they also suggest that the use of multiple model selection criteria may be unwarranted. However, as with all simulation studies, the generalizability of these results remains to be determined.
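The criteria above are simple to compute once the maximized log-likelihood is available. A sketch in Python; the log-likelihoods below are hypothetical, and the AIC and Schwarz penalties (2p and p ln T) are the standard forms:

```python
import math

def info_criteria(lnL, p, T):
    """Model selection criteria (smaller is better): AIC with penalty 2p,
    the Schwarz criterion (SC) with penalty p ln T, and CAIC, eq. (18.15),
    with penalty p (ln T + 1)."""
    return {"AIC": -2 * lnL + 2 * p,
            "SC": -2 * lnL + p * math.log(T),
            "CAIC": -2 * lnL + p * (math.log(T) + 1)}

# Hypothetical fits on T = 76 observations: model B adds two parameters
# for a modest log-likelihood gain; all three criteria prefer model A here.
a = info_criteria(lnL=-120.0, p=3, T=76)
b = info_criteria(lnL=-118.5, p=5, T=76)
print(all(a[k] < b[k] for k in a))  # True
```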
Another option for the selection of a functional form is the Box and Cox (1964) transformation, which includes linearity as a special case. The Box-Cox functional form can be written as: 13

y_t^(λ) = β0 + β1 x_{1t}^(λ) + ... + β_L x_{Lt}^(λ) + ε_t    (18.16)

where

y_t^(λ) = (y_t^λ - 1)/λ  for λ ≠ 0
        = ln y_t         for λ = 0,
x_{1t}^(λ), ..., x_{Lt}^(λ) = predictor variables defined similarly,
ε_t = a disturbance term.
13. See also Section 16.1.4.
Special cases are the linear (λ = 1) and logarithmic (λ = 0) forms:

y_t - 1 = β0 + β1(x_{1t} - 1) + ... + β_L(x_{Lt} - 1) + ε_t,   for λ = 1,    (18.17)

and

ln y_t = β0 + β1 ln x_{1t} + ... + β_L ln x_{Lt} + ε_t,   for λ = 0.    (18.18)

Although a family of functions is defined by (18.16), a conventional likelihood ratio hypothesis test of linearity involves the null hypothesis H0: λ = 1 versus the alternative H1: λ ≠ 1.
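The transform in (18.16) behaves continuously in λ, which is what makes a test of λ = 1 against λ ≠ 1 meaningful. A small sketch in Python (the data value is arbitrary):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, ln(y) for lam = 0.
    lam = 1 gives the (shifted) linear form, lam = 0 the logarithmic form."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

print(box_cox(5.0, 1.0))             # 4.0 (linear case, shifted by 1)
print(round(box_cox(5.0, 1e-8), 6))  # approaches ln 5 = 1.609438
```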
In cases where the models are non-nested and differ in the set of variables, other tests are required. Examples are the J-test, the P-test, the P-E-test and a likelihood-ratio-based test developed by Vuong (1989). 14 The J-, P- and P-E-tests include the estimation of artificial compound models, 15 which we briefly describe here.
Consider the following non-nested regression models that also differ in the pre-
dictor variables:
We now introduce five bivariate causality tests, 19 including two Granger tests and
two Sims tests which are regression methods. The Granger tests are based on the
following model:
y_t = Σ_{i=1}^{K} π11,i y_{t-i} + Σ_{i=1}^{K} π12,i x_{t-i} + u_t,    t = K+1, ..., T    (18.23)
17. This section is based on Bult et al. (1997). For an extensive discussion of causality tests see the special issues of the Journal of Econometrics, vol. 39 (1-2). For overviews of causality tests applied in marketing see Hanssens et al. (1990, p. 168) and Bult et al. (1997, p. 452). We note that we use the concept of "causality" that can be tested with statistical methods. This concept is not based on cause and effect in a strict philosophical sense (see Judge, Griffiths, Hill, Lütkepohl and Lee, 1985, p. 667).
18. Other definitions of causality can be found in Zellner (1979) and in Hashemzadeh and Taylor (1988). Granger causality emphasizes the predictive properties of a model. Tests described here can be used for causal ordering with time-series data. The "causality" tests used on cross-sectional data are tests of associations.
19. Analytical arguments for these tests can be found in, for example, Chow (1983).
where
π11,i, π12,i = parameters,
u_t = a serially independent random disturbance term from a distribution with mean zero and covariance matrix Σ.

It is assumed that all polynomials have the same order K. If x_t does not cause y_t, π12,i = 0 for i = 1, ..., K, or

y_t = Σ_{i=1}^{K} π11,i y_{t-i} + u_t*    (18.24)

where
u_t* = a random disturbance term.
In both Granger tests (see below) the null hypothesis is that π12,i = 0, i = 1, ..., K in (18.23). Under the null hypothesis and the assumption that the disturbances are normally distributed with mean zero and variance σ², the statistic GS is distributed as an F-random variable with K and (T - 2K) degrees of freedom. The Granger-Sargent test has the following form:

GS = [(RRSS - URSS)/K] / [URSS/(T - 2K)]    (18.25)

where
RRSS = the residual sum of squares of the restricted relation (18.24),
URSS = the residual sum of squares of the unrestricted relation (18.23).
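The Granger-Sargent statistic is an ordinary F-test on the K lagged-x coefficients, so it can be computed from the two residual sums of squares. A sketch in Python with hypothetical sums of squares; the 5% critical value of F(8, 60), roughly 2.10, is hard-coded:

```python
def granger_sargent(RRSS, URSS, T, K):
    """Granger-Sargent statistic, eq. (18.25): distributed F(K, T - 2K)
    under H0 that all K lagged-x coefficients in (18.23) are zero."""
    return ((RRSS - URSS) / K) / (URSS / (T - 2 * K))

# Hypothetical residual sums of squares with T = 76 observations, K = 8 lags.
gs = granger_sargent(RRSS=30.0, URSS=20.0, T=76, K=8)
print(round(gs, 2), gs > 2.10)  # reject "x does not cause y" at the 5% level?
```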
The Granger-Wald (GW) test compares the two disturbance-variance estimates directly:

GW = T(σ̂²_{u*} - σ̂²_u)/σ̂²_u    (18.26)

where
σ̂²_{u*} = the estimate of Var(u_t*) in (18.24), and
σ̂²_u = the estimate of Var(u_t) in (18.23).
The Sims methods regress x_t on past, current and future y_t's. Sims (1972) showed that under the hypothesis of no causality from x to y, the regression parameters corresponding to future y_t's are equal to zero. The significance of the coefficients is tested using:

x_t = Σ_{i=-M}^{N} v_i y_{t-i} + e_t,    t = N+1, ..., T-M    (18.27)

where
e_t = a random disturbance term,
M, N = the maximum number of "future" and "past" y_t's.

Sf = [(RRSS - URSS)/M] / [URSS/(T - M - N - 1)]    (18.28)
One difficulty with the Sf test is that the disturbance term e_t in (18.27) is in general serially correlated, and consequently (18.28) does not have the claimed distribution if the null hypothesis is true (e.g. Geweke, Meese, Dent, 1983). To circumvent this problem a Modified Sims (MS) test was developed. In MS, lagged values of x_t and y_t are included in the equations:
x_t = Σ_{i=1}^{P} γ_i x_{t-i} + Σ_{i=-M}^{N+P} v_i y_{t-i} + w_t,    t = N+P+1, ..., T-M    (18.29)

where
w_t = a disturbance term.

Relation (18.29) is also estimated in constrained (v_i = 0, i = -M, ..., -1) and unconstrained form. The statistic MS is distributed under the null hypothesis as an F random variable with M and (T - 2P - M - N - 1) degrees of freedom:

MS = [(RRSS - URSS)/M] / [URSS/(T - 2P - M - N - 1)]    (18.30)
A fifth test is the double-prewhitening method or the Haugh-Pierce test (Haugh, 1976, Pierce, 1977, Pierce and Haugh, 1977). This test has been used in a marketing context by, for example, Hanssens (1980a, 1980b) and Leeflang and Wittink (1992). The direction of causality between y_t and x_t is established by cross correlating the residuals of the univariate models fitted to each. These residuals, say û_t and v̂_t, are causally related in the same way as y and x. Therefore, causality can be detected by estimating the parameters in the regression of û_t on past, current and future v̂_t's in the same manner as would be done by regressing y_t on past, current and future x_t's. The difference is that the residuals û_t and v̂_t are estimated by applying Box-Jenkins techniques to y_t and x_t. Under the null hypothesis that y and x are not causally related, r_{uv}(k) = Corr(û_{t-k}, v̂_t) is asymptotically independently and normally distributed with mean zero and standard deviation T^{-1/2}, where Corr is the correlation between the residuals û_{t-k} and v̂_t. Using m correlations r_{uv}(k), k = 1, ..., m, under the null hypothesis the test statistic HP is asymptotically chi-square distributed with m degrees of freedom:

HP = T Σ_{k=1}^{m} r²_{uv}(k).    (18.31)
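Given the two residual series, the HP statistic of (18.31) is a sum of squared cross-correlations. A sketch in Python; the short residual series are made up, and the prewhitening step (fitting univariate Box-Jenkins models) is assumed to have been done already:

```python
from statistics import fmean

def cross_corr(u, v, k):
    """Sample correlation between u[t-k] and v[t] over the overlap (k >= 1)."""
    a, b = u[:-k], v[k:]
    ma, mb = fmean(a), fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def haugh_pierce(u, v, m):
    """HP statistic, eq. (18.31): T times the sum of squared residual
    cross-correlations at lags 1..m; chi-squared with m df under H0."""
    return len(u) * sum(cross_corr(u, v, k) ** 2 for k in range(1, m + 1))

# Made-up residuals in which v simply repeats u one period later.
u = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0]
v = [0.0] + u[:-1]
print(round(cross_corr(u, v, 1), 3))  # 1.0
```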
There may be little consistency between the outcomes of the different causality tests,
as is illustrated in the application that follows. Tests which are only asymptotically
equivalent can give very different results in small samples. For one thing, the tests dif-
fer in the effective number of data points available for analyses. These consequences
increase in severity the smaller the sample size.
Bult et al. (1997) applied these tests to both actual and simulated data for a study of the tests' power and alpha-values. In general, the best test is one with the
highest power and the lowest alpha. Power is the probability of rejecting a false
null hypothesis, and alpha is the probability of rejecting a true null hypothesis (for
these two alternative states of nature, power and alpha do not have to be related).
However, if bivariate causality tests are used to identify a multivariate model, one
may be tolerant of a modest alpha value. Since a bivariate test may fail to uncover a
causal relation that emerges only in the proper multivariate model, it may be prudent
to simply choose the test with the highest power. This is consistent with the idea
that excluding (potentially) relevant predictors is a more serious error than including
irrelevant predictors in a multivariate model. For pure bivariate causality testing, Bult et al. (1997) recommend the Granger-Sargent test, which is a relatively simple test, and has a reasonable amount of power and little bias in alpha. If the bivariate test is
used only for identification purposes, the combination of high power and high alpha
associated with the Granger-Wald test may be attractive, given the asymmetry in the
cost of the two possible errors.
Section 11.1 includes a model with competitive reaction functions: see the equations (11.15). In these relations the potential number of predictor variables is so large
that it exceeds the number of observations. Bivariate Haugh-Pierce causality test
statistics were used to identify a subset of the potential predictors for multivariate
modeling. We use a simplified version of model ( 11.15) to show how causality tests
can be used to identify a (competitive reaction) function. We also show how much the
conclusions about causal relations for a given data set may depend on the test statistic
used.
Consider the following model in which the change in price of brand j in period t is a function of the changes in prices of competing brands r, for j = 1, ..., n:

ln(p_{jt}/p_{j,t-1}) = α_j + Σ_{r=1, r≠j}^{n} Σ_{t*=1}^{T*+1} β_{jrt*} ln(p_{r,t-t*+1}/p_{r,t-t*}) + v_{jt},    (18.32)

t = T*+2, ..., T

where
p_{jt} = the price of brand j in period t,
v_{jt} = a disturbance term,
T* = the maximum number of time lags, T* = 8, 20
T = the number of observations available, T = 76,
n = the number of brands, n = 3.
Relation (18.32) is a competitive reaction function with only "simple" competitive reactions: the price change for brand j is determined only by the price changes for competitive brand r, r ≠ j, not by changes in other marketing variables. The changes are expressed as the logarithm of the ratio of prices in two successive periods. This reflects the idea that price changes for brands with different average price levels are more comparable on a percentage than on an absolute basis.
Aggregated weekly scanner data for a frequently purchased nondurable, nonfood, consumer product sold in The Netherlands covering a period of 76 weeks were used to calibrate (18.32). In this application, the prices of three brands are considered. The market research company ACNielsen (Nederland B.V.) collects and distributes these data in the Netherlands based on a sample of about 150 stores. Price changes primarily reflect temporary changes due to promotional activities. The number of parameters to be estimated per equation is 1 + (n - 1)(T* + 1). Substituting n = 3 and T* = 8, we find that even this strongly "simplified" model requires 19 parameters while there are T - T* - 1 = 67 observations available for the estimation of (18.32). It is easy to
see that in practice (with more brands, and more variables including lagged effects)
the number of parameters can quickly exceed the sample size. This motivates the use
of bivariate tests to select the potential predictors for the estimation of multivariate
competitive reaction functions.
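The dimensionality argument above is easy to verify. A trivial Python check of the parameter and observation counts for equation (18.32):

```python
def reaction_equation_size(n, T_star, T):
    """Parameters per equation, 1 + (n - 1)(T* + 1), and usable
    observations, T - T* - 1, for the reaction function (18.32)."""
    return 1 + (n - 1) * (T_star + 1), T - T_star - 1

params, obs = reaction_equation_size(n=3, T_star=8, T=76)
print(params, obs)  # 19 67
```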
The question we address is whether the choice of a causality test has an impact
on the composition of the set of predictor variables in the multivariate specification.
We show a summary of the statistical significance results in Table 18.3. There is
perfect agreement in test outcomes for the GW and HP tests. There is also perfect
agreement between the two Sims tests, Sf and MS, in the sense that both tests indicate
no evidence of causality within the set of three price variables. Of the four cases
of significance based on the GW test, two also show significance for GS. This is
consistent with the result reported earlier, that the Granger-Sargent test has less power
(and a smaller alpha) than the Granger-Wald test. Across all five tests however, the
20. In preliminary research we found that the maximum lag structure for the three brands is eight weeks. Thus
the order of the polynomials also equals eight.
Table 18.3 Significant causality test outcomes^a

Effect        Cause^b
              pr1           pr2           pr3
pr1^b                       GW, HP
pr2           GW, HP                      GS, GW, HP
pr3                         GS, GW, HP

a The entries indicate that the null hypothesis of no causality for the pair of variables is rejected, based on the identified test statistic, p < 0.05.
b prj = ln(p_{jt}/p_{j,t-1}).
results in Table 18.3 imply that conclusions about causality depend strongly on the
test used.
The outcomes in Table 18.3 indicate that changes in the present and/or past prices of brand 1 may have causal effects on changes in the price of brand 2 (GW, HP), but not on brand 3, etc. The tests also indicate (not shown) the most likely lags with which causal effects occur.
One limitation of this study is that only bivariate causality tests were used and
compared for the identification of relevant predictors for the specification of a multivariate model. Multivariate causality tests, such as the VARMA-model (Judge et al., 1985, p. 667), can be used to detect causality or to determine the direction of causality in a fully specified model. For an application of the VARMA-model see Boudjellaba, Dufour and Roy (1992). An alternative multivariate test was developed by Lütkepohl (1989).
We have argued that if a model is built primarily for descriptive purposes, it should
have face validity. If its primary purpose is for predictions, one might argue that
face validity is not required. It is well known, for example, that models lacking de-
scriptive value can provide accurate forecasts. However, the marketing models that
are the topic in this book should be especially useful for conditional forecasting
purposes. Ideally, a model user specifies alternative marketing programs, and for each
possible program obtains an accurate, model-based forecast of consumers' purchase
behavior. This reflects the idea that a marketing manager's environment is to some
extent controllable. Marketing models should provide a basis for the manner in which
the manager can profitably shape that part of the marketplace in which his or her
brands operate. Thus, if the model builder wants to create a model that provides
information about the dependence of marketplace behavior on marketing activities, it
is critical that the predictive validation exercise is informative about the model's gen-
eralizability. Our view is that empirical research can identify regularities in the data
but that observable systematic patterns in predicted values depend on marketplace
characteristics.
In the marketing literature, it is very common for model builders to use some
form of predictive validity as a statement of the model's usefulness. However, there
is no established standard for the conduct of predictive validation. In this section we
provide a perspective on this popular form of model validation.
If the marketplace, in which consumers react to the products and services offered
to them, were a stable environment, research on marketing could concentrate on
the generation and testing of laws that govern consumer purchase behavior. However, consumers are "fickle", technological advances influence consumers' needs and
preferences, and firms make adaptations in their marketing programs. The question,
then, is whether a result generated from data in one product category, representing
one geographic area and one time period, generalizes to other conditions. Predictive
validation can be used to address this question indirectly.
Another reason for predictive validation has to do with the fact that model builders
use the sample data partly to guide model specification. If data collection and analysis
were very expensive, model builders would work much harder on model specification,
to generate one that is fully informed by the prevailing theoretical and empirical
knowledge. Instead, initial model specifications tend to be poor and ill-informed.
The more sample data are used to craft a model specification, the more the model
fit to the same data will overstate the model's applicability to other data. Practices
such as stepwise regression analysis reflect an inability or unwillingness by model
builders to prepare well-informed initial model specifications. As a result, much of the
effort that should be invested in model specification prior to data analysis is allocated
to tinkering with the model based on the data. An iterative process may reduce the
likelihood of biases in parameter estimates (e.g. biases due to model misspecification)
but it also destroys the statistical properties of the estimators. The iterative process is
often referred to as pretesting.
Suppose that we did complete the model specification stage, prior to data analysis,
through an elaborate process which allows us to be very confident that:
1. we have properly defined the criterion variable;
2. we have identified all relevant predictor variables, and have chosen the most
appropriate functional form for each predictor;
3. we have specified the relevant interaction effects to be accommodated;
4. we have reliable measures of all variables; etc.
Under these circumstances, there may be just one model specification to be sub-
jected to estimation and testing. If there is no evidence of violations of error term
assumptions, we use statistical tests to determine whether the model has (signifi-
cant) explanatory power, and whether individual effects differ (significantly) from
zero. These steps are especially relevant before we can claim descriptive validity
for the model. But we may not conclude that the model is "best", unless we can
rule out alternative specifications. Even then, the question remains to what extent
the empirical results generalize to other product categories, to other regions, and to
other time periods. To examine this, one could repeat the data collection and analysis
and determine whether the proposed model "survives". One possible stringent test
consists of a comparison of parameter estimates based on a null hypothesis of equal
parameter values (across regions, over time, etc.).
But what if one wants to know the accuracy of forecasts? Econometricians have derived the statistical properties of model-based forecasts, given error-term assumptions. For example, if the simple, linear model applies:

y_t = β0 + β1 x_t + ε_t,    t = 1, ..., T,    (18.33)

we can use the estimated parameters and a specific predictor variable value, say x0, to obtain the (conditional) forecast:

ŷ0 = β̂0 + β̂1 x0.    (18.34)

If the error term assumptions hold, we can construct a confidence interval for the unknown value y0, given x0: 21

ŷ0 ± t s_{ŷ0}    (18.35)

where

s_{ŷ0} = s √(1 + 1/T + (x0 - x̄)² / Σ_{i=1}^{T} (x_i - x̄)²),

t = the tabulated value of the t distribution, corresponding to the desired degree of confidence and the model's degrees of freedom, and
s = the estimated value of the standard deviation of y.
If the model is accurate, and generally applicable, then the confidence intervals so
constructed should contain the actual yo values a specified percentage of the time (the
percentage being equal to the degree of confidence).
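The interval of (18.35) can be computed directly from the estimation sample. A sketch in Python; the parameter estimates, residual standard deviation and t-value below are hypothetical:

```python
import math
from statistics import fmean

def prediction_interval(b0, b1, s, x, x0, t_crit):
    """Conditional forecast (18.34) and interval (18.35):
    y0_hat +/- t * s * sqrt(1 + 1/T + (x0 - xbar)^2 / sum((xi - xbar)^2))."""
    T = len(x)
    xbar = fmean(x)
    s_y0 = s * math.sqrt(1.0 + 1.0 / T
                         + (x0 - xbar) ** 2 / sum((xi - xbar) ** 2 for xi in x))
    y0_hat = b0 + b1 * x0
    return y0_hat - t_crit * s_y0, y0_hat + t_crit * s_y0

# Hypothetical simple regression fitted to T = 20 observations;
# t_crit = 2.101 is the 5% two-sided t value with 18 degrees of freedom.
lo, hi = prediction_interval(b0=2.0, b1=0.5, s=1.2,
                             x=[float(i) for i in range(20)], x0=10.0,
                             t_crit=2.101)
print(round(lo, 2), round(hi, 2))
```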
In the absence of "pretesting" (the iterative process of model building) one can
compute expected predictive validity measures. The formulas developed for this purpose 22 assume that the characteristics of predictor variables remain the same. Nevertheless, it is useful to employ these formulas, if the estimators have the assumed
statistical properties, for at least two reasons. One has to do with managerial use. If
a model is used for decision-making purposes, it is helpful to have a quantification
of the expected error of model-based forecasts. The other is that the formulas can
provide insight into the difference in expected predictive validity between alterna-
tive model specifications. For example, even if we reject a simplified version of a
21. See Wittink (1988, p. 47).
22. See e.g. Hagerty and Srinivasan (1991).
particular model (e.g. the preferred model includes an interaction term with a statis-
tically significant effect, so that a model without the interaction term is statistically
inferior), it is possible for the simplified version to outperform the preferred model
in the accuracy of predictions. Simplified versions provide superior predictions if
the disadvantage of a missing interaction term (which creates misspecification bi-
ases) is smaller than the advantage of estimating fewer parameters (which tends to
reduce the estimated standard errors). We note, however, that misspecification bi-
ases will become more influential in relative performance comparisons of alternative
model specifications if the characteristics of the predictor variables differ between
estimation- and validation samples.
We now consider the more realistic setting in which "pretesting" is rampant.
Strictly speaking, all statistical tests applied to the final model specification are in-
valid, but in both academic studies and in practical applications this problem is often
ignored. Partly because of the many iterations through which model building tends
to proceed, we insist on evidence of predictive validation. Here we have a potential
conflict of interest. The individual who builds the model also tends to favor its use.
Given that the model-building process is intensive and that the model builder has developed a personal preference for it, there is no incentive for him/her to engage in an exhaustive, potentially destructive, validation exercise. Since all models are incomplete by definition, any model subject to sufficient examination will eventually break
down. This conflict of interest is one possible reason why the practice of predictive
validation is poor, as we argue below.
If the model is estimated with cross-sectional data, it is common for model builders
to split the data randomly into two samples. The first (analysis or estimation) sample
is used to estimate parameters, to test the error term assumptions, etc. The second
(holdout or validation) sample is used to quantify the predictive validity (we discuss
predictive validity measures below). In this context, one advantage of using a valida-
tion sample is that in validation the model is penalized for each degree of freedom
used to estimate an additional parameter. And this penalty is greater in validation
measures than it is in measures that adjust for degrees of freedom in the estimation
sample, such as adjusted R2 and the standard deviation of residuals (see the formula
for expected predictive validity). Also, the greater the number of data-based iterations
the model builder goes through to arrive at the final model specification, the larger the
expected deterioration in predictive validity relative to the fit in the estimation sample.
The question we pose is whether randomly splitting cross-sectional data into es-
timation and validation samples provides a true opportunity for the model builder to
predictively validate the model. If we have built a model that is missing a critical
variable so that one (or more) parameter estimates is (are) biased, will the predictive
validity results be much poorer than the estimation results (e.g. will the predictive
validity results suggest that the estimated model is inadequate)? Unfortunately, this
is very unlikely in the case of random splitting. The reason is that estimation and
validation samples are expected to have the same data characteristics. The bias in the
effect of one or more included predictor variables will not reduce the model's predic-
tive performance (relative to the model fit in estimation), if the correlation between
included and excluded variables is the same in the estimation and validation samples.
Thus, in case of randomly splitting the data there is no opportunity to invalidate the
estimated model.
If the model builder uses time-series data, the situation is somewhat improved. Specifi-
cally, it is rare for model builders to split time-series randomly. There are several
reasons for this. One is that the model may contain a lagged criterion variable in
which case it is important to maintain the time sequence in the data. Another is that
the time sequence in the data can be exploited in tests of autocorrelation in the error
term. In addition, it is useful to examine a model's predictive validity to future time
periods. Thus, if there are two years of, say, weekly data, the model builder may
use the first year for estimation and the second year for validation. Time-series data
can then provide an (implicit) opportunity to the model builder to check whether the
results apply to a new time period. However, if the two years are very similar in
data characteristics, for example if the market environment has not changed, then the
validation exercise resembles the random splitting of cross-sectional data procedure.
Thus, the larger the changes in the environment over time, the more powerful the
validation exercise. At the same time, the larger the changes, the more likely it is
that weaknesses in the model reduce the predictive validity. This suggests that model
builders should at least report how the validation sample characteristics differ from
the estimation sample.23
Time-series data also provide other useful options. For example, model users may
insist on evidence that a proposed model outperforms some benchmark. Brodie and
de Kluyver (1987), Alsem, Leeflang and Reuyl (1989) and Brodie, Danaher, Kumar
and Leeflang (2000) compared the performance of marketing-mix models to that of
a naive model, which predicts next period's value for the criterion variable to be
this period's actual value. One might argue that little faith should be placed in the
parameter estimates, if the proposed model does not outperform a (naive) model
that lacks structural characteristics consistent with marketing traditions. As shown by
Foekens et al. (1994), the relative performance also depends on the extent to which
the characteristics of the data change between estimation and validation samples.
Foekens et al. (1994) argue the following:
23. Foekens, Leeflang and Wittink (1994), for example, report the percent of stores involved in promotional
activities, separately for estimation and validation samples (Table 3, p. 254).
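The naive benchmark comparison described above takes only a few lines. A sketch in Python with made-up series; the naive model predicts each period's value by the preceding period's actual:

```python
def mse(actual, pred):
    """Mean squared prediction error over the validation sample."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

def beats_naive(y_last_estimation, y_validation, model_predictions):
    """True if the model's validation MSE beats the naive benchmark
    (next period's prediction = this period's actual value)."""
    naive = [y_last_estimation] + list(y_validation[:-1])
    return mse(y_validation, model_predictions) < mse(y_validation, naive)

# Made-up holdout: the model tracks the series more closely than the naive rule.
print(beats_naive(9.0, [10.0, 11.0, 12.0], [10.5, 11.2, 11.8]))  # True
```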
• A misspecified model may nevertheless predict well as long as the correlations between included and excluded predictor variables remain the same. The greater the change in correlations the more likely it is to obtain poor predictive accuracy for misspecified models.
• One can determine the extent to which the same model, applied separately to es-
timation and validation data, produces different parameter estimates. Substantial
differences may occur for several reasons. One is that the true parameters have
changed. Another is that the estimated parameters differ because the model is mis-
specified, and this misspecification differentially affects the parameter estimates
in the two samples.
• Additionally, the predictive validity will be affected by a difference in error vari-
ance between estimation and validation samples.
In Section 14.1 we present some of the findings of Christen, Gupta, Porter, Staelin
and Wittink (1997) on aggregation bias in parameter estimates of loglinear models
estimated with linearly aggregated data. Christen et al. show that this bias depends
especially on the proportion of stores engaging in a promotion (i.e. on heterogeneity
in promotional activities across the stores). Thus, at an aggregate level, such as the
chain- or market level, changes in the proportion of stores engaged in a promotion
may contribute to a deterioration in forecasting accuracy in validation relative to what
might be expected based on model fit in estimation. In this way, an understanding of
sources of biases in parameter estimates can provide direction to model builders for
the purpose of diagnostic predictive validation.
Foekens et al. (1994) examined how each of the aforementioned factors con-
tributed to the Mean Squared Error (MSE) in validation samples. They found that
parameter instability (between estimation and validation samples) and a measure
of the change in the correlations between the predictor variables have statistically
significant effects on change in accuracy.
We now consider how available accuracy measures can provide some diagnostic
value. The formula for the mean squared error in validation (see equation (18.43) be-
low) shows that model performance depends on bias and variance. A more complete
model should have a smaller bias but may have larger variance than a less complete
model. If we desire to build a model that has the highest possible descriptive value,
we should be especially interested in testing for lack of bias in the predicted values.
To formalize, suppose we have T observations in total, and use the first T^* observations for estimation, leaving (T - T^*) for validation. Thus, the unknown parameters \beta_1, \ldots, \beta_L in a multiple regression model are estimated using T^* observations. Substituting the estimates \hat{\beta}_1, \ldots, \hat{\beta}_L and using the values of x_{1t}, x_{2t}, \ldots, x_{Lt}, for t = T^* + 1, T^* + 2, \ldots, T, the following predicted values of y_t are obtained:

\hat{y}_t = \sum_{\ell=1}^{L} \hat{\beta}_\ell x_{\ell t}, \qquad t = T^* + 1, \ldots, T. \qquad (18.36)
Comparing the predicted values \hat{y}_t with the actual values y_t, t = T^* + 1, \ldots, T, the predictive validity of the relation can be determined. To test for a lack of bias we can, for example, regress the actual values on the predicted values and test whether the intercept equals zero and the slope equals one. The Root Average Squared Prediction Error is the square root of ASPE in (18.38):
506 CHAPTER18
\text{RASPE} = \sqrt{ \frac{\sum_{t=T^*+1}^{T} (y_t - \hat{y}_t)^2}{T - T^*} } \qquad (18.39)
The value for RASPE can be compared to the standard deviation of residuals in the
estimation sample. In general we expect the value of RASPE to be greater than the
standard deviation of the residuals, the actual difference being a function of the factors
we have mentioned earlier. RASPE is also known as the root mean squared error (RMSE) (Section 17.1).
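As an illustration, RASPE can be computed over a holdout sample; the data below are hypothetical, not from the book:

```python
import math

def raspe(actual, predicted):
    """Root Average Squared Prediction Error, eq. (18.39): the square
    root of the mean squared error over the validation observations."""
    assert len(actual) == len(predicted)
    n = len(actual)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n)

# Hypothetical validation observations and model predictions.
y_val     = [100.0, 120.0, 90.0, 110.0]
y_val_hat = [ 98.0, 125.0, 92.0, 105.0]

print(round(raspe(y_val, y_val_hat), 3))  # sqrt(58/4) ≈ 3.808
```

In practice this value would then be compared with the standard deviation of the residuals in the estimation sample, as discussed above.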
A predictive validity measure that is dimensionless, easy to relate to, and poten-
tially useful if one wants to make comparisons of forecast accuracy across different
settings, is the Mean Absolute Percentage Error (MAPE):
\text{MAPE} = \frac{1}{T - T^*} \sum_{t=T^*+1}^{T} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100. \qquad (18.40)
24. See also Wittink (1988, pp. 268-269).
In this measure absolute, rather than squared, errors are computed, and each absolute
error is expressed relative to the actual value, for observation t in the validation sam-
ple. If one believes that the magnitude of an error should be considered relative to the
corresponding actual value, MAPE may be a suitable measure.
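A minimal sketch computing MAPE on hypothetical values:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, eq. (18.40): each absolute
    prediction error is scaled by the corresponding actual value."""
    n = len(actual)
    return 100.0 / n * sum(abs((y - yh) / y) for y, yh in zip(actual, predicted))

# Hypothetical actual and predicted values (illustration only).
print(round(mape([100.0, 200.0, 50.0], [110.0, 190.0, 55.0]), 2))  # 8.33
```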
A measure that compares the model's prediction errors to those of a naive, no-change model is the Relative Absolute Error (RAE):

\text{RAE} = \frac{\sum_{t=T^*+1}^{T} |y_t - \hat{y}_t|}{\sum_{t=T^*+1}^{T} |y_t - y_{t-1}|} \qquad (18.41)

where

y_t\ (\hat{y}_t) = the actual (predicted) value in period t.
If RAE is less than one, the model outperforms the benchmark represented by a naive
model.
A measure that is conceptually similar to RAE, but uses squared prediction errors
instead of absolute ones, is Theil's U-statistic:

U = \sqrt{ \frac{\sum_{t=T^*+1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=T^*+1}^{T} (y_t - y_{t-1})^2} } \qquad (18.42)
As with the RAE measure, if Theil's U-statistic is less than one, the model generating \hat{y}_t outperforms the naive model.
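Both RAE and Theil's U compare the model's errors against a no-change naive model. A sketch with hypothetical series (values below one indicate the model beats the naive benchmark):

```python
import math

def rae(actual, predicted, prev):
    """Relative Absolute Error, eq. (18.41): the model's absolute errors
    relative to those of a no-change naive model (forecast_t = y_{t-1}).
    `prev` supplies the observation preceding the validation sample."""
    lagged = [prev] + actual[:-1]
    num = sum(abs(y - yh) for y, yh in zip(actual, predicted))
    den = sum(abs(y - yl) for y, yl in zip(actual, lagged))
    return num / den

def theil_u(actual, predicted, prev):
    """Theil's U-statistic, eq. (18.42): same comparison, squared errors."""
    lagged = [prev] + actual[:-1]
    num = sum((y - yh) ** 2 for y, yh in zip(actual, predicted))
    den = sum((y - yl) ** 2 for y, yl in zip(actual, lagged))
    return math.sqrt(num / den)

# Hypothetical validation series and model predictions.
y    = [105.0, 98.0, 110.0, 102.0]
yhat = [103.0, 100.0, 107.0, 104.0]
print(rae(y, yhat, 100.0), theil_u(y, yhat, 100.0))
```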
Theil (1965) shows how ASPE, the Average Squared Prediction Error in (18.38), can be decomposed:

\text{ASPE} = (\bar{\hat{y}} - \bar{y})^2 + (s_{\hat{y}} - r s_y)^2 + (1 - r^2) s_y^2 \qquad (18.43)

where \bar{y} (\bar{\hat{y}}) and s_y (s_{\hat{y}}) are the mean and standard deviation of the actual (predicted) values in the validation sample, and r is the correlation between actual and predicted values. The first term in (18.43) captures the squared bias, while the second and third terms together account for the prediction error due to unreliability (variance). For a model that is linear in the original variables, both the first and the second term are zero in the estimation sample. The second term captures the difference between the variability of the predicted values (s_{\hat{y}}) and the variability of the actual values (s_y) multiplied by the correlation between actual and predicted values. The third term is the proportion
of the variance in the criterion variable in validation that is not attributable to the
estimated relation.
The advantage of using such a decomposition of the prediction errors is that the
model builder can diagnose the source(s) of the errors. It is, for example, very useful
to distinguish between bias and variance. Consider a comparison of predictive validity
between two models, one being the "preferred" model, the other being a simplified
version. One would expect that the "preferred" model has less bias but potentially
more variance. The decomposition allows the model builder to separate a difference
in overall performance into differences due to bias and due to other components.
Importantly, the more the validation data characteristics differ from the estimation
data, the greater the expected contribution of the bias component to ASPE.
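The decomposition can be checked numerically. The sketch below, with hypothetical data, computes the bias, regression and disturbance terms and verifies that they sum to ASPE:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def corr(xs, ys):
    """Pearson correlation between actual and predicted values."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

def theil_decomposition(actual, predicted):
    """Decompose ASPE into squared bias, a regression (variability) term,
    and a disturbance term, following eq. (18.43)."""
    r = corr(actual, predicted)
    sy, syh = sd(actual), sd(predicted)
    bias2 = (mean(predicted) - mean(actual)) ** 2
    regr  = (syh - r * sy) ** 2
    dist  = (1 - r ** 2) * sy ** 2
    return bias2, regr, dist

# Hypothetical validation data.
y    = [10.0, 12.0, 9.0, 14.0, 11.0]
yhat = [11.0, 12.5, 9.5, 13.0, 12.0]
aspe = mean([(a - p) ** 2 for a, p in zip(y, yhat)])
b, v, d = theil_decomposition(y, yhat)
assert abs(aspe - (b + v + d)) < 1e-9  # the three terms sum to ASPE
```

A model builder can inspect which of the three terms dominates to diagnose whether prediction errors stem mainly from bias or from variance.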
Once decompositions of prediction errors have been made, the natural question
becomes to what extent the (validation) data provided an opportunity for the "pre-
ferred" model to show better performance than the benchmark model. For example,
in a stable environment in which marketing activities show little variation, it may
be difficult to beat a benchmark model that predicts next period's value to be this
period's actual value. On the other hand, if there is a substantial amount of variation
in marketing activities in the validation sample, it should be possible to "beat" the
benchmark model. Another relevant aspect, as we mentioned earlier, is the extent to
which validation sample characteristics differ from the estimation sample. The greater
this difference, the stronger the opportunity to falsify a (wrong) model.
These considerations suggest that model builders should at least report central
tendency and dispersion measures, for both estimation and validation samples, as
practiced, for example, by Foekens et al. (1994). With access to this information a
user can make a judgment about two relevant aspects:
• the extent to which the validation sample allows for model falsification (performance in validation relative to the estimation sample);
• the extent to which the validation sample allows the model to outperform a naive
model.
Van Heerde, Leeflang, Wittink (1997) have used such considerations in quantifying
what they call "diagnostic predictive validity". We discuss an application of this form
of predictive validation in Section 18.6.
18.6 Illustrations
Table 18.4 Theil's U-statistic for SCAN*PRO in year 1 (estimation) and year 2
(validation) in log sales index and sales index spaces.
The transformation of predictions from the log sales index space to the sales index space includes an adjustment based on the error variance to reduce the bias in predicted antilogarithmic values. We report the Theil's U-statistic results in Table 18.4. The arithmetic average Theil's U-value for the log sales index variables is
0.580 in estimation and 0.658 in validation. Thus, on average the SCAN*PRO model
outperforms the benchmark incorporated in this statistic (18.42). In fact the results in
Table 18.4 show that the model outperforms the naive specification in each of the ten
metropolitan areas. This remains true when the logarithmic values are transformed to
the original sales index.
Instead of expressing a model's performance relative to a naive model, as in Theil's U-statistic, it can be instructive to compute the absolute performance for
each model separately. We show in Table 18.5 the mean absolute prediction error
(in the sales index) for SCAN*PRO in the first column. This error varies from 23.52
in Jacksonville/Orlando to 72.91 in Seattle and is 37.12 on average. The SCAN*PRO
model outperforms the naive model (column 3) in all metropolitan areas, consistent
with the relative performance results in Table 18.4. The naive model error varies from
28.39 in Jacksonville/Orlando to 128.67 in Seattle, with an average of 62.47. These
differences in absolute performance of the naive model suggest that the intensity of
promotional activities is low in Jacksonville/Orlando and high in Seattle.
The second column in Table 18.5 shows the performance of a model that contains
only weekly indicator variables. To the extent that the product is subject to seasonality
and/or is promoted in the same weeks in year 2 as in year 1, such an alternative
Table 18.5 Mean absolute prediction error^a for three models of year 2 in the sales index space.
a The magnitude of the error can be compared against the reference sales index of 100, and actual values varying from 50 to 1,500.
naive model can perform well. Its actual performance is worse than the SCAN*PRO
model's but better than the naive model's, with the exception of Houston and Los
Angeles. In Houston the naive model outperforms the weekly indicators model, while
in Los Angeles SCAN*PRO loses to the "weekly indicators". This validation exercise
is also restrictive in the sense that we do not know how much of a difference exists in
the predictor variables between the two years of data.
In (18.44) all the estimated coefficients are statistically significant (p < 0.05). If all price indices are equal to 1.0, the predicted value of Diff equals 0.06 (the parametric model having a larger prediction error). This is twice the average value of 0.03 in the validation sample. Thus, the difference in performance is greater when there are no discounts at all than on average across the actual price index values. By taking the first derivative of (18.44) with respect to PI_{1t}, and letting PI_{2t} = PI_{3t} = 1.0, we
find that under these conditions the minimum value of Diff occurs at PI_{1t} = 0.78 (the second derivative is positive). For this scenario the predicted value of Diff equals -0.03, meaning that at this discount for brand 1 the parametric model is predicted to
have a smaller prediction error. As long as the price index for brand 1 is either above
0.92 or below 0.65 (holding the other price indices at 1.0), the semiparametric model
will win in the validation sample, according to equation (18.44). Generally speaking,
if either of the other brands' price indices is below 1, the semiparametric model's
performance improves relative to the parametric one. Thus, whether the semipara-
metric model wins on average depends a great deal on the price index values in the
validation sample. The benefits of this diagnostic predictive validity exercise are that:
• equation (18.44) shows under what conditions one model tends to outperform
another, and
• the data characteristics should be taken into account when validation results are
reported.
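Equation (18.44) itself is a fitted regression not reproduced here. Purely for illustration: the values reported above are consistent with a quadratic shape of Diff in the price index for brand 1 (holding the other indices at 1.0), and the sketch below assumes that form to recover the reported minimum and crossing points:

```python
import math

# Assumed quadratic (an illustration, not eq. (18.44) itself):
# Diff(PI) = c * (PI - pi_min)**2 + diff_min, calibrated so that
# Diff(1.0) = 0.06 and the minimum equals -0.03 at PI = 0.78,
# as reported in the text.
pi_min, diff_min = 0.78, -0.03
c = (0.06 - diff_min) / (1.0 - pi_min) ** 2  # ≈ 1.86

def diff(pi):
    return c * (pi - pi_min) ** 2 + diff_min

# Crossing points where Diff = 0; outside this interval the
# semiparametric model is predicted to have the smaller error.
half_width = math.sqrt(-diff_min / c)
lo, hi = pi_min - half_width, pi_min + half_width
print(round(lo, 2), round(hi, 2))  # ≈ 0.65 and 0.91, close to the reported 0.65 and 0.92
```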
Alsem, Leeflang and Reuyl (1989) calibrated market share models for nine brands
from three markets using bimonthly data. One of their models pertains to brand j in
the dried soup market in the Netherlands:
m_{jt} = e^{\alpha_j} \left[ \frac{p_{jt}}{p_{jt} + pc_t} \right]^{\beta_{1j}} \left[ \frac{a_{j,t-1}}{a_{j,t-1} + ac_{t-1}} \right]^{\beta_{2j}} d_{jt}^{\beta_{3j}} u_{jt}, \qquad t = 1, \ldots, T \qquad (18.45)

where

m_{jt} = market share of brand j in period t,
p_{jt} = price of brand j in t,
pc_t = mean competitive price, weighted by market shares,
a_{jt} = advertising expenditures of brand j in t,
ac_t = advertising expenditures of brand j's competitors,
d_{jt} = effective store distribution of brand j (measured as a fraction of all stores where the product is available),
u_{jt} = a disturbance term, and
T = total number of observations in the estimation sample.
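A minimal sketch of how a specification like (18.45) maps marketing variables into a predicted share; all parameter values and inputs below are hypothetical, not the Alsem et al. estimates:

```python
import math

def market_share(alpha, beta1, beta2, beta3, p, pc, a_lag, ac_lag, d, u=1.0):
    """Predicted market share from a model of the form of eq. (18.45).
    Price and advertising enter as shares of (own + competitive) totals;
    nothing constrains the output to [0, 1] (the model lacks range and
    sum constraints, as noted in the text)."""
    price_term = (p / (p + pc)) ** beta1
    adv_term = (a_lag / (a_lag + ac_lag)) ** beta2
    return math.exp(alpha) * price_term * adv_term * (d ** beta3) * u

# Hypothetical inputs: price slightly above the competitive mean,
# a 25% advertising share, 80% effective distribution.
m = market_share(alpha=-2.0, beta1=-2.0, beta2=0.3, beta3=1.2,
                 p=1.1, pc=1.0, a_lag=100.0, ac_lag=300.0, d=0.8)
print(round(m, 4))  # about 0.25

# Interaction noted in the text: zero distribution forces zero share.
assert market_share(-2.0, -2.0, 0.3, 1.2, 1.1, 1.0, 100.0, 300.0, d=0.0) == 0.0
```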
We show the estimated parameters of (18.45) and some statistical test results in Table
18.7. One advantage of model (18.45) is that it has a simple structure. However,
it is far from complete on important issues and it is not robust. For example, the
model omits variables that represent the effective store distribution of competitive
brands (dc1 ). Also, temporary price discounts, displays and featuring are excluded.
And while each brand has a (unique) number of varieties, the model is specified at
the brand level. One might argue in favor of a hierarchical model (see Section 14.5),
but the lack of available data makes its estimation unrealistic. On the other hand,
Table 18.7 Parameter estimates and statistical test results for (18.45).^a
a t-values in parentheses.
Source: Alsem et al. (1989, p. 189).
the model accommodates some interaction between the marketing instruments. For
example, if d_{jt}, the number of sales outlets carrying brand j at t, is equal to zero, then m_{jt} = 0, no matter the brand's advertising share. However, the same implication for
advertising is less realistic, since zero advertising need not lead to zero market share.
The model was estimated on 17 bimonthly observations, covering a period of
nearly three years. It is quite conceivable that model parameters change or that the
model structure changes during such a long period. The number of observations, however, is small. Six remaining bimonthly periods are used as a validation sample. An
increase in the periodicity of the data implies some market instability which enhances
the information content of the validation exercise.
As was observed earlier, the face validity of the model's structure is not high; for example, equation (18.45) does not satisfy range and sum constraints. However, all parameter estimates are statistically significant (p < 0.05) and have the expected sign.
Also, the magnitudes of the elasticities differ across the variables in a manner that
corresponds to what one would expect. The value of the Durbin-Watson statistic is in
the inconclusive region which indicates a possible concern about autocorrelation (and
raises questions about the validity of other statistical tests). As indicated in Section
16.1, there are several possible reasons for the error term to be autocorrelated, such
as omitted variables and structural changes. Other concerns include the definition of
competition which is assumed to consist of the three largest brands. This definition
was based on an earlier study of competitive reaction effects estimated from data
for all brands. 25 The model may also suffer from the assumption of a quasiduopoly
(brand j versus the other two brands treated jointly). Bultez and Naert (1973), for ex-
ample, found that treating each competitor uniquely in an attraction model improved
the outcome of the Durbin-Watson statistic.
Alsem et al. (1989) were especially interested in the use of predicted values for
competing brands' variables. A brand manager can contemplate alternative values for
own-brand marketing variables, and use the estimated parameters for those variables
to produce market share predictions. However, along with each possible set of own-brand values, predictions of the competing brands' variables are needed. These were obtained from:
• competitive reaction functions 26 showing how pc_t and ac_{t-1} in (18.45) are influenced by p_{jt}, a_{j,t-1} and d_{jt};
• naive models relating pc_t and ac_{t-1} to these variables' past values.
We show MAPE-values for these four sets of predicted shares relative to the actual values in Table 18.8. Surprisingly, the results in Table 18.8 show that the use of actual values for pc_t and ac_{t-1} gives the highest MAPE for equation (18.45). On the other hand, the competitive reaction functions provide estimated values of pc_t and ac_{t-1}
which generate the smallest MAPE-value. It is unusual to have superior market share
predictions from predicted competitors' activities than from actual values for those
activities. One possible explanation is that the predicted values dampen the variability
in the competitive variables in a manner that compensates for excessive variation in
other model aspects. Nevertheless, these results suggest that the estimated competitive
reaction functions provide an acceptable representation.
Many models with behavioral detail use input from managers. 27 We use the SPRINTER
new-product model, introduced in Section 10.2, to illustrate the idea of validating
input from managers.
We assume that interaction between management and the model-building team
has led to the structure shown in Figure 10.6. Validation starts with a critical evalu-
ation of this structure and of the measurement of model inputs. We noted in Section
10.2, for example, that knowledge of total market sales may lead to adjustments. The
distribution of frequency-of-purchase is based on survey data, and these data may
indicate that the distribution figures should be adjusted. Similar validation checks
should be made on individual inputs whenever possible. And, if one has reason to
believe that some parameters are over- or underestimated, along with prior knowledge
about the direction, then an adjustment is justified (and it should normally improve
model performance).
Once all individual inputs have been evaluated, we still do not know how well
the model actually describes the new-product purchase process. One possible test is
to compare model predictions with test market data. The tracking of past results can
lead to useful model updates.
In Section 10.2 we introduced the BASES model. If the supplier of BASES ser-
vices tracks subsequent introductions of new products, BASES clients can learn the
historical accuracy of predictions made (for products actually introduced). With each
subsequent application, given to a new-product introduction, the predictive validity
results can be updated. Note, however, that for BASES, ASSESSOR and other pre-
test-market models, the environment used for estimation is artificial (e.g. respondents
are aware of the new product and they have access to it). Thus, the supplier should
find out whether there are elements in the artificial environment that lead to biases in
predictions to the real world. And there is an opportunity to determine whether the
model performance depends on the product category, the type of consumers, etc.
One possible bias in the output of pretest market models derives from the judg-
ments managers provide. For example, in ASSESSOR applications, consumer re-
sponse is used to predict trial and repeat under the assumption that the product is
known by and available to all target market members. To obtain market share pre-
dictions, ASSESSOR uses managerial judgments about the new brand's expected or
planned consumer awareness and product distribution. The accuracy of the predic-
tions is then determined not only by the validity of the model and the validity of
consumer responses but also by the validity of the managerial judgments.
We show in Table 18.9 predicted and actual market shares for 24 new products
evaluated with the ASSESSOR methodology. In the first column we have the initial
predicted shares (i.e. based on managers' judgments about awareness and availabil-
ity). On average, the predicted value is 7.9 percent. The second column shows the
adjusted predicted shares. These are the predictions generated by ASSESSOR when
the actual awareness and distribution data are used (which are measured after the
new product is introduced). On average, the predicted adjusted value is 7.5 percent.
This difference of 0.4 percentage points indicates that managers tend to overstate
awareness and/or availability for a new product.
The third column shows the actual market shares achieved. Note that all of these
columns are silent about the specific amount of time after introduction. Presumably
each client has provided input and output values that represent an "equilibrium" or
long-run result under normal market conditions. We mention this to make it clear that
there is in fact uncertainty about the actual market share result.
The fourth and fifth columns in Table 18.9 show the prediction errors. On average,
the prediction error is 0.6 for the initial predicted values and 0.2 for the adjusted
predictions. Thus, the bias due to managers' overstatement of awareness/availability
is reduced considerably (the difference of 0.4 percentage points mentioned above).
Finally, the average absolute difference between predicted and actual shares is also
reduced from 1.5 (column four) to 0.6 (column five). 28 This suggests that the largest
part of the forecast error generated by the initial predictions is due to inaccuracy in
managers' judgments. If the bias persists, the judgments themselves can be adjusted
so that superior predictions are obtained prior to knowledge of the actual awareness
and availability. Note that, unlike Alsem et al., here the use of actual instead of predicted values improves model performance.
In Chapter 12 we introduced discrete (brand) choice models. These models are spec-
ified at the individual consumer level. We discuss in Section 17.2 heterogeneity in
consumer response and how choice models can accommodate such heterogeneity.
The performance of a discrete choice model can be compared with that of a naive
model. One simple naive model 29 is to assume that each consumer chooses any brand with probability 1/n on every occasion. However, this benchmark is naive, in that it assumes that all n brands have the same market share (1/n) and that all consumers have all brands in their consideration sets. A more useful benchmark is to assume that all consumers have choice probabilities equal to the brands' actual market shares. This benchmark still assumes that the consideration set is common across all consumers and that there is no variance in brand probabilities across consumers.
Other benchmark models represent heterogeneity in consumer choice behavior
through the use of loyalty indices, etc. Kalwani, Meyer and Morrison (1994) propose the Dirichlet-Multinomial (DM) model as a benchmark. The DM-model's parameters are estimated from consumers' purchases in the estimation sample.
The foregoing discussion suggests that models with different degrees of behav-
ioral detail and specified at different aggregation levels are best evaluated against
uniquely specified benchmark models.
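The two naive benchmarks described above can be compared on holdout choices. A sketch with synthetic choice sequences (brand labels and probabilities are illustrative):

```python
import math
from collections import Counter

def avg_log_likelihood(choices, probs):
    """Average log-likelihood per choice under fixed choice probabilities."""
    return sum(math.log(probs[c]) for c in choices) / len(choices)

# Synthetic estimation-sample choices over three brands (A dominates).
est_choices = list("AAABABAACA")
shares = {b: n / len(est_choices) for b, n in Counter(est_choices).items()}

# Synthetic validation-sample choices and the uniform 1/n benchmark.
val_choices = list("AABAACAAAB")
uniform = {b: 1 / 3 for b in "ABC"}

ll_uniform = avg_log_likelihood(val_choices, uniform)
ll_shares = avg_log_likelihood(val_choices, shares)

# The market-share benchmark fits these holdout choices better than the
# uniform benchmark because the shares reflect brand A's dominance.
assert ll_shares > ll_uniform
```

A fitted choice model would in turn be expected to beat the market-share benchmark; if it does not, its behavioral detail adds no predictive value.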
In Section 16.9 we mention that scoring rules can be applied to evaluate the quality
of an expert's subjective judgment. Such scores are part of the process to validate
subjective estimates. Of course, the output of a subjectively estimated model can
also be validated with the criteria presented in Sections 18.2-18.5. In this section we
consider an approach that focuses explicitly on an expert's ability to make subjective
judgments about output variables such as market share. 30
For simplicity of exposition, we consider a mature market in which industry sales
grow at the same rate as the population. There are two brands (j and c), which
compete primarily through price and advertising. Suppose that market share response
functions were considered for parameter estimation based on historical data, leading
28. For further details, see Urban and Katz (1983) and Urban and Hauser (1993, pp. 468-472).
29. We closely follow Kalwani, Meyer, Morrison (1994).
30. See also Naert (1975b).
Table 18.10 Subjective values of market share for sixteen combinations of price levels and advertising expenditure values.
One possibility is to test whether the parameters estimated from the historical data are equal to the parameters that generate the subjective estimates. Another possibility is to use the historical data as a validation sample for the parameter estimates obtained from the subjective market share estimates (see Section 18.5).
We note that when model complexity increases (e.g. as the number of relevant
predictor variables increases), it becomes virtually impossible for humans to learn the
marginal effects of the predictor variables by tracking historical data. The difficulty
humans face is that multiple predictor variables change at the same time, making it
likely that changes in a criterion variable are attributed incorrectly to specific predic-
tor variables. On the other hand, econometric methods are specially suited to provide
the best possible estimate of a given predictor, holding other predictors constant.
In Sections 3.1 and 16.9 we discussed the use of intention surveys to obtain quanti-
tative predictions of the effects of the introduction of private broadcasting on adver-
tising expenditures in other media. The estimates of the advertising expenditures in
medium i for the entire market in period t + 1, \widehat{AE}_{i,t+1}, are obtained as follows:

\widehat{AE}_{i,t+1} = \frac{AE_{it}}{AE'_{it}} \cdot \widehat{AE}'_{i,t+1} \qquad (18.48)

where

AE_{it} = advertising expenditures of the entire market in medium i in period t,
AE'_{it} = advertising expenditures of the sample 31 in medium i in t,
\widehat{AE}'_{i,t+1} = the sample's intended advertising expenditures in medium i in period t + 1, obtained from the survey.

The estimates \widehat{AE}_{i,t+1} can be validated by comparing these predicted values with the actual values, AE_{i,t+1}. The measure we use is the percentage error of prediction (PEP):

\text{PEP}_{t+1} = \left[ \frac{\widehat{AE}_{i,t+1} - AE_{i,t+1}}{AE_{i,t+1}} \right] \times 100. \qquad (18.49)
The first component represents the deviation which occurs between intended and
actual behavior within the sample in period t + 1. This can also be represented by the
measure PEE (percentage error of actual behavior):
\text{PEE}_{t+1} = \left[ \frac{\widehat{AE}'_{i,t+1} - AE'_{i,t+1}}{AE'_{i,t+1}} \right] \times 100. \qquad (18.51)
The second component of (18.50) is the sampling error, which is defined as the
percentage error of the sample (PES):
\text{PES}_{t+1} = \left[ \left( \frac{AE'_{i,t+1}}{AE_{i,t+1}} - \frac{AE'_{it}}{AE_{it}} \right) \Big/ \frac{AE'_{it}}{AE_{it}} \right] \times 100. \qquad (18.52)
This component represents the percentage difference between the fraction of the advertising expenditures of the sample out of the total population in period t + 1 and the value of this fraction in period t. The larger the difference, the less representative the sample is in period t + 1.
The relation between PEP, PEE and PES (ignoring subscripts) can be written as:

(1 + \text{PEP}/100) = (1 + \text{PEE}/100)(1 + \text{PES}/100) \qquad (18.53)

so that PEP ≈ PEE + PES when the component errors are small. Thus, the validation measure PEP can be decomposed into an error due to deviations
between intentions and actual behavior and a sampling error. Such a decomposition
can provide diagnostic value with respect to possible causes for prediction error. To
illustrate, we use the outcomes of an intention survey to estimate the advertising ex-
penditures of different media in the Netherlands in 1990, the year after the (possible)
introduction of private broadcasting. In Table 18.11 we report the values of PEP, PEE
and PES.
Table 18.11 shows that for the private broadcasting channel the model strongly
overestimated (+43) advertising expenditures. The respondents represented the popu-
lation well (a sample error of -4), so the prediction error was mainly attributable to a
behavior intention error (+49). This intention error was partly due to reduced prices
for commercial time on the private broadcasting channel. For the print media (daily
newspapers and magazines), the respondents also overestimated their advertising ex-
penditures (positive values of PEE), but the fraction of advertising expenditures in
the population decreased (negative values of PES): other advertisers in the population
increased their print advertising more than the advertisers in the sample. This sam-
pling error is larger than the behavior error. Thus the positive intention/behavior error
is overcompensated by the negative sampling error, resulting in a negative prediction
error.
Still, the total advertising expenditures across all media are predicted quite well
(PEP = -6%). However, this is in part because of opposite signs for errors in the two
components. Thus, the high degree of accuracy in predicted total advertising expendi-
tures is due to the fact that the component errors happen to offset. If these errors can be
expected to show similar patterns in the future (i.e. if they are negatively correlated),
then decreases in error in one component should be accompanied by increases in the
other component's error.
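The component errors combine multiplicatively into the overall prediction error (so that PEP ≈ PEE + PES when the errors are small), which is consistent with the values reported in Table 18.11. A sketch using the private-broadcasting figures:

```python
def pep_from_components(pee, pes):
    """Combine the behavior-intention error (PEE) and the sampling
    error (PES) multiplicatively into the overall prediction error:
    (1 + PEP/100) = (1 + PEE/100) * (1 + PES/100)."""
    return ((1 + pee / 100) * (1 + pes / 100) - 1) * 100

# Table 18.11 values for the private broadcasting channel:
# intention/behavior error +49, sampling error -4.
print(round(pep_from_components(49, -4)))  # 43, matching the reported +43
```

The additive approximation would give 49 - 4 = +45; the multiplicative combination reproduces the reported +43 exactly.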
PART FOUR
Use / Implementation
CHAPTER 19
• model-related dimensions;
• organization-related dimensions;
• implementation-strategy dimensions.
Two aspects of the model-building process deserve further comment: model
scope and the evolutionary nature of model building. We examine these aspects along
with the model's ease of use, another element of implementation strategy, in Section
19.2.
In this section we discuss the personal, interpersonal and organizational factors that
are part of organizational validity. Research on model implementation and on com-
puterized decision support systems (Section 15.2) has identified these factors as being
influential.
19.1.1 PERSONAL FACTORS
Models should ideally be custom built, and developed in accordance with the integra-
tive complexity of the model user. Integrative complexity is the ability of an individual
to integrate information on multiple dimensions in a complex fashion. 3 Integratively
more complex individuals can perform higher levels of information processing than
integratively simple individuals. Thus, a model should be developed for a specific
user in such a way that the model fits the manner in which the user makes decisions.
This focus on the model user's personal factors illustrates that it is important
for a model builder to understand the alternative approaches the decision maker has
access to and is comfortable with for making decisions. Thus, a model has to com-
pete with whatever traditional approaches have been used in decision making. For
example, a brand manager who has responsibility for, among other things, price and
promotion decisions may have relied on weekly market status reports which provide
brand performance measures (such as revenues, unit sales and market shares) along
with marketing activities for a set of brands sold in specific distribution outlets. This
manager would try to relate changes in the performance measures to changes in the
marketing activities. Essentially the manager does the equivalent of a multiple regres-
sion analysis with a large number of (potential) predictor variables. We have argued
elsewhere that models are (much) better at finding optimal information integration
rules than humans are (Section 15.2). Purely from the perspective of efficiently in-
tegrating information on multiple variables expressed on disparate scales, humans
will not beat models. However, humans have the capacity to recognize when a model is obsolete (for example, due to important changes in the environment). Thus, the objective of the model builder should be to recognize the strengths and weaknesses of the decision maker, and to identify the opportunity to meld the decision maker's strengths with the model's strengths.
For each individual, there is an optimum level of "environmental complexity"
which maximizes the level of information processing. An integratively complex indi-
vidual reaches this maximum at a higher level of environmental complexity than does
3. See Larreche (1974, 1975).
MODEL IMPLEMENTATION 527
the integratively simple individual. Benbasat and Todd (1996) introduced the notion
of elementary information processes (EIP's). EIP's are the basic building blocks of
a decision strategy. These are low-level cognitive operations such as reading a value,
combining two values, or storing a result in long-term memory. EIP's have been used
to model a variety of decision processes. 4 To illustrate, the EIP's used in (preferential)
choice strategies represent three classes of decision making effort: 5
• the processing effort associated with calculations and comparisons within and
among alternatives;
• the recall efforts associated with the retrieval of information;
• the tracking effort, associated with the storage and subsequent retrieval of infor-
mation about alternatives.
A model builder can identify the EIP's in a specific decision-making situation to de-
termine how a model can be designed which reduces these efforts for the model user.
For an evaluation of model advantages with respect to effort, the model builder can
consider both the effort required of the decision maker to interact with the model and
the effort required to process the information generated by the model. The likelihood
of implementation of the model partly depends on a comparison of efforts required
for model use and efforts required in the traditional approach.
Successful implementation also depends on "user involvement" and "personal
stake". 6 Not surprisingly, greater user involvement leads to higher implementation
success rates. 7 User involvement is also central to the degree of interaction between
developer (researcher) and user (manager). The more involvement between the two
sides in terms of quantity and quality of interaction, the more likely it is that mutual
confidence develops (see below).
Personal stake is the extent to which a model user's future performance depends
on the model and its use. If the model can be shown to improve the quality of a
manager's decision, and if improved decision making leads to better performance,
then the model is more likely to be implemented.
The personal characteristics of the decision maker that affect implementation
success8 include general intelligence, work experience, length of time at the company
and in the job, education, personality, decision style and attitude toward a model.
The role of these factors in influencing model implementation is described by Zmud
(1979). Zmud found that extroverted, perceptive individuals possess above-average
positive attitudes toward models, while older and less educated individuals were
observed to exhibit less positive attitudes.
A distinction is often made between analytic and heuristic problem-solving styles.
The analytic style is relatively formal and systematic in nature. Also, analytic decision
4. For example, see Johnson, Payne (1985) and Payne, Bettman, Johnson (1988).
5. We closely follow Benbasat and Todd (1996, pp. 245-246) but omit much detail.
6. Swanson (1974), Schultz, Ginzberg, Lucas (1984), Hanssens, Parsons, Schultz (1990, p. 327), Lucas,
Ginzberg, Schultz (1990).
7. Schultz, Slevin (1983).
8. See also Wierenga, van Bruggen (1997).
528 CHAPTER 19
makers have a fondness for data and analysis. The heuristic style is less formal and
more ad hoc. The heuristic decision maker also places more value on intuition and
experience. Experiments have shown that heuristic individuals are less likely to accept
recommendations from model builders than analytic decision makers. 9
Churchman and Schainblatt (1965) proposed the following matrix to represent four
distinct views of the model user (MU) - model builder (MB) interface:

                                MB understands MU          MB does not understand MU
  MU understands MB             1. Mutual understanding    2. Communication
  MU does not understand MB     3. Persuasion              4. Separate functions

"MU understands MB" means that the MU reacts to what the MB is trying to do in
a manner that improves the MU's chances of successfully exercising the duties
assigned to the MU. We discuss distinct elements of these views below.
1. Mutual understanding
This position represents an ideal set of characteristics. The model user understands
the pros and cons of making decisions based on model output, and the model builder
knows the various perspectives considered by the model user. The mutual understand-
ing should lead to increased confidence about the outcome of the model-building
process, and this will facilitate acceptance and use of the model.
2. Communication
In this view, the model user directs the model builder. Essentially, the user has decided
that a model should improve the quality of decisions, and the user has identified
at least conceptually the structure to be used by the model builder. However, the
model builder will not have a complete understanding of the user's perspectives. Of
course, it is critical that the user has a good understanding of elements from statistics,
econometrics and operations research. The model builder depends on extensive and
thorough communication of the model user's needs and expectations.
3. Persuasion
This view applies when the model user is uninformed about the role of models in
decision making. The problem of model implementation is then one of the model
builder selling the features of the model. Among the drawbacks of this position is that
model advocates often promise more than can be delivered. This excess in promise
can show both in the model not fitting the decision making context and the model not
performing at the promised level of accuracy.
The underlying rationale for this position is that model users (managers) are too
busy to learn about models and do not have the patience to discuss details relevant to
model development. It is then the task of the model builder to understand the manager
well enough for the model to be accepted in principle and upon model completion
based on the superiority of results. The persuasion task will require that the model
builder understands the personality of the manager so that resistance to change can
be overcome.
4. Separate functions
In this view the functions of model builder and model user are essentially separate and
separable. The model builder has the responsibility of generating a workable model.
This model may be intended for use in a large number of settings. Once the model is
completed, its purpose, its function, and its results will be presented to managers who
either accept or reject its use. A modest amount of customization may be provided,
dependent upon the heterogeneity in user needs, data characteristics, etc.
Based on several studies, 10 it appears that the mutual understanding position is the
most effective interface. This should be true especially if the entire model-building
exercise takes place within an organization. The communication position may char-
acterize situations in which managers oversee a group of individuals hired to provide
model-building expertise. The last two positions represent cases where, for example,
consultants or market researchers offer their services for a fee. Especially when the
models are intended for widescale application, the supplier may work with leading
users to make the model characteristics and model output fit their environments. Suc-
cessful use in highly regarded or leading organizations then facilitates the diffusion
of model use throughout an economy.
The first two positions, mutual understanding and communication, are also charac-
terized by a relatively high degree of user involvement. In Section 19.1.1 we defined
user involvement as the degree of interaction between user (manager) and builder
(researcher). In the context of model building, user involvement depends on variables
such as: 11
• the user's understanding of model-building arguments and model characteristics;
• the user's evaluation of the quality of a model and its supporting mechanisms
(people, hardware, data);
• the user's knowledge about approaches for conflict resolution (between partici-
pants in model development).
10. Dyckman (1967), Duncan (1974), Benbasat, Todd (1996).
11. See Hanssens et al. (1990, p. 327).
Factors that inhibit user involvement and prevent facile communication among the
participants stem from fundamental differences between the personal and other
characteristics of model builders and model users. Hammond (1974) identified eight
dimensions on which these differences show. Managers and researchers differ in goal
orientation, time horizon, comparative expertise, interpersonal style, decision style,
problem definition, validation of analysis and degree of structure required. We briefly
discuss these dimensions below.
1. Goal orientation
The model builder is motivated to pursue a goal of developing a model that produces
valid and reliable results. In that regard, the model builder may want to impress
model-building peers, which can result in the model having a higher degree of
sophistication than is desirable. By contrast, the manager (model user) will pursue personal
goals that may be oriented towards obtaining a promotion. Thus, if the model allows
the user to achieve superior performance in the short run, the manager should be
favorably inclined. Hanssens et al. (1990, p. 327) mention that there is substantial
evidence that the impact of a model on the user's job performance overshadows all
other factors in predicting model acceptance.
2. Time horizon
The time horizon considered by managers for a model to assist in problem solving
and decision making is generally shorter than the time horizon of a model builder.
3. Comparative expertise
A marketing manager is more familiar with the substantive aspects of marketing than
the typical model builder is. The model builder is an expert on the methodology
of model building. Although such differences in expertise may produce synergistic
effects, these may also be counterproductive because of difficulties in communica-
tion. The jargon of the model builder may indeed be quite different from that of the
manager.
4. Interpersonal style
Model builders are more likely to have a task-oriented interpersonal style. They will
work together with others until the task is completed, whereas many managers have
a more relationship-oriented style. Managers are interested in maintaining good rela-
tionships with peers and others, relatively independent of specific tasks.
5. Decision style
On an analytic-heuristic continuum, the model builder tends to be close to the ana-
lytic end, while the model user is close to the heuristic end of the scale. Even if the
model user accepts the model builder's recommendations, the person with a heuristic
problem-solving style is often disinclined to actually use the model results.
6. Problem definition
The typical model builder likes unambiguous and explicit problem definitions. Model
builders also tend to limit themselves to those dimensions that are easily quantifiable.
The manager's problem definition is often vague and will include a number of quali-
tative considerations.
7. Validation of analysis
The model builder may be more interested in validating model structure and inputs,
whereas the user is primarily interested in the output, i.e., how well the model per-
forms. It is conceivable that the evaluation of model structure/inputs on the one hand,
and model output on the other hand, converge. A short-run orientation on the part of
the manager may, however, favor the use of a model just because the output appears
promising. The longer-term orientation of the model builder requires that the model
is structurally sound so that it can be expected to produce useful results in other time
periods and under other conditions.
It should be clear that it is important for the model builder to be aware of such dif-
ferentiating characteristics. These eight dimensions give operational meaning to the
notion of "understanding" defined by Churchman and Schainblatt (1965). A thorough
consideration of these differences should enable the model builder to develop an im-
plementation strategy that reduces or eliminates these barriers. The ideal position of
"mutual understanding" is one that may develop over time. Because of rapid increases
in the nature and size of databases, managers can no longer operate in a traditional
mode. Thus, managers must learn to interact competently and intelligently with model
builders. By the same token, to be effective, model builders must understand the
components of organizational decision making. Schultz and Henry (1981, p. 290)
advocate the use of functionally related intermediaries to guide this process. If the
model user and model builder cannot interact effectively, a market researcher may
provide a useful interface.
Recent research affirms that the key issue for a model developer is the design of
a system which demonstrably moves the decision maker toward an approach that is
expected to provide more accurate outcomes. 12 Thus, the model builder will benefit
from insights into the nature of information processing by the model user. A full
understanding of the decision making context will suggest opportunities for the model
builder to develop meaningful decision aids.
"The evidence for the need of management support is so strong that any attempt to
implement a system without it, and without the related conditions of commitment
and authority to implement, will probably result in failure."
Thus it is crucial that reward systems are properly aligned, and that the arguments
in favor of combining subjective judgments with model output are spelled out. 16
Resistance to change may be overcome through a program of education and
preparation. This is what underlies Little's (1975b, p. 657) suggestion to start a model-
building project by first having management attend an orientation seminar on the
state-of-the-art of marketing models and model building. This should give managers
a better understanding of what models can and cannot accomplish. It may also reduce
their resistance to change. 17
McCann and Gallagher (1990, pp. 217-219) describe several other impediments
to the implementation of expert systems 18 in an organization. Expert systems that
contain the marketing expertise of manager i may not be desired by manager j. For
example, if manager j were to use the expert system and achieve favorable results,
credit for these results may go to manager i.
An impediment to the development of an expert system is that if managers allow
their knowledge to be placed into a computer, they are "selling" that knowledge to
be used for an indefinite time period by the firm. Managers, however, are not paid
for selling their knowledge but for "leasing" the knowledge for the duration of their
employment. If managers place their expertise into a computer, the traditional em-
ployee/employer relationship breaks down. Also, the revelation of this expertise and
knowledge, or of the lack thereof, is a potential barrier to the codification of
expertise. Top management has to provide incentives for managers to be partners in
this area. Top management support is probably the most critical organizational factor
for model implementation.
The second organizational factor is the match between model structure and organiza-
tional structure. An often-voiced criticism is that models are partial representations
of reality. For example, advertising budgeting models generally miss explicit consid-
eration of media allocation, and media planning models often take the advertising
budget as given. To see whether this is a serious pitfall, it is important to take the
structure of the organization into account. If advertising budget decisions and media
allocation decisions are made at different levels in the organizational hierarchy, then
model implementation will be facilitated if the advertising budget model omits media
allocation considerations. We elaborate on this point in Section 19.2.2.
The third organizational factor relates to the position of the model builders (or the
information systems unit) within the organization. The following characteristics have
been found to determine model acceptance: the size of the unit, its internal structure,
16. For an example of how model output and subjective judgments may be combined, see Blattberg and Hoch
(1990), Gupta (1994) and Section 16.9.
17. See also a case study reported by Montgomery, Silk and Zaragoza (1971) concerning the introduction of
their model DETAILER into a firm. Other examples can be found in Cooper, Nakanishi (1988, Chapter 7) and
in McCann, Gallagher (1990, Chapter 13).
18. See Section 15.2.
the organizational and technical capabilities of the group, its reputation, the life cycle
stage of the unit and the place of the unit in the organizational structure. We consider
here only the last two points, the others being largely self-explanatory.
The marketing science (model-building) unit's life cycle stage affects implemen-
tation (Schultz, Henry, 1981, p. 289). In early stages, when strong organizational
support is not yet present, implementation is more difficult than in later stages, when
successful performance should create positive word-of-mouth and requests for other
models from satisfied users.
In some firms marketing science projects are completed in the market research de-
partment. Alternatively, this department is responsible for the purchase of models
from suppliers such as ACNielsen, IRI, GfK, Research International, etc. The latter
situation pertains especially to smaller firms. 19 In those firms marketing science may
be a one-person operation. A marketing scientist/model builder is likely to report to
a product manager or marketing manager. Occasionally the model builder reports
directly to the vice president for marketing. The involvement of higher levels in the
organization occurs especially when models are developed on an ad-hoc basis to solve
a specific problem. Examples are (possible) decisions to increase the price above
a critical value (e.g. from below $5 to above $5), to change the advertising budget
dramatically, to modify a loyalty program or to model the influence of a mishap on
brand performance (see Section 17.3).
19.2.1 INTRODUCTION
An effective implementation strategy is one that reflects the arguments that increase
the likelihood of model acceptance (Section 19.1). For example, a successful im-
plementation strategy is more likely if it is based on the "mutual understanding"
position than if it is based on one of the other three positions in the Churchman-
Schainblatt framework (Section 19.1.2). And if we consider the model user to be the
client of the model builder, marketing prowess dictates that the model builder has
to understand the user's decision-making environment. The more the model builder
knows about the user's way of thinking and the user's needs, the easier it will be
to convince the client of model benefits. The "persuasion" position is, in this sense,
second best. Educating the client is, of course, also much easier if one knows the
client's situation. An educational effort must include time to provide management
with at least partial knowledge of the process of model building and the pros and
cons of models. 20 We note that the learning aspects of management involvement in
model building have been strongly advocated by various authors (Urban and Karash,
1971, Urban, 1972, 1974 and Little, 1975b). Their proposed implementation strategy
scratch each time a change occurs. Several authors, therefore, have argued for mod-
ular model building. Modularity means that a desired model is obtained by putting
together a set of submodels. For example, Little's (1975a) BRANDAID marketing
mix model consists of a set of modules, one for each marketing instrument. Thus,
if the user does not want to include promotion variables, that module is simply not
included in the model. If later on the manager decides to include promotion in the
marketing mix, one does not have to start the whole model-building effort anew: the
promotion module would be linked to the existing structure. Another example of a
modular model is a marketing decision support system for retailers (Lodish, 1982)22
where multiple marketing mix variables such as (national) advertising and retail
markup are part of the sales-response function.
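The modular idea can be illustrated with a small sketch. The module forms and parameter values below are our own assumptions for illustration, not Little's (1975a) actual BRANDAID specification: each marketing instrument contributes a multiplicative effect index, and a module that is omitted simply leaves base sales unchanged.

```python
# Illustrative sketch of modular model building in the spirit of BRANDAID:
# each marketing instrument is a separate module that multiplies base sales
# by an effect index. Functional forms and parameters are assumptions made
# for this example only.

def advertising_index(spend, reference_spend):
    # diminishing-returns index, equal to 1.0 at the reference spending level
    return (spend / reference_spend) ** 0.3

def price_index(price, reference_price, elasticity=-2.0):
    return (price / reference_price) ** elasticity

def forecast_sales(base_sales, modules):
    """Base sales times the product of whichever module indices are included."""
    result = base_sales
    for index in modules:
        result *= index
    return result

# A user who does not (yet) model promotions simply omits that module;
# adding it later links a new index into the product without rebuilding
# the rest of the model.
modules = [advertising_index(120, 100), price_index(2.10, 2.00)]
sales = forecast_sales(1000.0, modules)
```

The design choice that makes this modular is the common interface: every module returns an index around 1.0, so submodels can be added or removed independently.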
In Section 19.1 we discussed the idea that models should ultimately reduce "the
efforts" of the user. Benbasat and Todd (1996, p. 251) expect that if the effort is
reduced through the use of decision aids (a model can be seen as a decision aid)
or through other means such as training and experience, the model builder will also
be more willing to formulate more complex models. In this way the model builder
moves the decision maker toward a strategy which may provide increasingly accurate
outcomes.
We discussed the SCAN*PRO model developed by Wittink, Addona, Hawkes
and Porter (1988) in several sections. The original model is defined in Section 9.3,
and we introduced a simplified version of the model in Section 14.1.2. More complex
and more realistic SCAN*PRO versions were developed by Foekens, Leeflang and
Wittink (1994) as part of an evolutionary model-building process. The changes in-
volve relaxation of the homogeneous-parameters assumption. Chain-specific (hetero-
geneous) parameters were shown to provide better fit and better forecasting accuracy
(Section 14.1.2). In Section 16.7 we discussed a SCAN*PRO model estimated by a
semiparametric method to avoid inappropriate constraints on functional forms. This
evolution is appealing in the sense that a user may favor an initial version that is fully
parametric because of its simplicity.
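The homogeneous- versus heterogeneous-parameter distinction can be illustrated with a simulated example. The data, parameter values, and estimation by ordinary least squares below are our own assumptions, and the model is far simpler than the actual SCAN*PRO specifications; the point is only that a pooled elasticity averages away true differences between chains.

```python
# Simulated illustration: store-level sales follow a multiplicative (log-log)
# response model, and chains truly differ in price elasticity. A pooled
# (homogeneous) model hides this; chain-specific models recover it.

import numpy as np

rng = np.random.default_rng(0)
n_weeks = 52
true_elasticity = {0: -1.5, 1: -3.0}   # the two chains truly differ

rows = []
for chain in (0, 1):
    log_price = rng.normal(0.0, 0.2, n_weeks)
    log_sales = (5.0 + true_elasticity[chain] * log_price
                 + rng.normal(0.0, 0.1, n_weeks))
    rows.append((chain, log_price, log_sales))

# pooled (homogeneous) model: one intercept, one elasticity for both chains
X = np.column_stack([np.ones(2 * n_weeks),
                     np.concatenate([lp for _, lp, _ in rows])])
y = np.concatenate([ls for _, _, ls in rows])
beta_pooled, *_ = np.linalg.lstsq(X, y, rcond=None)

# chain-specific (heterogeneous) model: estimate each chain separately
beta_chain = {}
for chain, lp, ls in rows:
    Xc = np.column_stack([np.ones(n_weeks), lp])
    beta_chain[chain], *_ = np.linalg.lstsq(Xc, ls, rcond=None)

# the chain-specific elasticities recover the heterogeneity that the pooled
# estimate averages away
print(beta_pooled[1], beta_chain[0][1], beta_chain[1][1])
```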
Other evolutionary versions of the SCAN*PRO model are:
The leads and lags are critical so that the expanded model allows sales effects from
promotions to be decomposed into, for example, brand switching, stockpiling and
other effects. 23 We summarize this evolutionary model-building process in Figure
19.1.
Not only the users, but also the model builders need training. Mutual understanding
can be realized only if the model builders are also exposed to a formal treatment
of the implementation problem (Schultz, Henry, 1981, p. 293). For example, model
builders should learn about the behavioral and political realities of implementable
model building. Thus, the model builders can learn from behavioral training, that is,
education aimed at increased awareness of individual and social actions. Schultz and
Henry (1981) maintain that management scientists should know more about the art of
management and behavior. They suggest that:
Thus far, evolutionary model building has been proposed as a process of gradually
moving from a relatively simple representation of a given problem to a more complex
representation. There may, however, also be an evolution in the types of problems
being modeled. Urban (1974, p. 9) found that evolutionary progress often means
not only model changes, but also the identification of new model needs. Schultz
and Slevin (1977, p. 16) mention that the chances of successful implementation are
seriously impaired if the model building is very advanced and the user has little or
no experience with models and model building. In that sense, early model-building
efforts might concern well-structured problems, which are easy to systematize. After
positive experience with such simple problems, the model building gradually evolves
toward relatively unstructured problems. Examples are problems of a strategic na-
ture. 24 The most serious problem with marketing strategy models is that such models
24. Surveys of strategy models are given by Wind (1981) and Wind, Lilien (1993). Other examples can be
found in Zoltners (1982).
are rarely used by management (Wind, Lilien, 1993, p. 776). Most of the strategy
models are not user-friendly, do not address key concerns of top managers, do not
facilitate the process of making strategic choices and are more directed to brand
strategies than to corporate strategy.
In this section we first examine the global versus local model-building controversy.
We then discuss general versus detailed descriptions of marketing variables and hier-
archical linking (the linking of different decision points).
Model scope is seen as an element of implementation strategy because it relates
to the matching of model structure to organizational structure.
organizational setup, and management may prefer to maintain the links informally.
Second, even if the linking of local models into a global model were desirable, the
model builder may face technical difficulties constructing the linkages.
The business world has embraced the notion that the functional areas of the firm, such
as marketing and production, should not act as independent units. Increasingly indi-
vidual activities, nominally belonging to different functional areas, are coordinated
and sometimes integrated. 26 Consider, for example, a consumer-goods firm that is
contemplating an increase in promotional activities. If the marketing function acts
independently, it would consider the effect of increased promotions on consumer de-
mand. With appropriate data, one can estimate the gross effect and decompose it into
sources of demand increases such as brand switching, stockpiling and consumption.
Brand switching is often the largest component, but in a competitive market, one
expects an increase in switching to one's own brand to be followed by an increase
in switching to other brands (i.e. we expect competing brands to react, see Section
11.2). Thus, the net effect from brand switching may be very small. Also, the percent
of the gross effect due to stockpiling tends to be ephemeral. Unless the stockpiling
increases consumption (due to large inventories) or the stockpiling keeps consumers
from considering alternative brands, this effect should also be very small. Still, it is
conceivable that the marketing department concludes that promotions have beneficial
effects on results exclusive of the impact on other operations.
If we jointly consider marketing and production, then we will also recognize the
effects of promotions on the cost of production (and inventory, and distribution). The
gross effect of a promotion on sales is often very large, which creates the illusion that
favorable results are achieved. If we only consider the marketing aspects, we may
conclude that the net effect is positive as well. But if we take production into account,
we would also deduct the (extra) costs due to the additional variation in demand.
Typically, it is harder to forecast demand in the presence of a promotional activity than
in its absence. And distribution also becomes less efficient. When all these extra costs
due to supply chain inefficiencies are subtracted, it may be unlikely that we can justify
(increased) promotional activities. 27 Thus, decisions about marketing activities, based
on global optimization may differ strongly from those based on optimization at the
subunit level.
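The gap between the marketing view and the global view can be made concrete with a back-of-the-envelope calculation. All numbers below are hypothetical; the decomposition shares and incremental fractions are assumptions chosen only to illustrate the argument above.

```python
# Illustrative (hypothetical numbers) calculation of why a promotion that
# looks profitable from the marketing side alone can fail a global test
# once production, inventory and distribution costs are included.

gross_unit_lift = 10000          # extra units sold during the promotion
share_switching = 0.60           # decomposition of the gross effect
share_stockpiling = 0.30
share_extra_consumption = 0.10

# competitive reaction cancels most switching, and stockpiling mostly
# borrows from future sales, so only a fraction of each component is
# truly incremental (fractions assumed for illustration)
net_units = gross_unit_lift * (share_switching * 0.10
                               + share_stockpiling * 0.20
                               + share_extra_consumption * 1.00)

margin_per_unit = 1.50
marketing_view_profit = net_units * margin_per_unit

# extra costs of demand variation borne by production and distribution
supply_chain_cost = 4000.0
global_view_profit = marketing_view_profit - supply_chain_cost
```

With these numbers the net lift is only 2,200 units: the promotion looks profitable from the marketing subunit's perspective but unprofitable once the supply-chain costs are deducted.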
The logic of coordinating or integrating decisions across functional areas is also
reflected in models that link marketing decisions to other functional areas. Eliashberg
and Lilien (1993, p. 12) mention that they expect "interface modeling" (modeling that
spans functional areas) to receive an increasing amount of attention. 28 Examples of
research in which marketing is linked to at least one other function are listed in Table
19.1.
The area where most progress is made is the marketing-production interface.
Examples include models of quality function deployment (relating customer-based
preferences to internal engineering specifications29 ) and models for just-in-time (JIT)
management and total-quality management (TQM). 30
variables in general or broad terms (e.g. Gross Rating Points for advertising or percent
of outlets in which the item is sold for distribution), one could decompose the variable
into its constituent parts and use those components as separate predictor variables in
the model.
There are two principal reasons why many models do not simultaneously con-
tain both general and detailed measures of marketing variables. One reason is the
complexity of modeling. The other reason has to do with the organizational structure.
1. The complexity of having both general and detailed measures of marketing vari-
ables in a model is illustrated by the multiproduct advertising budget model of
Doyle and Saunders (1990) discussed in Chapter 13. An example of a model
with greater complexity is Pedrick and Zufryden (1991). In this model the impact
of advertising media plans and point-of-purchase marketing variables on con-
sumer demand is estimated by integrating purchase incidence, brand choice and
advertising exposure behavior.
The complexity of models that include detailed measures along with general
descriptors of marketing instruments can be reduced in part through modular
model building. Examples of modular models are BRANDAID and ASSESSOR
(Section 10.2). Modular model building is likely to be an important approach in
the near future. 31
2. Decisions about a general measure are often made before detailed decisions are
formulated. Thus, a budget is usually determined before decisions are made on
a campaign theme and the media. This sequence may reflect the organizational
structure of the firm for which the model is developed. Thus, at one level in the
firm (e.g. the director of marketing), decisions are made about the advertising
budget for next year. At a lower level of the organization, the budget is an input
(it is fixed) and a model may be used to allocate the budget to different media.
Figure 19.3 Sequence of outputs from and inputs into models with general and detailed measures
(1. a model with general measures feeds 2. models with detailed measures, one for each of media
plans 1, 2, ..., i).
We note that this type of iterative process links top-down and bottom-up approaches
described in Section 13.5.
The "easy to communicate with" characteristic has many aspects. One is that model
builders should be able to communicate their ideas in a manner that fits the model
user. Thus, the model builder must have the user's perspective in mind to enhance the
likelihood of successful model implementation.
Easy communication also means that it should be easy for the user to specify
model inputs, and for the model to provide output quickly. The output should be
constructed in a manner that suits the user. On-line conversational input-output and
interactive computer systems are seen as being effective in bringing about ease of use.
On-line computer systems:
• reduce barriers between model and user;
• aid the learning process by immediate response;
• make immediate availability of information possible;
• encourage the user to examine a large number of possible plans;
• can be made available to a number of different users.
The degree of interaction afforded by a decision aid is often an important precursor
to usage. And active participation in model development, including the construction
of a frame for output measures by the user, tends to lead to increased commitment,
acceptance, support and use. 32
32. See, for example, Huse (1980), Cats-Baril, Huber (1987), Barr, Sharda (1997).
CHAPTER 20
The development and use of models is justified if the (expected) benefits exceed the
(expected) costs. For a firm that has adopted a model, this implies that profit with the
model should be greater than without it. Of course this is not easily operationalized.
After model adoption, we can quantify the benefits and make a comparison with the
costs incurred. Before implementation of a model, both benefits and costs are based
on potentially highly uncertain estimates.
In this chapter we start with a situation in which models are absent. In such a
case we may use general arguments in favor of and against models, relative to the
use of judgment for decisions (see Section 15.2). This ex-ante comparison includes
the arguments that a model is consistent (a static model always provides the same
output given the same input) and that it allows for inspection of the process used to
generate output. By contrast, a manager's judgment tends to be sensitive to irrelevant
variables and tends to overemphasize the most recent data. However, the manager
can recognize that the market environment changes systematically, and adapt the
decision-making process to reflect changing market conditions. Most models do not
have this capability, although it is possible for model builders to incorporate dynamic
elements in the models. 1
Given that models should be adapted as our understanding of markets changes
and as we gain access to more complete and more detailed data, it is important that
we use a conceptual approach for the consideration of costs and benefits before model
development. We do this in Section 20.1. In Section 20.2 we focus on cost components
of model development and model use. We discuss how the benefits can be determined
in Section 20.3. In Section 20.4 we present examples that illustrate how costs and
benefits vary with the type of problem, the size of the firm, intended use of the model,
and the amount of behavioral detail. We end with general observations of costs and
benefits in Section 20.5.
20.1 Tradeoffs
An ex-ante comparison between the use of a model and not using one is necessarily
to a large extent conceptual. Mitchell, Russo and Wittink (1991) suggest that models
beat human judgments, on average, in the case of repetitive decisions that require the
integration of data on multiple variables, often expressed in noncomparable scales
(see also Section 15.2 and Section 16.9.1). This is largely because humans vary their
judgments even if the objective data are identical. Even a model of a decision maker's
judgments typically outperforms these very judgments! The reason is that the model
uses only the systematic part in the decision maker's judgments (and eliminates the
unsystematic or random part). However, to the extent that the decision maker's judgments are systematically wrong (e.g. overweighting some variables and underweighting others), a model based on actual outcomes will do even better.
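This "model of the judge" argument can be illustrated with a small simulation. Everything below is hypothetical and chosen only to mimic a judge whose policy is systematic but noisy: two cues drive the true outcome, and the judge weighs the cues imperfectly while also adding random noise to each call.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two cues (say, advertising and price indices) drive the true outcome.
x = rng.normal(size=(n, 2))
outcome = 1.0 * x[:, 0] + 0.5 * x[:, 1] + rng.normal(scale=0.5, size=n)

# The judge weighs the cues imperfectly AND adds noise to every judgment.
judgment = 0.8 * x[:, 0] + 0.7 * x[:, 1] + rng.normal(scale=1.0, size=n)

# Model of the judge: regress the judgments on the cues (OLS via lstsq),
# keeping only the systematic part of the judge's policy.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, judgment, rcond=None)
bootstrapped = X @ beta

r_judge = np.corrcoef(judgment, outcome)[0, 1]
r_model = np.corrcoef(bootstrapped, outcome)[0, 1]
print(r_model > r_judge)  # the noise-free model predicts outcomes better
```

Because the regression strips the random component out of the judgments while preserving the judge's systematic policy, its correlation with the actual outcomes is higher than that of the raw judgments themselves.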
An argument against any model of actual outcomes is that the model explains the
past, but will not necessarily be applicable to the future. Model builders and model
users must, therefore, assess the extent to which changing market conditions may
reduce the validity of the model specification or the model parameters. Thus, one
should compare the benefits that emanate, for example, from the model's consistency against the cost of missing dynamics. We note, however, that an evolutionary
model-building process can reduce a model's shortcomings through the addition of
appropriate extensions over time. The two most obvious extensions are:
The first possible extension is attractive if there is a logical relation between the
effects of marketing variables and measures of economic and other market conditions.
The second option will work well if there are activities that are difficult to quantify
but the decision maker has sound intuition about the influence of those activities.
Benbasat and Todd (1996) use a cognitive cost-benefit approach to examine how the
selection of decision strategies depends on the interaction between tasks and decision
aids. Decision aids are designed to reduce the effort required by the decision maker for
the evaluation of alternative decision strategies. 3 Benbasat and Todd (1996) suggest
that a decision maker, in choosing a strategy, weighs two major factors:
The costs of model utilization are important for the model builder and the model
user. If the perceived or actual cost of using a proposed model is excessive, it may be
advisable to formulate a simpler model. The model builder should focus closely on
the effort required for model use, because barriers to model use will lead to rejection
of the model. Counterproductive use may occur if the model is more complex than
needed. Model complexity by itself, however, does not have to deter the user. Most
users do not need to see the model details and they often do not want to see the
complexities. Rather, they want a model that provides output they can understand
and relate to. This output should be valid and reliable, and if the model allows the
user to play relevant "what if" games, the user should not have to confront the model
complexities.
b. Maintenance costs
Model development is not a one-shot event. Maintenance costs relate to updating the
model, such as changing its structure, updating the parameter estimates, etc. These
costs will be partly fixed, partly variable, in the sense that the frequency of structural
change will depend on use intensity as well as dynamics in the marketplace.
• managerial time;
• the effort required of the decision maker to interact with the model, 5 and
• the effort required to process the information generated by the system.
The managerial time required for model development and model use needs to be
considered as a cost. This time may be assessed in terms of the manager's salary.
This time cost can be compared against the possible reduction in time that results
from having the model do part of the job the manager used to do. For example, managers may spend less time on programmed and structured activities such as inventory
management, media allocation, sales force management and control, and judging the
consequences of alternative marketing programs. They will have more time available
for unstructured activities such as the creation of entirely new marketing programs.
This reallocation of time can result in important benefits.
The efforts required for interacting with the model and processing the output
generated depend on the cognitive abilities and the experience of the user. However, the model builder can facilitate the interaction by creating output in a user-friendly
manner. Experience has shown that the user's efforts can be reduced through training
and the availability of decision aids.
In Section 3.2, we made a distinction between direct benefits and side benefits. Direct
benefits are the improvements in decisions that result from model use. Side benefits
are those generated from model use that were not intended or expected.
In this section we discuss the measurement of direct benefits. The quantification
of benefits is difficult for a number of reasons.
a. Often the benefits of a model are determined on the same data used for model
testing and parameter estimation. Such a comparison between situations in which
the model is used versus not used would be biased in favor of the model, as illustrated in the multiproduct advertising budgeting model of Doyle and Saunders
(1990), discussed in Section 13.5.
b. For normative models it is possible to compare marketing decisions based on
model output with the decisions that would have been made in the absence of a
model. However, for descriptive models, including demand models, no optimal
decisions are implied. One could propose to compare the estimated parameters
with what managerial judgments would have generated, but such a comparison
does not quantify the (possible) benefit from model use. If an estimated demand
COST-BENEFIT CONSIDERATIONS IN MODEL BUILDING AND USE 549
model is used for decision making, then it should be possible to compare the
quality of the decisions with what would have been done without the model.
Obviously, these comparisons are not straightforward.
c. In all instances in which comparisons are made, the implicit assumption is that
the model represents reality. In a limited way this is a testable proposition. The
other way to think of it is that the benefits cannot be determined until after deci-
sions have been made (or could have been made) based on the model. Indeed, we
advocate that this be done repeatedly so that the model's limitations get clarified.
The general form of the expected profit function is shown in Figure 20.1. For low values of x, the probability of winning is high but expected profit is negative. For example, for x = 0, the probability of winning should be one, and expected profit would equal −c. If x = c, the expected profit is zero. As x increases, P(x) decreases, and at some point (x = x1 in Figure 20.1), P(x) and hence π(x) is zero. The main problem in competitive bidding models is to estimate how P(x) depends on x. Once P(x) is specified, one can obtain the value of x, x* in Figure 20.1, that maximizes expected profit. 7
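A numerical sketch of this logic: expected profit is π(x) = P(x)(x − c), and the win-probability curve P(x) below is an assumed logistic form (the chapter leaves the specification of P(x) open; the cost and scale parameters are hypothetical).

```python
import numpy as np

c = 100.0  # estimated cost of fulfilling the contract

def p_win(x):
    # Hypothetical win-probability curve: P(x) is near 1 for low bids and
    # falls toward 0 as the bid price x rises past the competition.
    return 1.0 / (1.0 + np.exp((x - 130.0) / 8.0))

def expected_profit(x):
    return p_win(x) * (x - c)  # pi(x) = P(x) * (x - c)

# Grid search for the profit-maximizing bid x*.
bids = np.linspace(c, 200.0, 2001)
x_star = bids[np.argmax(expected_profit(bids))]

print(expected_profit(c))  # zero: bidding at cost wins often but earns nothing
print(x_star)              # interior optimum x* above cost
```

Any downward-sloping P(x) yields the same qualitative picture as Figure 20.1: losses below cost, zero expected profit at x = c, and an interior maximum at x*.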
To evaluate the benefits derived from this model, we compare the performance
of bids with and without the model. Edelman (1965) provides the data reproduced in
Table 20.1. Results are shown for seven bid situations. The model was available to the team responsible for preparing the bid, but the team made no use of it for situations 1 through 6. Instead, the team applied traditional company procedures for bid preparation. The
data in Table 20.1 show that the bid generated from the model was lower than the
6. Price is of course only one of the elements that affect the outcome of a bidding situation. For example,
unreliable contractors may not be awarded the contract, even if their bids are the lowest. For a further discussion,
see for example Haynes and Rothe (1974), Lilien, Kotler, Moorthy (1992, pp. 208-212).
7. While (20.1) is the bidding relation often used in the literature, it has an important shortcoming. It assumes
that the probability of winning is not related to whether the calculated cost under- or overestimates the true cost.
Figure 20.1 Expected profit π(x) as a function of bid price x.
lowest competitive bid in 5 out of 6 cases. The actual bid made (bid without model)
was lower than the lowest competitive bid in only 2 out of 6 cases. Thus, the model
would have provided a much higher success rate.
The other relevant aspect is the difference from the lowest competitive bid, if the
bid is successful. For the two cases in which the bid without model was successful, the
bid with model would on average have been only 2.8% below the lowest competitive
bid (versus 7.4% on average for the actual bids made). These results suggest clear
benefits of the model.
For the seventh situation, there is unfortunately no bid price without the model.
The team preparing the bid was favorably impressed with the model's performance,
based on the first six situations, and had gained sufficient confidence in the model to
abandon the traditional method. Although it is laudable that the model is accepted
and used, given the favorable performance, it is unfortunate that the team does not
continue to create competitive bids without the model so that the performance of the
model can still be tracked relative to alternative methods. Such comparisons are useful
for learning purposes. Management has the opportunity to gain further insight into the
model's relative benefits. Also it is important to note that once one firm has become
more successful in the bidding process, other firms either disappear or modify their
bidding processes. One possibility is for competitors to simply subtract some amount
from whatever bid is generated by their traditional processes. Another possibility is
that they adopt a similar model. Such competitive reactions will modify the benefits
of the bidding model and require adaptation.
Table 20.1 Bids with and without the model (Edelman, 1965). Columns: Situation; Bid without model; Bid with model; Lowest competitive bid; Bid without model: percent under (over) lowest competitive bid; Bid with model: percent under (over) lowest competitive bid.
2. New-product forecasting
The following example shows a somewhat different way of assessing the value of
a model. It illustrates how validation can provide insight into how well a model
represents reality.
In Section 10.2 we discuss the new-product forecasting model for durable consumer goods developed by Bass (1969b). The number of initial purchases at time t is a second-degree equation in N_{t-1}, the cumulative number of initial purchases at time t − 1; see (10.18). The estimates of this relation are used to compute the total number of adopters (N), the time of peak adoptions (t*), the coefficients of innovation (p) and imitation (q), and the "peak sales" N(t*). We show actual and predicted sales in
Figure 10.5.
One might conclude that the model provides accurate predictions. But in this
example the sample data that were used to estimate model parameters include the
time and magnitude of peak sales. The parameter estimates are then used to predict
the time and magnitude of peak sales. Of course a model that accurately describes the
phenomenon at hand should fit the observed data well. But a good fit to the same data
used for parameter estimation is not a guarantee that the model will be valuable to the
decision maker.
A real test is to examine whether the model predicts the time and magnitude of
peak sales before these occur. Bass did this for color televisions. Based on sales of
0.7 (million) in 1963, 1.35 in 1964 and 2.50 in 1965, he predicted peak sales of 6.7
million in the U.S. in 1968. He found that these predictions of the time and magnitude
of peak sales were quite accurate. Such predictions outside the sample data used for
parameter estimation can provide true tests of a model's value.
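That out-of-sample exercise can be reproduced in spirit. With exactly three annual observations, the discrete Bass relation S_t = pm + (q − p)N_{t-1} − (q/m)N_{t-1}² can be solved exactly for its three coefficients. The sketch below applies this to the 1963-1965 color-television figures quoted above; it is an illustrative three-point solve, not Bass's actual estimation procedure, and estimates from so few points are necessarily rough.

```python
import numpy as np

# Annual color-TV sales (millions) and cumulative sales at the start of each year.
sales = np.array([0.7, 1.35, 2.50])   # 1963, 1964, 1965
cum = np.array([0.0, 0.7, 2.05])      # N_{t-1}

# Discrete Bass relation: S_t = a + b*N_{t-1} + c*N_{t-1}^2,
# with a = p*m, b = q - p, c = -q/m. Three points give an exact solve.
A = np.column_stack([np.ones(3), cum, cum**2])
a, b, c = np.linalg.solve(A, sales)

m = (-b - np.sqrt(b**2 - 4 * a * c)) / (2 * c)  # market potential (sales hit 0)
p = a / m                                       # coefficient of innovation
q = b + p                                       # coefficient of imitation
t_peak = np.log(q / p) / (p + q)                # time of peak adoptions
s_peak = m * (p + q)**2 / (4 * q)               # magnitude of peak sales

print(round(s_peak, 2))  # close to the 6.7 million Bass predicted
```

The implied peak magnitude of roughly 6.8 million is close to the 6.7 million Bass predicted; t* is measured in the model's time units from the start of diffusion.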
8. See, for example, Bult (1993a, 1993b), Wedel, DeSarbo, Bult, Ramaswamy (1993), Bult, Wansbeek (1995), Bult, Wittink (1996), Gönül, Shi (1998), van der Scheer (1998), Spring, Leeflang, Wansbeek (1999).
9. Kass (1976).
10. See Shepard (1990).
11. See Banslaben (1992).
Table 20.2 Expected net returns of the three selection procedures

Procedure      Index   Net returns (NLG)
CH-AID           81        95,025
Gains chart     100       117,761
PM              108       127,220
people selected. Their approach is an improvement over the gains chart analysis because
they analyze the response to a mailing and determine the optimal cutoff point at the
individual household level. In their PM-approach the optimal cutoff point is obtained
by equating marginal costs to marginal returns.
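The marginal rule can be sketched in a few lines. The response probabilities, unit margin, and mailing cost below are hypothetical; in the actual application the probabilities would come from an estimated response model.

```python
# Marginal-cost / marginal-return selection rule (illustrative numbers only).
margin_per_response = 40.0   # NLG contribution of one incremental order
cost_per_mailing = 1.5       # NLG cost of mailing one household

# Predicted response probability per household (e.g. from a response model).
probs = [0.20, 0.11, 0.05, 0.038, 0.03, 0.01]

# Mail a household iff its expected marginal return covers the marginal cost:
# p * margin >= cost  <=>  p >= cost / margin.
cutoff = cost_per_mailing / margin_per_response   # 0.0375
selected = [p for p in probs if p >= cutoff]
net = sum(p * margin_per_response - cost_per_mailing for p in selected)

print(cutoff, len(selected), round(net, 2))
```

Every household above the cutoff adds a positive expected contribution; lowering the cutoff further would add households whose expected return no longer covers the mailing cost.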
In an empirical application the CH-AID-approach, gains chart analysis and the
PM-approach are compared in terms of net returns to a direct mail campaign. The
sample data consist of 13,828 households targeted by a direct marketing company
selling books, periodicals and music in the Netherlands. The criterion variable in
this application is the response to a book offer. The data are divided into two parts of
equal size: an analysis and a validation sample. The expected net returns for the total
direct mail campaign, estimated by using the cutoff level of each of the three selection
procedures, are given in Dutch guilders (NLG) and as an index, relative to the Gains
chart, in Table 20.2.
The net returns are based on the validation sample and are projected to the total
number of households in the "houselist" which is about 600,000. Table 20.2 shows
that the PM-approach is expected to generate an extra profit of 9,459 Dutch guilders
or an increase of 8 percent relative to the Gains chart analysis.
The tradeoff between costs and benefits varies with a number of characteristics. We
consider how the type of problem and the size of the firm, relative to the level of model detail, may influence this tradeoff.
1. The relation between the type of problem and the amount of behavioral detail
First consider the problem of determining the marketing mix for an established product in a mature market. The models used for such problems typically do not have
behavioral detail. Adding behavioral detail to these models may contribute little to
the benefits. In fact the addition of detail might have a negative effect, because we are
uncertain about the behavioral process that leads to a sale, and the input requirements
are quite severe. Under some conditions it is conceivable that the positive difference
Figure 20.2 (horizontal axis: number of applications, n, with threshold value nE).
between benefits and costs is largest for models with no behavioral detail, for example
models of aggregate data applicable to relatively homogeneous markets. Models of
aggregate data require relatively modest costs. Yet their accuracy is high as long as
the validity of conditional forecasts is not compromised. Since models of aggregate
data cannot capture all relevant complexities of marketplace behavior, these models
may show increasing discrepancies between forecasts and actual behavior as the
marketing actions undertaken deviate more from the characteristics of sample data.
Now consider the development of an "early warning" system for a new product. In
this case, the benefits from an aggregate response model are low because such models
lack the detail and the dynamics relevant to a new-product situation. With more detail,
the model should become more useful, up to a point. The ASSESSOR-model, a model
with some behavioral detail, may be close to optimal from a cost-benefit point of
view: it appears to provide accurate predictions and yet the model is not very complex
nor excessively demanding for data collection. Importantly, an internet-based version of ASSESSOR, with three-dimensional product displays, can reduce the cost and increase the speed of data collection enormously.
Figure 20.3 Costs and benefits for a large firm (firm B) and a small firm (firm A).
With regard to the amount of behavioral detail, consider two firms A and B. A is a
small firm that introduces few new products per year (less than nE in Figure 20.2).
Company B is large and introduces more than nE new products each year. For each
product introduced, the cost curve for model use as a function of amount of behavioral
detail should then be lower for B than for A, as illustrated in Figure 20.3. It follows
that the optimal level of detail will be greater for large than for small firms. Similarly,
for a firm with its own new-product model, the optimal level of detail increases as
the number of model applications, i.e. the number of new-product introductions, increases.
Of course the decision whether to build or to rent a model also depends on other
factors. While large firms can justify the cost of building a model much better than
small firms can, it often is advantageous even for the largest firms to rent. The reason
is that market research and consulting firms can apply a given model very frequently
across a wide variety of conditions. Thus, the suppliers of new-product forecasting
models can utilize this experience in any application. For example, the real-world
successes of new products can be linked to predicted outcomes based on new-product
model applications. The performance accuracy can be determined separately by type
of product category, type of target market, etc. Only the outside suppliers of new-
product services can generate the large databases that provide useful benchmarks.
Thus, market research and consulting firms can validate a given model quickly and
frequently. This opportunity also facilitates model adaptation both to characteris-
tics that pertain to individual markets and to characteristics pertaining to a given
market that change over time. In this manner model suppliers have an opportunity
to understand contingencies and dependencies. In addition, since they have to convince many different audiences of the model's applicability, they also confront the
model's strengths and weaknesses. Therefore even the largest firms tend to use outside
suppliers for model-based forecasts of new products.
For mature products, the models that provide estimates of promotional effects,
such as SCAN*PRO (see Section 9.3) are also typically built by outside suppliers for
similar reasons. However, there is an additional argument for models based on scanner
data. The data collected from supermarkets, drugstores, mass merchandisers, etc. are
purchased and controlled by a few market research suppliers. Since the market-level
data used for tracking brand performance are less suitable for econometric modeling
(Christen, Gupta, Porter, Staelin, Wittink, 1997), the modeling should be done on
store or household-level data. However, these data are typically unavailable to manufacturers and retailers. Thus, the lack of suitable data makes it difficult or impossible
for most firms to consider building a model inside the firm. Of course, larger firms
should be able to obtain lower costs per application, in the form of quantity discounts.
Little's concept of a decision calculus, and his specification of criteria that a model
must satisfy for it to be labelled a decision-calculus model, have contributed to the
acceptance of models in marketing. In the past thirty years we have seen great productivity by model builders who develop models that tend to satisfy the implementation criteria. Still, many problems remain, and model building in marketing has its limitations. Some consultants and managers also criticize marketing science for a lack of practical relevance. For example, Simon (1984, 1994) maintains that:
"marketing science contributes virtually nothing to the important strategic issues of our era" (Simon, 1994, p. 41),
and
Simon (1994, p. 36) also maintains that there is a lack of closeness to the real world in the empirical studies. For example, he argues that the set of products in empirical studies is not representative of economic reality. Published empirical studies emphasize marketing mix effects relevant to frequently bought consumer goods such as coffee. In the face of this 'distortion' of reality, Simon (1994, p. 36) talks about "coffee marketing science".
Regularity
Despite being sensitive to unusual patterns of behavior, which may arise from a
catastrophe experienced by a brand, 14 we find the marketplace surprisingly well behaved, at least from our perspective as builders of market response models. We do
occasionally encounter irregularities but there are many possibilities to accommodate
them.
Catastrophe theory models can capture complex behavior with nonlinear equations. 15 The same holds for the intervention models introduced in Section 17.3. Recently, contributions have been made to the modeling of structural breaks. 16 The
tests and models to detect and specify breaks also offer opportunities to account for
irregularities.
Other methods that can be used to estimate nonregular curves are:
Past Data
Most managers would agree that their marketing actions have consequences in periods beyond the period in which they are taken. Consequently, time-series data, that is,
13. Our discussion is based on Parsons, Gijsbrechts, Leeflang, Wittink (1994) who commented on Simon's
(1994) 'Marketing Science's Pilgrimage to the Ivory Tower'. Another comment is Little, Lodish, Hauser, Urban
(1994).
14. See, for example, Section 17.3.
15. See Oliva, Oliver, MacMillan (1992).
16. See, for example, Bai, Perron (1998).
17. For an application in marketing, see Wedel, Leeflang (1998).
Judgmental data
If market data are not available to estimate a model, marketing scientists may use
managerial judgments. We discussed this issue in detail in Section 16.9. This ap-
proach may represent the best one can do given the lack of suitable marketplace
data. However, this assumes that managers understand systematic patterns captured
by marginal effects. Often, we prefer to use historical data to obtain a proper basis
for an initial understanding of how the market works. If alternative scenarios under
consideration involve substantial risks, and there is sufficient time, field experimentation may provide further support. If there is insufficient time or the maintenance
of secrecy is paramount, the initial estimates can be updated with judgmental inputs.
It is also possible for managers to use procedures analogous to those used in pretest
market models such as ASSESSOR.
We conclude that:
Lack of closeness
Simon (1994) argues that the practical significance of marketing science is limited and that too many problems of minor practical relevance are treated with sophisticated methods in published papers. Also, only a small number of empirical studies represent the industrial and service sectors. Below we provide counterarguments, and
we demonstrate the managerial relevance of marketing science. 19
Marketing recipients of the Franz Edelman Award for Management Science
Achievement convincingly demonstrate the managerial relevance and benefits of marketing science in real-life applications. The Edelman competition is sponsored by
INFORMS and its College on the Practice of Management Science. With the award,
the organization recognizes achievement in the practice of management science in the
real world by focusing on work that has been implemented and has had significant
impact on the target organization. The first prize in 1988 was awarded for a series
of decision-calculus models developed for and implemented by Syntex Laboratories,
a pharmaceutical company best known for its birth control pills. The models helped
decide how large its sales force should be and how the members should be deployed
(Lodish, Curtiss, Ness and Simpson, 1988). In part the work is based on CALLPLAN
(Lodish, 1971), which was applied and refined over a 20-year period. The response
functions of the model were estimated from subjective judgments provided by a team
of knowledgeable managers and sales people. 20 These response functions were shown
to give significantly better sales forecasts of each Syntex product for two years in the
future than were the existing forecasts. The model and its output helped persuade
Syntex management to greatly increase the size of its sales force and to redeploy it.
The result of this change was a documented eight percent annual sales increase of
25 million dollars, which more than offset the incremental cost from increasing (and
reorganizing) the sales force. The model was judged to have an important impact
on the strategic direction of the firm by helping management to focus on marketing
products with high potential.
A second-place winner in 1989 is a hybrid conjoint analysis and choice simulation for the design of a new hotel chain for the Marriott Corporation (Wind, Green,
Shifflet, Scarbrough, 1989). The authors used a variety of other analyses, such as
multidimensional scaling and cluster analysis, as well. They employed econometric
techniques, such as dummy variable regression and ordinary and generalized least
squares regression, at various stages to estimate model parameters. The study provided specific guidelines to Marriott for selecting target market segments, positioning
services, and designing improved facilities in terms of physical layout and service
characteristics. The result was Courtyard by Marriott. This new chain became the
fastest growing moderately priced hotel chain in the United States with more than
100 hotels opened in about five years, and planned growth to 300 hotels in about ten
years. The actual market share of Courtyard by Marriott was within four percent of
the share predicted by the conjoint simulation.
The first-place winner in 1990 is a multiattribute choice model for determining product attributes desired by customers and for segmenting the market (Gensch,
Aversa, Moore, 1990). ABB Electric was formed about 20 years ago, with capital
from ASEA-AB Sweden and the RTE Corporation, to design and manufacture power
transformers for the North American market. Just when this start-up was approaching
the break-even point in its third year of existence, it was confronted with a 50 percent
drop in total industry sales of electrical equipment. Consequently ABB's survival
depended on taking customers away from established major competitors. In response,
the firm developed a marketing information system based on multiattribute choice
modeling that identified the current customers' perceptions of ABB's products versus
various competitors' products. ABB used information on what customers want most
from products to devise strategies for taking customers and market segments from
competitors. The multinomial logit model's output enabled ABB to identify an opportunity to be the low-cost producer. The information also guided ABB in selecting
new products that customers preferred and for which it had long-term cost advantages.
ABB not only survived but it has grown to become the dominant firm in the industry
with a dollar market share of 40 percent. In a statement at a 1988 board of directors
meeting, ABB Electric's president said:
"Without the insights from our marketing models, it is unlikely we would have current sales of $25 million; in fact, without the use of these models, it is unlikely we would be here at all" (Gensch, Aversa and Moore, 1990, p. 18).
These perspectives were provided at a time when actual sales were more than $100 million.
Other practical examples of excellence in marketing science are described in the
case histories of the winners of the Advertising Effectiveness Awards administered by
the Institute of Practitioners in Advertising (IPA) (Broadbent, 1981, 1983, Channon,
1985, 1987, Feldwick, 1990, 1991). The objectives of these IPA Awards are:
Award categories include established consumer goods and services, new consumer
goods and services, and small budgets. Marketing science - in particular econometrics
- plays a key role in many IPA award-winning papers. While our previous examples
relied heavily on subjective judgments or primary data collection, many of the IPA
papers and the insights they provide are based on objective historical information on
sales and advertising efforts. Examples of brands in the more recent IPA competition demonstrating advertising effectiveness by application of econometrics include
Karvo (decongestant capsules), Knorr Stock Cubes (soup stock), and PG Tips (tea) (Feldwick, 1991). Concerned that the awards may be too focused on short-run returns,
the IPA created a new "broader and longer effects" category to highlight the longer
term or indirect effects of advertising.
This book contains many other examples of models that demonstrate the managerial
relevance of marketing science.
• We discuss ASSESSOR in Section 10.2. Urban and Katz report in 1983 that
this pre-test-market evaluation model has been used to evaluate more than 200
products in more than 50 organizations.
• We discuss the SCAN*PRO model in various sections of this book. Commercial
applications of this model focus on differences in effects between regions, retail
outlets, and retail chains, and on asymmetries in own-brand effects and cross-brand effects (Blattberg, Wisniewski, 1989). Brand managers can use information about the effects of different promotional activities (such as display, feature advertisements, and temporary price cuts) to allocate trade promotion expenditures. Managers can use the estimated effects as a partial basis to reconsider direct retailer payments for display or feature activities or to modify off-invoice discounts
that may be passed through by the retailer in the form of temporary price cuts.
A conservative estimate is that SCAN*PRO has been used in 2,000 different
commercial applications. It can be applied separately, for example, by geographic
market (Nielsen distinguishes approximately 65 regions in the United States), by
type of retail outlet (food stores, drug stores, food/drug combination stores, mass
merchandise stores), and by retail chain. One extensive application involved data
on 10 products for 2,670 stores and 104 weeks, or almost 3 million data points.
SCAN*PRO applications tend to be issue driven. The client provides a description of well-defined managerial issues that can be addressed with an econometric
model estimated from available data. The service is available in North America
and in many European and Asian countries. SCAN*PRO results are reported to
have had a substantial influence on the behavior of brand managers and sales-force personnel in terms of allocation of money, time, and effort to regions, outlets
and marketing programs.
• PROMOTIONSCAN is an implemented model and automated system for measuring short-term incremental volume due to promotions by developing baselines of store-level "normal" sales using store-level scanner data (Abraham and Lodish,
1992). About 2,700 stores are in the data base. At least half of all major packaged
goods marketers in the US have used PROMOTIONSCAN information. Abraham
and Lodish discuss one case history, that of Conagra Corporation.
The SCAN*PRO and PROMOTIONSCAN models used by ACNielsen and IRI, and
variations of those models, have made it clear that, in general, the temporary price
cuts offered by retailers on selected items have very strong short-term sales effects.
Detailed analyses of those effects have also shown that most of the time, the promotions are not profitable for the manufacturer. Thus, even if the potentially negative
long-term effects (Mela, Gupta, Lehmann, 1997) are not explicitly taken into account,
and even if the negative supply-chain effects are excluded, the passthrough of trade
promotions by retailers does not generate sufficient incremental sales, on average,
to make the promotion profitable. However, despite this finding, manufacturers are
reluctant to decrease promotional spending. The primary reasons are:
1. retailers threaten to reduce shelf space, as Procter & Gamble learned when the
firm decided to introduce an EDLP (Every Day Low Price) strategy, and
2. other manufacturers either increase or maintain promotional spending, as Procter
& Gamble also learned.
On the other hand, when manufacturers found that coupons for mature brands typically are also unprofitable, they encountered a much more sympathetic distribution
channel. Retailers find the coupons to be more of a nuisance than a profitable activity,
so the desire by Procter & Gamble and other manufacturers to reduce coupon activity
was met by very sympathetic retailers. Due to insights provided by scanner-based
models, several manufacturers have reduced the distribution of coupons for mature
brands by some fifty percent.
level which would not constitute a contribution to the literature. From an industry
perspective, Broadbent (1983, p.4) argues:
"though we would welcome improved methods, in practice it is normally better to
use tried and trusted methods."
Another reason for a lack of contribution to the published literature is that market-
ing managers have little time and usually have nothing to gain from writing about
projects for external consumption. Thus, it is not surprising that most of the appli-
cations described in scientific journals illustrate methodological innovations and are
written by academic scholars. Ideally, any such application should make a substantive
contribution as well. In any event, new techniques are tested over time, and the best
become tried and trusted methods.
We conclude that:
• marketing science models are not restricted to frequently bought nondurable con-
sumer goods;
• marketing science has impact on management decisions;
• marketing science contributes to the resolution of strategic planning issues, com-
petitive strategy questions and public policy problems.
CHAPTER 21
The store-level data show similar performances and activities aggregated across the
households visiting a given store. Here the aggregation is far less harmful for model
building, because households visiting a given store are exposed to the same marketing
activities within a given week. However, the typical store-level model does not ac-
commodate heterogeneity in household preferences and in sensitivities to marketing
instruments. Some current academic research includes attempts not only to accommodate but also to recover household heterogeneity from store-level data (e.g. Bodapati and Gupta, 1999). To the extent that household data are "representative" (see, e.g., Leeflang, Olivier, 1985, Gupta, Chintagunta, Kaul, Wittink, 1996), these disaggregate
data provide the best opportunities for managers to obtain a complete understanding
of marketplace complexities in stores equipped with scanners. Of particular interest
to managers are models that explain purchase incidence, brand choice and quantity
decisions at the household level. For managers these models provide substantively
meaningful results about, for example, the proportion of the increase in purchases of
a given item due to promotions of the item that is attributable to brand switching, ac-
celeration in purchase timing and increases in purchase quantities (e.g. Gupta, 1988,
Bell, Chiang, Padmanabhan, 1997, 1999).
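The household-level decomposition mentioned above (e.g. Gupta, 1988) can be sketched as a simple additive split of a total promotional elasticity into brand-choice, purchase-incidence and purchase-quantity components. The component values below are hypothetical, and the additive form is a simplification of the full household-level model:

```python
def decompose_promotion_elasticity(choice, incidence, quantity):
    """Share of a total promotional sales elasticity attributable to
    brand switching (choice), purchase-timing acceleration (incidence)
    and purchase-quantity increases, assuming the total elasticity is
    (approximately) the sum of the three components."""
    total = choice + incidence + quantity
    components = {"choice": choice, "incidence": incidence, "quantity": quantity}
    return {name: value / total for name, value in components.items()}

# Hypothetical component elasticities for one item's price promotion:
shares = decompose_promotion_elasticity(choice=-4.2, incidence=-0.7, quantity=-0.1)
print(shares)  # choice dominates: most of the bump is brand switching
```

With these illustrative inputs, brand switching accounts for the bulk of the promotional bump, which is the kind of substantively meaningful result the text refers to.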
We believe that both household- and store-level data can provide meaningful
insights about marketing phenomena. An important advantage associated with house-
hold data is that household heterogeneity can be fully exploited. On the other hand,
for relatively infrequently purchased goods, household data are often insufficient due
to sparseness. In addition, while the representativeness of household data appears to
be acceptable for selected small cities in which a cooperating household uses the same
plastic card in all supermarkets (Gupta, Chintagunta, Kaul, Wittink, 1996), there is
considerable doubt about the representativeness in metropolitan areas. In those areas
a system of plastic cards cannot cover all the outlets the households frequent, while a
system of personal wands may or may not be appealing to households. 1
These conflicting considerations suggest that managers will benefit most from models that combine household- and store-level data. Promising examples of the joint usage of multiple data sources include Russell and Kamakura (1994). We discussed their model briefly in Section 14.1.2.
During the 1980's and 1990's, largely due to the adoption of scanner equipment by
supermarkets, much of the academic research was focused on the effects of promo-
tional activities. The paper by Guadagni and Little (1983) on the analysis of household purchase data is a classic. Little also advanced the use of models for the deter-
mination of incremental purchases and profitability resulting from (manufacturer)
coupons, while Lodish has examined the profitability of trade promotions. The evi-
dence appears to indicate quite strongly that most promotions for mature products
are unprofitable. Mela and his associates have provided the initial results with re-
gard to the long-term impact of promotions.2 Their results suggest that households' price sensitivities increase and brand equities erode with increases in promotional expenditures.
These empirical findings should influence managers to rethink the allocation of
marketing expenditures to various instruments. Indeed, we predict that the percentage allocated to promotions will be reduced substantially, perhaps to the levels prevailing
in the 1970's. Of course, there are other developments that also influence the budget
allocation, as we discuss in Section 21.3.
The availability of scanner data has a tremendous effect on the opportunities for
model specification. In the first three eras of model building in marketing (Section 1.1) the emphasis was on models for a single brand specified at the brand level. In the fourth and fifth eras we see models specified at the SKU level, covering multiple own- and other-brand items, where competition is defined at the product category level and sometimes covers multiple product categories. In the eighties we saw an important shift in attention from the demand function alone to the demand function plus competitive reactions (Chapter 11). We now see increasing attention to empirical game-theoretic models with an emphasis on horizontal competition (Section 11.4). There are also studies on cooperation and vertical competition in the
channel.3 Examples of the latter are studies on the competition between marketing and retailing (Balasubramanian, 1998), guaranteed profit power (Krishnan, Soni, 1997), manufacturers' returns policies (Padmanabhan, Png, 1997), and manufacturer allowances and retailer pass-through rates (Kim, Staelin, 1999). Many of these studies do not (yet) include empirical validations. Other studies consider satisfaction
in marketing channel relationships (Geyskens, Steenkamp, Kumar, 1999) and in-
terdependencies/relationships between channel partners (Kumar, Scheer, Steenkamp,
1995a, 1995b).
The phenomenon of shifts in power from manufacturers to retailers in many
channels is the focus of much discussion and debate amongst marketing academicians
and practitioners. This shift in power is also reflected in the construction of models
specified at the retail-chain and individual-store level, where we also see models for
"micro marketing" (Montgomery, 1997). We return to these developments in Section
21.3. We also expect other shifts in attention such as:
• from models for products toward the development of models for services;
• from models covering "metropolitan areas", regions and countries to models for
international marketing activities;
• from tactical decisions to strategic decisions.
Models of scanner data have provided useful insights about, say, the average ef-
fectiveness of various marketing programs. Increasingly sophisticated models and
estimation methods also allow managers to adjust the details of individual activities.
For example, researchers have used nonparametric estimation methods to allow the functional form of the relation between variables to be completely determined by the data. The curves so obtained often show dramatically different nonlinear effects from those implied by parametric estimation of models with transformed variables.
Van Heerde, Leeflang and Wittink (1999a) find that the deal effect curve tends to be
S-shaped. For many of the items analyzed, they find that temporary price discounts
of less than 10 percent have little impact on sales (threshold effects). Above, say,
10 percent the impact increases rapidly and it levels off after perhaps 25 percent
(saturation effects). While the data show that the percent discounts occur over this
entire range, these results suggest that discounts of less than 10 percent and more
than 25 percent are often not efficient.
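The kind of nonparametric fit described above can be illustrated with a simple Nadaraya-Watson kernel smoother. This is a minimal sketch on simulated data, in which the S-shaped "true" deal-effect curve (threshold near 10 percent, saturation near 25 percent) is assumed for illustration; it is not the actual estimator used by Van Heerde, Leeflang and Wittink (1999a):

```python
import numpy as np

def kernel_smooth(x_grid, x, y, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel: a locally
    weighted average of observed sales, so the shape of the deal effect
    curve is determined by the data, not by an assumed functional form."""
    w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
discount = rng.uniform(0.0, 0.40, 500)  # temporary price cuts of 0-40%
# Assumed S-shaped "true" sales multiplier with threshold and saturation:
true_mult = 1 + 2.5 / (1 + np.exp(-(discount - 0.175) / 0.03))
sales = true_mult * np.exp(rng.normal(0, 0.1, discount.size))  # noisy data

# Evaluate the smoothed curve below the threshold, mid-range, and
# beyond the saturation point:
grid = np.array([0.05, 0.175, 0.35])
fit = kernel_smooth(grid, discount, sales, bandwidth=0.02)
print(fit)
```

The recovered curve is nearly flat below the threshold and above the saturation point, which is how such estimates support the conclusion that very small and very large discounts are often inefficient.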
Household data are uniquely suitable for the identification of distinctions between
households in terms of, for example, brand preferences, loyalty and switching be-
havior, incidence and brand choice sensitivities to marketing activities, etc. Indeed,
it is clear that the 21st century is the century in which, largely through technolog-
ical means, households will be increasingly exposed to individualized marketing
programs. We discuss the role of models in this new environment in Section 21.3.
We discuss in Sections 15.2 and 16.9 the unique advantages of human judgment versus the advantages of models in (management) decisions. If all managers were fam-
iliar with these arguments and accepted the implications, we would see a much greater
demand for model-building services. Managers are skeptical of models because:
1. they (may) believe that the use of models imposes constraints on their decision-
making authority;
2. they believe that marketing is an art and that models provide inappropriately
precise results; and/or
3. they have used a model once which promised far more than it could deliver.
Other arguments are discussed in Chapter 19.
MODELS FOR THE FUTURE 569
Model acceptance will be facilitated if model builders can present convincing ar-
guments why certain marketing decisions will improve with a model. In addition,
having access to applications that demonstrate how other managers obtained substan-
tial benefits will be critical to potential users. Even if the model builder is successful
on these two aspects, it is important to show that the essence of certain, repetitive
decisions can be captured effectively. Examples of marketing decisions appropriate
for automation are assortment decisions, customized product offerings, coupon tar-
geting, budget allocation, inventory decisions, etc.4 If the manager understands that
statistical analyses of historical data can provide more valid insights than is possible
from inspection of the data, then an additional argument is that the manager will have
more time for creative and other tasks for which models are not suitable.
Resistance to model acceptance is likely to occur if the manager does not have
at least a strong conceptual understanding of the model characteristics. Thus, the
manager should know what a model does and does not cover with respect to the
complex relations in the real world. Much of this understanding can be obtained
from "what if" questions. That is, the manager should have the opportunity to use
a computer, submit marketing program details, and find out what the model predicts
will be the market behavior. The manager should find that all conditional predictions
are reasonable. The model builder should be able to explain the logic of all results.
Once the model is completed, resistance to acceptance can be overcome if the
manager is willing to play against the model. Each time a decision is made for which
the outcome can be predicted by the model, the manager can record his/her forecast,
and subsequently the accuracy of the manager's prediction can be compared against
the model's. Obviously if this process is repeated, reliable comparisons can be made.
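The "playing against the model" procedure above amounts to a running accuracy comparison over repeated decisions; a minimal sketch, in which all recorded forecasts and outcomes are hypothetical:

```python
import numpy as np

def compare_forecasters(manager, model, actual):
    """Mean absolute error of the manager's and the model's predictions
    over the same set of decisions. Repeating the comparison over many
    decisions is what makes it reliable."""
    manager, model, actual = map(np.asarray, (manager, model, actual))
    return {
        "manager_mae": float(np.abs(manager - actual).mean()),
        "model_mae": float(np.abs(model - actual).mean()),
    }

# Hypothetical records: each entry is one decision (e.g. weekly unit
# sales under a planned promotion) forecast by both manager and model.
scores = compare_forecasters(
    manager=[120, 95, 140, 80, 110],
    model=[112, 99, 131, 86, 104],
    actual=[108, 101, 128, 90, 100],
)
print(scores)
```

Tracked over enough decisions, such a table gives the manager direct evidence of where the model does or does not add accuracy.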
Once a model is in use, it is important to continue to check the accuracy of con-
ditional predictions. These accuracies can be compared with what would be expected
based on (initial) model estimation and testing. In addition, the forecast accuracies
can be tracked over time against various conditions, etc. This tracking provides the
model builder with an opportunity to identify the weakest aspects in the model.
In this sense, continuous tracking provides the model builder with a basis for
deciding whether, for example, the model misses important components and needs to
be respecified or whether, say, the parameters need to be updated periodically. In the former case the magnitudes of the prediction errors may vary widely, while in the latter case the magnitudes are likely to increase over time.
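One simple way to implement such tracking is to regress the absolute prediction errors on time: a persistent upward trend is consistent with parameters that need updating, while a flat but noisy pattern points elsewhere. A sketch with hypothetical error series:

```python
import numpy as np

def error_trend(abs_errors):
    """Slope of an OLS fit of |forecast error| on time (e.g. weeks).
    A persistently positive slope suggests the parameter estimates
    have become stale and need to be updated."""
    t = np.arange(len(abs_errors), dtype=float)
    slope, _intercept = np.polyfit(t, np.asarray(abs_errors, dtype=float), 1)
    return slope

# Hypothetical tracked errors: drifting parameters make errors grow,
# while a stable model shows no systematic trend.
drifting = [2.1, 2.4, 3.0, 3.3, 3.9, 4.5, 5.0, 5.6]
stable = [2.5, 1.9, 2.8, 2.2, 2.6, 2.1, 2.7, 2.3]

print(error_trend(drifting))  # clearly positive
print(error_trend(stable))    # near zero
```

In practice the model builder would also inspect the errors by condition (e.g. promotion weeks versus regular weeks) to locate the weakest model components.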
Assuming that confidentiality issues are manageable, we believe it is useful for some
model details to be communicated throughout the organization. Doing this will en-
courage others to consider opportunities for model building. In addition, it should
provoke a broader discussion about the relevance of model components, about the
nature of relations, and about the benefits to be obtained from marketing decisions.
Importantly, when actual results turn out to be very different from the predictions,
there will be a broader group of individuals who can provide "reasons why". And of
course if the marketing decisions are based on models that ignore how other parts of
the organization are affected (such as the negative impact of promotional programs on
production and logistics), the managers of those parts will be alerted. They can then
argue for the inclusion of additional elements in a more inclusive model. We discuss
the tendency to develop more "integrated", more "global", models in Chapter 19. 5
There are major developments taking place in business as the end of the millen-
nium approaches. For marketing the most critical one is the focus on (individual)
customers. Computer hardware and software make it feasible for firms to examine
the purchase behavior of individual households. Strategic consulting companies have
reclaimed the 80/20 rule. For example, for many firms it appears that 20 percent of
the customers account for 80 percent of the profits. The new element in the strategic
frameworks used by consultants and managers is the customer (consumer).
Among recent empirical findings are that repeat purchase behavior is nonlinearly
related to satisfaction. For example, when Xerox used a 5-point scale to measure
customer satisfaction, respondents who gave Xerox a 5 (very satisfied) were six times
more likely to repurchase than respondents who scored a 4 (satisfied). Separately,
research results across various firms indicate that it is a lot cheaper to serve existing
customers than it is to attract and serve new customers. In addition, loyal (exist-
ing) customers typically are less price sensitive than non-loyal customers. Although
these results are not sufficient for management to decide how to distinguish between,
say, loyals and switchers, this question is of great interest to both economists and
marketing researchers.
Traditionally marketing has been about offering products and services commu-
nicated, distributed and priced in a manner that fits a target market's preferences
and that allows a firm to maximize profits. However, the shift in focus from products
(and brand managers) to customers (and customer managers) implies that firms will
now focus explicitly on the profitability of individual customers. Thus, a firm will
select those customers for whom it can offer products and services better than other
firms can, with whom it can develop long-term relations in such a way that each
customer is expected to contribute to the firm's profits. These ideas are part of "new
marketing frameworks". For example, Hoekstra, Leeflang and Wittink (1999) argue
that (marketing) strategy should pursue the realization of superior customer values,
and that business objectives should be stated in terms of the customer (e.g. customer
satisfaction, customer retention). They call this "the customer concept".
We show in Figure 21.1 a simple framework that reflects the critical role the customer plays. In this framework we assume the customer is a household, but it is straightforward to substitute any other consumption unit. We propose that a firm
5. See for an example, Hausman and Montgomery (1997). Their study refers to issues in "market-driven
manufacturing".
Figure 21.1 (the critical role of the customer):

Household preferences for values  <- based on convenience, reliability, low cost, high quality, etc.
        |
Household purchases  <- as a function of benefits offered, marketing mix programs, etc. for a brand
        |
Customer satisfaction  <- with respect to the purchase experience and the consumption: results in brand equity
        |
Customer loyalty  <- repeat purchase and word of mouth
        |
Market capitalization  <- the ultimate criterion is the valuation of the firm by investors
identifies meaningful customer values that it can satisfy better than other firms can
(re: core competencies). The closer it can come to offering exactly what each indi-
vidual household prefers (including where it can be obtained, how the benefits are
perceived and what the price is), the more likely the household will be satisfied with
the purchase and the consumption or use. High degrees of satisfaction will lead to
customer loyalty, and as long as the financial considerations are appropriately consid-
ered, each customer is profitable (in an expected value sense). Of course, under certain
conditions, some customers may contribute positively to profits only in the influence
they have on other customers. Thus, in the same way that managers now focus on
the profitability of goals at the category- instead of the brand- level, in the future
the focus may shift from treating each customer independently to explicitly taking
dependencies between customers into account. This requires a continuous, customer-
oriented feedback system. 6 The monitoring system will concentrate on marketplace
purchase behavior integrated with survey data. Customer managers should know not
only what customers are purchasing (and why), but also how satisfied they are with
the purchases made, with the interactions that occurred, and what unmet needs re-
main. In addition, customers should be categorized according to expected lifetime
value. Ideally, the contact between the firm and the customer occurs continuously, in
both directions, but in a manner that suits both parties.
By integrating these different data sources,7 managers will be in a much better position
to consider new-product opportunities. Importantly, with the appropriate sets of mod-
els, managers can contemplate changes in the marketing mix at the individual house-
hold level that can be expected to enhance profitability and market capitalization.
Thus, the challenge for model builders is to develop ways in which the appropriate
relations can be developed at the individual level including profit implications.
Assigning a pivotal role to the customer also has consequences for the composition of the future marketing mix. Marketing decisions will have to reflect, more than is currently the case, the interests of customers.8 Consider how the joint efforts of manu-
facturers and grocery retailers have influenced how consumers are guided through the
stores. Many aspects are arranged to entice the consumer to expend sums of money to
the retailers' advantage. While it is possible for consumers to develop and maintain
idiosyncratic choice rules, the manufacturer creates products, packages, package sizes
and variations that can overwhelm the consumers' decision-making process. The
retailer compounds the consumer's problem by, for example, giving preference in
shelf-space management to items with favorable retailer margins. And both manufac-
turer and retailer offer a range of different kinds of promotions. For most consumers,
the result is a myriad of aspects that together make it very difficult for each consumer
to make optimal purchases, given consumer heterogeneity in utility functions.
Electronic home shopping should allow consumers to specify criteria (e.g. lowest
unit price, selected ingredients) on which each consumer wants the selections to be
made. Such an innovation will allow consumers to maximize their utility functions
much more closely than ever before. Customized marketing programs can be created
to help customers achieve utility maximization. For both manufacturers and retailers
it will become critical to incorporate this new reality into future marketing decisions.
We note that new communication media (e.g. the Internet) accelerate the diffusion
of insights gained by individual consumers about the potential to realize gains in
utility. Alternatively, new services will be created to help consumers maximize their
objective functions. We now briefly discuss the consequences of adopting a more
central role of the customer for the set of marketing instruments.
We imagine that customer-research-based new product or service development
will be facilitated by the proposed information system. Management will have si-
multaneous access to consumer marketplace choice data and survey data. Diagnostic
information will be available for "what if" questions, and the customer monitoring
system includes actions by new competitors reflected in customer purchases from the
very beginning. To accomplish this, supplemental information, for example through
distribution outlets not commonly included in data collection, will have to be gath-
ered.
New-product or service development will be a continuous process. Customers
either volunteer new product ideas or respond to requests for information based on
changes in customers' marketplace choices. Continuous customer-based new-product
development is a natural consequence. A high-variety product line enables the cus-
tomer to select the customized option he or she desires (Kahn, 1998). Under the
customer concept decision making is pro-active not reactive.
Production may be on a just-in-time basis, and may not take place without an
order. The customer defines quality. Customers buy benefits, not products. Therefore,
services are added to the core product. Webster (1994) states that customer expectations revolve around the service aspects of the product offering, and that information can turn any product into a service and, thereby, build a customer relationship.
Full exploitation of consumer heterogeneity also allows for further distinction in
prices, based on willingness to pay and based on differences in the service-enhanced
product.
Just-in-time advertising and promotion define the promotion mix under the new
marketing paradigm. Promotion and commercial messages will be tailored to specific
needs. Messages will be "narrow-casted", and accessed by customers at the time of
need.
Classical media may be used to invite interaction, but the focus will be on cus-
tomer-initiated selection of information. There are always possibilities for the cus-
tomer to contact the organization. Every message contains a telephone number, e-mail
address, etc. to facilitate and stimulate two-way communication. Different messages
and different media will be used for different customers and prospects. Communi-
cation emphasizes the maintenance of relations, while promotions are used to attract
new customers.
The customer-based information system will also allow management to identify
new distribution opportunities more easily. For example, under some conditions it
may be desirable for consumer-goods manufacturers of frequently purchased branded
items to engage in direct selling to households. Other consequences of the adoption
of the customer concept are the use of direct and indirect distribution channels. Two-
way communication and distribution is possible through media such as fax and the
Internet. This offers opportunities for on-line product selection, virtual shopping and
personal delivery. In the near future the simultaneous consideration of manufacturer
and retailer perspectives will get more attention. Due to the increase in retailer power,
it will be critical for manufacturers to offer marketing activities that have demon-
strable benefits for cooperating retailers. Although the current projections for the use
of the Internet for grocery purchases are quite modest, it is conceivable that a new
venture such as Webvan will quickly achieve a level of power similar to that currently attributed only to Walmart, and will dramatically increase the use of the Internet for consumer goods.
Webvan will offer households the opportunity to do grocery shopping on-line.
However, unlike Peapod which obtains goods from existing supermarkets, Webvan
builds its own warehouse with more than 4 miles of conveyer belts. In this manner
it avoids many of the traditional supermarket costs. If households place their orders
a few days in advance (with appropriate incentives, this should be straightforward),
Webvan could aggregate orders across households for each day, and have all suppliers
provide the required amounts on an as-needed basis. Suppliers can provide goods
either directly or via intermediaries if the daily supply of a given supplier is modest.
Note that each warehouse is projected to cover the trading area of 20 supermarkets.
Similar to the hub-and-spoke system used by Federal Express for express packages,
Webvan can sort the goods received by household (as Federal Express sorts by destination), and prepare packages for home delivery by vans. In this manner, Webvan can eliminate virtually all spoilage of perishables and all inventories (projected to be about 3 percent of total revenues), eliminate many chores in the traditional retail
outlets including (re)supplying store space and checkout, and deliver customized bas-
kets of goods to subscribing households at no extra charge (for purchases above, say,
$50) and at lower prices than any traditional supermarket can offer. The modeling of
marketing-related phenomena can be done at the individual household level. Sugges-
tions about potentially relevant items, based on the principles used by Amazon for
suggestions to its customers about books and other items, can be based on a combi-
nation of purchase and survey data. Importantly, loyal customers can be rewarded in
a new manner. Again, based on the purchase records and survey responses, Webvan can add customized surprises to the customized basket of goods. The frequency and nature of these surprises would depend on the profitability of each household, and
perhaps on the availability of new products, and the cooperation of manufacturers
who want to offer trial amounts of new products through the retailer.
Such new ventures that come about through the Internet require that existing
retailers as well as manufacturers adapt. To understand the impact of these new
developments, existing firms have to understand how households will approach the
new services. For example, existing retailers will have to interview their customers to
understand how they should modify their operations to compete with the new opera-
tors. Preference-based models will allow the firms to determine the attractiveness of
all strategic options available to them. However, the Internet-based operators have the advantage of continuous interaction with their customers, which facilitates the collection and analysis of preferences for things not yet offered, of satisfaction with the purchase and consumption of goods, and of the effects of marketing activities, including new products, on the loyalty, retention and profitability of individual customers. It appears, therefore, that the Internet not only allows new firms to operate based on a much more complete understanding of individual customers, but that their success will force existing firms that have not yet adopted such an approach to follow their lead if they want to survive.
The modeling of marketing effects, such as promotions, price, advertising and distri-
bution, has had strong impact on managerial decisions at the major consumer-goods
manufacturers. Both ACNielsen and IRI report successful implementation of model
results at firms in North America, Europe and Asia. The models appear to provide
good insights into the short-term effects at the item or brand level. However, much
more is needed to show long-term effects such as the possible reduction in brand
equity due to promotional activities. On the other hand, managers appear to be quite
confident that they have a good understanding of the long-term advertising effects
based on the rule that the total effect is roughly twice the short-term effect.
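For a geometrically decaying (Koyck-type) advertising effect, as in the dynamic models of Chapter 6, this rule of thumb corresponds to a long-term multiplier of 1/(1 - λ) with retention rate λ = 0.5. A small sketch; the λ values are purely illustrative:

```python
def long_term_multiplier(retention):
    """Total (long-run) effect per unit of short-term effect when the
    advertising effect decays geometrically with retention rate r:
    1 + r + r^2 + ... = 1 / (1 - r)."""
    if not 0 <= retention < 1:
        raise ValueError("retention rate must be in [0, 1)")
    return 1.0 / (1.0 - retention)

# The "total effect is roughly twice the short-term effect" rule is
# consistent with a retention rate of 0.5:
print(long_term_multiplier(0.5))  # 2.0
```

The sketch also makes clear how sensitive the rule is: a retention rate of 0.9 would imply a long-term effect ten times the short-term effect, which is one reason managers' confidence in the rule deserves scrutiny.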
Another aspect that requires more attention is the effect of marketing activities on
profits at the level of the firm. It is now common for marketing managers to be asked
for the profit implications of investments in marketing. To do this meaningfully, both
short- and long-term profit implications should be considered. In addition, the use
of, for example, promotions, has to be assessed not just in terms of the effects on
sales and the attribution to different sources of sales gains, but also by considering
the negation of benefits due to competitive reactions. Also, the effects on production,
distribution and inventory management (supply chain) need to be included. Future
model-building efforts will include a much more comprehensive examination of all
these components.
We propose that future models provide managers with "what if" simulation capabilities so that both short- and long-run effects can be documented, likely competitive
reactions can be taken into account, and profit implications for alternative marketing
actions can be considered.
The developments we discussed in this chapter determine the demand for "new"
models which can be used as an aid for decision-making in marketing in the future.
Bibliography
Aaker, D.A. (1995), Strategic Market Management, 4th ed., John Wiley & Sons, New York.
Abad, P.A. ( 1987), 'A Hierarchical Optimal Control Model for Co-ordination of Functional Decisions in a
Firm', European Journal of Operations Research, vol. 32, pp. 62-75.
Abe, M. (1995), 'A Nonparametric Density Estimation Method for Brand Choice Using Scanner Data',
Marketing Science, vol. 14, pp. 300--325.
Abelson, R.P. (1976), 'Script Processing in Attitude Formation and Decision Making' in Carra!, J.D. and J.
Payne (eds.), Cognition and Social Behavior, Lawrence Erlbaum, Hillsdale, NJ., pp. 33-45.
Abmham, M.M. and L.M. Lodish (1989), 'Fact-Based Stmtegies for Managing Advertising and Promotion
Dollars: Lessons from Single Source Data', Working Paper# 89-{)06, Marketing Department, The
Wharton School of the University of Pennsylvania.
Abmham, M.M. and L.M. Lodish (1992); 'An Implemented System for Improving Promotion Productivity
Using Store Scanner Data', Marketing Science, vol. 12, pp. 248-269.
Agarwal, M.K. and V.R. Rao (1996), 'An Empirical Comparison of Consumer-Based Measures of Brand
Equity', Marketing Letters, vol. 7, pp. 237-247.
Aigoer, D.J. and S.M. Goldfeld (1973), 'Simulation and Aggregation: A Reconsidemtion', The Review of
Economics and Statistics, vol. 44, pp. 114-118.
Aigoer, D.J. and S.M. Goldfeld (1974), 'Estimation and Prediction from Aggregate Data when Aggregates are
Measured more Accumtely than their Components', Econometrica, vol. 42, pp. 113-134.
Ailawadi, K.L., P.W. Farris and M.E. Parry {1999), 'Market Share and ROI: Observing the Effect of
Unobserved Variables', International Journal ofResearch in Marketing, vol. 16, pp. 17-33.
Ailawadi, K.L. and S.A. Neslin ( 1998), 'The Effect of Promotion on Consumption: Buying More and
Consuming It Faster', Journal qf Marketing Research, vol. 35, pp. 390-398.
Ainslie, A. and P.E. Rossi {1998), 'Similarities in Choice Behavior Across Product Categories', Marketing
Science, vol. 17, pp. 91-106.
Aitken, A.C. (1935), 'On Least Squares and Linear Combination of Observations', Proceedings qfthe Royal
Society qfEdinburgh, vol. 55, pp. 42-48.
Ajzen, I. and M. Fishbein (1980), Understanding Attitudes and Predicting Social Behavior, Prentice-Hall,
Englewood Cliffs, NJ.
Akaike, H. (1974), 'A New Look at Statistical Model Identification', IEEE Transactions on Automatic Control,
vol. 19, pp. 716-723.
Akaike, H. (1981), 'Likelihood of a Model and Information Criteria', Journal of Econometrics, vol. 16, pp.
3-14.
Albach, H. (1979), 'Market Organization and Pricing Behavior of Oligopolistic Firms in the Ethical Drugs
Industry', Kyklos, vol. 32, pp. 523-540.
Albers, S. (1998), 'A Framework for Analysis of Sources of Profit Contribution Variance between Actual and
Plan', International Journal of Research in Marketing, vol. 15, pp. 109-122.
Allenby, G.M. (1990), 'Cross-Validation, The Bayes Theorem, and Small Sample Bias', Journal of Business
and Economic Statistics, vol. 8, pp. 171-178.
Allenby, G.M., N. Arora and J.L. Ginter (1998), 'On the Heterogeneity of Demand', Journal of Marketing
Research, vol. 35, pp. 384-389.
Allenby, G.M. and J.L. Ginter (1995), 'Using Extremes to Design Products and Segment Markets', Journal of
Marketing Research, vol. 32, pp. 392-403.
Allenby, G.M. and P.E. Rossi (1991a), 'There is No Aggregation Bias: Why Macro Logit Models Work',
Journal of Business and Economic Statistics, vol. 9, pp. 1-14.
Allenby, G.M. and P.E. Rossi (1991b), 'Quality Perceptions and Asymmetric Switching Between Brands',
Marketing Science, vol. 10, pp. 185-204.
Allenby, G.M. and P.E. Rossi (1999), 'Marketing Models of Consumer Heterogeneity', Journal of
Econometrics, vol. 89, pp. 57-78.
Almon, S. (1965), 'The Distributed Lag Between Capital Appropriations and Expenditures', Econometrica,
vol. 33, pp. 178-196.
Alsem, K.J. and P.S.H. Leeflang (1994), 'Predicting Advertising Expenditures Using Intention Surveys',
International Journal of Forecasting, vol. 10, pp. 327-337.
Alsem, K.J., P.S.H. Leeflang and J.C. Reuyl (1989), 'The Forecasting Accuracy of Market Share Models Using
Predicted Values of Competitive Marketing Behaviour', International Journal of Research in Marketing,
vol. 6, pp. 183-198.
Alsem, K.J., P.S.H. Leeflang and J.C. Reuyl (1990), 'Diagnosing Competition in an Industrial Market',
EMAC/ESOMAR Symposium on 'New Ways in Marketing and Marketing Research', pp. 161-178.
Alsem, K.J., P.S.H. Leeflang and J.C. Reuyl (1991), 'The Expansion of Broadcast Media in the Netherlands:
Effects on the Advertising Expenditures', The Expansion of Broadcast Media: Does Research Meet the
Challenges?, ESOMAR, Madrid, pp. 65-79.
Amemiya, T. (1983), 'Nonlinear Regression Models' in Griliches, Z. and M.D. Intriligator (eds.), Handbook of
Econometrics, North-Holland, Amsterdam, pp. 333-390.
Amemiya, T. (1985), Advanced Econometrics, Harvard University Press, Cambridge, Mass.
Amemiya, T. (1994), Advanced Econometrics, 6th pr., Harvard University Press, Cambridge, Mass.
Amstutz, A.E. (1967), Computer Simulation of Competitive Market Response, M.I.T. Press, Cambridge, Mass.
Amstutz, A.E. (1969), 'Development, Validation and Implementation of Computerized Micro Analytic
Simulations of Market Behavior' in Aronofski, J.S. (ed.), Progress in Operations Research, John Wiley &
Sons, New York, vol. 3, pp. 241-262.
Amstutz, A.E. (1970), 'Management, Computers and Market Simulation' in Montgomery, D.B. and G.L. Urban
(eds.), Applications of Management Science in Marketing, Prentice-Hall, Englewood Cliffs, NJ.
Andre, L. (1971), 'Short-Term Optimization of the Sales Force: The Case of an Industrial Product', CESAM
Working Paper No. 16-0771, University of Louvain.
Armstrong, J.S. (1985), Long-Range Forecasting: From Crystal Ball to Computer, 2nd ed., John Wiley & Sons,
New York.
Armstrong, J.S., R.J. Brodie and S.H. McIntyre (1987), 'Forecasting Methods for Marketing: Review of
Empirical Research', International Journal of Forecasting, vol. 3, pp. 355-376.
Armstrong, J.S. and F. Collopy (1996), 'Competitor Orientation: Effects of Objectives and Information on
Managerial Decisions and Profitability', Journal of Marketing Research, vol. 33, pp. 188-199.
Armstrong, J.S. and A.C. Shapiro (1974), 'Analyzing Quantitative Models', Journal of Marketing, vol. 30,
April, pp. 61-65.
Ashton, A.H. and R.H. Ashton (1985), 'Aggregating Subjective Forecasts: Some Empirical Results',
Management Science, vol. 31, pp. 1499-1509.
Assael, H. (1967), 'Comparison of Brand Share Data by Three Reporting Systems', Journal of Marketing
Research, vol. 4, pp. 400-401.
Assmus, G., J.U. Farley and D.R. Lehmann (1984), 'How Advertising Affects Sales: Meta Analysis of
Econometric Results', Journal of Marketing Research, vol. 21, pp. 65-74.
Assunção, J. and R.J. Meyer (1993), 'The Rational Effect of Price Uncertainty on Sales-Price Relationships',
Management Science, vol. 39, pp. 517-535.
Bagozzi, R.P. (1977), 'Structural Equation Models in Experimental Research', Journal of Marketing Research,
vol. 14, pp. 209-226.
Bagozzi, R.P. (1980), Causal Models in Marketing, John Wiley & Sons, New York.
Bagozzi, R.P. (ed.) (1994a), Advanced Methods of Marketing Research, Blackwell Publishers, Cambridge,
Mass.
Bagozzi, R.P. (1994b), 'Structural Equation Models in Marketing Research: Basic Principles' in Bagozzi, R.P.
(ed.), Principles of Marketing Research, Blackwell Publishers, Cambridge, Mass.
Bagozzi, R.P., H. Baumgartner and Y. Yi (1992), 'State versus Action Orientation and the Theory of Reasoned
Action: An Application to Coupon Usage', Journal of Consumer Research, vol. 18, pp. 505-518.
Bagozzi, R.P. and A.J. Silk (1983), 'Recall, Recognition, and the Measurement of Memory for Print
Advertisements', Marketing Science, vol. 2, pp. 95-134.
Bai, J. and P. Perron (1998), 'Estimating and Testing Linear Models with Multiple Structural Changes',
Econometrica, vol. 66, pp. 47-78.
Balasubramanian, S. (1998), 'Mail versus Mall: A Strategic Analysis of Competition between Direct Marketers
and Conventional Retailers', Marketing Science, vol. 17, pp. 181-195.
Balasubramanian, S.K. and A.K. Ghosh (1992), 'Classifying Early Product Life Cycle Forms via a Diffusion
Model: Problems and Prospects', International Journal of Research in Marketing, vol. 9, pp. 345-352.
Balasubramanian, S.K. and D.C. Jain (1994), 'Simple Approaches to Evaluate Competing Non-Nested Models
in Marketing', International Journal of Research in Marketing, vol. 11, pp. 53-72.
Balderston, F.E. and A.C. Hoggatt (1962), Simulation of Market Processes, Institute of Business and Economic
Research, University of California, Berkeley, California.
Baligh, H.H. and L.E. Richartz (1967), 'Variable-Sum Game Models of Marketing Problems', Journal of
Marketing Research, vol. 4, pp. 173-183.
Baltagi, B.H. (1998), Econometrics, Springer, Berlin.
Balzer, W.K., L.M. Sulsky, L.B. Hammer and K.E. Sumner (1992), 'Task Information, Cognitive Information
or Functional Validity Information: Which Components of Cognitive Feedback Affect Performance?',
Organizational Behavior and Human Decision Processes, vol. 53, pp. 35-54.
Banslaben, J. (1992), 'Predictive Modeling' in Nash, E.L. (ed.), The Direct Marketing Handbook,
McGraw-Hill, New York.
Barnett, A.I. (1976), 'More on a Market Share Theorem', Journal of Marketing Research, vol. 13, pp. 104-109.
Barr, S.H. and R. Sharda (1997), 'Effectiveness of Decision Support Systems: Development or Reliance
Effect?', Decision Support Systems, vol. 21, pp. 133-146.
Barten, A.P. (1977), 'The Systems of Consumer Demand Functions Approach: A Review', Econometrica, vol.
45, pp. 23-51.
Bass, F.M. (1969a), 'A Simultaneous Equation Regression Study of Advertising and Sales of Cigarettes',
Journal of Marketing Research, vol. 6, pp. 291-300.
Bass, F.M. (1969b), 'A New Product Growth Model for Consumer Durables', Management Science, vol. 15,
pp. 215-227.
Bass, F.M. (1971), 'Testing vs. Estimation in Simultaneous Equation Regression Models', Journal of
Marketing Research, vol. 8, pp. 388-389.
Bass, F.M. (1986), 'The Adoption of a Marketing Model: Comments and Observations' in Mahajan, V. and Y.
Wind (eds.), Innovation Diffusion of New Product Acceptance, Ballinger, Cambridge, MA, pp. 27-33.
Bass, F.M. (1995), 'Empirical Generalizations and Marketing Science: A Personal View', Marketing Science,
vol. 14, pp. G6-G19.
Bass, F.M. and D.G. Clarke (1972), 'Testing Distributed Lag Models of Advertising Effect', Journal of
Marketing Research, vol. 9, pp. 298-308.
Bass, F.M., T.V. Krishnan and D.C. Jain (1994), 'Why the Bass Model Fits without Decision Variables',
Marketing Science, vol. 13, pp. 203-223.
Bass, F.M. and R.P. Leone (1986), 'Estimating Micro Relationships from Macro Data: A Comparative Study of
Two Approximations of the Brand Loyal Model Under Temporal Aggregation', Journal of Marketing
Research, vol. 23, pp. 291-297.
Bass, F.M. and L.J. Parsons (1969), 'A Simultaneous Equation Regression Analysis of Sales and Advertising',
Applied Economics, vol. 1, pp. 103-124.
Bass, F.M. and T.L. Pilon (1980), 'A Stochastic Brand Choice Framework for Econometric Modeling of Time
Series Market Share Behavior', Journal ofMarketing Research, vol. 17, pp. 486-497.
Bass, F.M. and Y. Wind (1995), 'Introduction to the Special Issue: Empirical Generalizations in Marketing',
Marketing Science, vol. 14, pp. G1-G5.
Bass, F.M. and D.R. Wittink (1975), 'Pooling Issues and Methods in Regression Analysis with Examples in
Marketing Research', Journal of Marketing Research, vol. 12, pp. 51-58.
Basu, A.K., R. Lal, V. Srinivasan and R. Staelin (1985), 'Salesforce Compensation Plans: An Agency
Theoretic Perspective', Marketing Science, vol. 4, pp. 267-291.
Basuroy, S. and D. Nguyen (1998), 'Multinomial Logit Market Share Models: Equilibrium Characteristics and
Strategic Implications', Management Science, vol. 44, pp. 1396-1408.
Batra, R., J.G. Myers and D.A. Aaker (1996), Advertising Management, 5th ed., Prentice-Hall, Upper Saddle
River, NJ.
Batsell, R.R. and J.C. Polking (1985), 'A New Class of Market Share Models', Marketing Science, vol. 4, pp.
177-198.
Baumol, W.J. and T. Fabian (1964), 'Decomposition, Pricing for Decentralization and External Economics',
Management Science, vol. 11, pp. 1-32.
Bayer, J., S. Lawrence and J.W. Keon (1988), 'PEP: An Expert System for Promotion Marketing' in Turban, E.
and P.R. Watkins (eds.), Applied Expert Systems, Elsevier Science Publishers, North-Holland,
Amsterdam, pp. 121-141.
Bayus, B.L., S. Hong and R.P. Labe (1989), 'Developing and Using Forecasting Models of Consumer
Durables: The Case of Color Television', Journal of Product Innovation Management, vol. 6, pp. 5-19.
Bayus, B.L. and R. Mehta (1995), 'A Segmentation Model for the Targeted Marketing of Consumer Durables',
Journal of Marketing Research, vol. 32, pp. 463-469.
Beach, L.R. and V. Barnes (1987), 'Assessing Human Judgment: Has It Been Done, Can It Be Done, Should It
Be Done?' in Wright, G. and P. Ayton (eds.), Judgmental Forecasting, John Wiley & Sons, New York,
pp. 49-62.
Bearden, W.O., R.G. Netemeyer and M.F. Mobley (1993), Handbook ofMarketing Scales, Sage, London.
Bearden, W.O. and J.E. Teel (1983), 'Selected Determinants of Consumer Satisfaction and Complaint
Reports', Journal of Marketing Research, vol. 20, pp. 21-28.
Beckwith, N.E. (1972), 'Multivariate Analysis of Sales Responses of Competing Brands to Advertising',
Journal of Marketing Research, vol. 9, pp. 168-176.
Bell, D.E., R.L. Keeney and J.D.C. Little (1975), 'A Market Share Theorem', Journal of Marketing Research,
vol. 12, pp. 136-141.
Bell, D.R., J. Chiang and V. Padmanabhan (1997), 'The "84/14/2" Rule Revisited: What Drives Choice,
Incidence and Quantity Elasticities?', Working Paper No. 277, John E. Anderson Graduate School of
Management, UCLA.
Bell, D.R., J. Chiang and V. Padmanabhan (1999), 'The Decomposition of Promotional Response: An
Empirical Generalization', Marketing Science, forthcoming.
Bell, D.R., T. Ho and C.S. Tang (1998), 'Determining Where to Shop: Fixed and Variable Costs of Shopping',
Journal of Marketing Research, vol. 35, pp. 352-369.
Belsley, D.A. (1991), Conditioning Diagnostics: Collinearity and Weak Data in Regression, John Wiley &
Sons, New York.
Belsley, D.A., E. Kuh and R.E. Welsch (1980), Regression Diagnostics: Identifying Influential Data and Sources
of Collinearity, John Wiley & Sons, New York.
Bemmaor, A.C. (1984), 'Testing Alternative Econometric Models on the Existence of Advertising Threshold
Effect', Journal of Marketing Research, vol. 21, pp. 298-308.
Bemmaor, A.C., Ph.H. Franses and J. Kippers (1999), 'Estimating the Impact of Displays and other
Merchandising Support on Retail Brand Sales: Partial Pooling with Examples', Marketing Letters, vol. 10,
pp. 87-100.
Bemmaor, A.C. and D. Mouchoux (1991), 'Measuring the Short Term Effect of In-store Promotion and Retail
Advertising on Brand Sales: A Factorial Experiment', Journal of Marketing Research, vol. 28, pp.
202-214.
Ben-Akiva, M. and S.R. Lerman (1985), Discrete Choice Analysis: Theory and Application to Travel Demand,
MIT Press, Cambridge, Mass.
Benbasat, I. and P. Todd (1996), 'The Effects of Decision Support and Task Contingencies on Model
Formulation: A Cognitive Perspective', Decision Support Systems, vol. 17, pp. 241-252.
Benson, P.G., S.P. Curley and F.G. Smith (1995), 'Belief Assessment: An Underdeveloped Phase of Probability
Elicitation', Management Science, vol. 41, pp. 1639-1653.
Bentler, P.M. (1990), 'Comparative Fit Indices in Structural Models', Psychological Bulletin, vol. 107, pp.
238-246.
Bera, A.K. and C.M. Jarque (1981), 'An Efficient Large-Sample Test for Normality of Observations and
Regression Residuals', Australian National University Working Papers in Econometrics, No. 40,
Canberra.
Bera, A.K. and C.M. Jarque (1982), 'Model Specification Tests: A Simultaneous Approach', Journal of
Econometrics, vol. 20, pp. 59-82.
Berndt, E.R. (1991), The Practice of Econometrics, Addison-Wesley, Reading, MA.
Bertrand, J. (1883), 'Théorie Mathématique de la Richesse Sociale', Journal des Savants, pp. 499-508.
Bijmolt, T.H.A. (1996), Multidimensional Scaling in Marketing: Towards Integrating Data Collection and
Analysis, Unpublished Ph.D. thesis, University of Groningen, the Netherlands.
Blackburn, J.D. and K.J. Clancy (1980), 'LITMUS: A New Product Planning Model' in Leone, R.P. (ed.),
Proceedings: Market Measurement and Analysis, The Institute of Management Sciences, Providence, RI,
pp. 182-193.
Blattberg, R.C., R. Briesch and E.J. Fox (1995), 'How Promotions Work', Marketing Science, vol. 14, pp.
G122-G132.
Blattberg, R.C. and S.J. Hoch (1990), 'Database Models and Managerial Intuition: 50% Model + 50%
Manager', Management Science, vol. 36, pp. 887-899.
Blattberg, R.C., B. Kim and J. Ye (1994), 'Large-Scale Databases: The New Marketing Challenge' in
Blattberg, R.C., R. Glazer and J.D.C. Little (eds.), The Marketing Information Revolution, Harvard
Business School Press, Boston, Massachusetts, pp. 173-203.
Blattberg, R.C. and S.A. Neslin (1989), 'Sales Promotions: The Long and the Short of It', Marketing Letters,
vol. 1, pp. 81-97.
Blattberg, R.C. and S.A. Neslin (1990), Sales Promotions: Concepts, Methods and Strategies, Prentice-Hall,
Englewood Cliffs, NJ.
Blattberg, R.C. and S.A. Neslin (1993), 'Sales Promotions' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks
in Operations Research and Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp.
553-609.
Blattberg, R.C. and K.J. Wisniewski (1989), 'Price-induced Patterns of Competition', Marketing Science, vol.
8, pp. 291-309.
Binkley, J.K. and C.H. Nelson (1990), 'How Much Better is Aggregate Data?', Economics Letters, vol. 32, pp.
137-140.
Bloom, D. (1980), 'Point of Sale Scanners and their Implications for Market Research', Journal of the Market
Research Society, vol. 22, pp. 221-238.
Bockenholt, U. (1993a), 'Estimating Latent Distributions in Recurrent Choice Data', Psychometrika, vol. 58,
pp. 489-509.
Bockenholt, U. (1993b), 'A Latent Class Regression Approach for the Analysis of Recurrent Choice Data',
British Journal ofMathematical and Statistical Psychology, vol. 46, pp. 95-118.
Bodapati, A.V. and S. Gupta (1999), 'Recovering Latent Class Segmentation Structure from Store Scanner
Data', Research paper, Kellogg Graduate School of Management, Northwestern University, Evanston, IL.
Boer, P.M.C. de and R. Harkema (1983), 'Undersized Samples and Maximum Likelihood Estimation of
Sum-Constrained Linear Models', Report 8331, The Econometric Institute, Erasmus University,
Rotterdam.
Boer, P.M.C. de, R. Harkema and A.J. Soede (1996), 'Maximum Likelihood Estimation of Market Share
Models with Large Numbers of Shares', Applied Economics Letters, vol. 3, pp. 45-48.
Boerkamp, E.J.C. (1995), Assessing Professional Services Quality, Unpublished Ph.D. thesis, University of
Groningen, the Netherlands.
Bollen, K.A. (1989), Structural Equations With Latent Variables, John Wiley & Sons, New York.
Bolton, R.N. (1998), 'A Dynamic Model of the Duration of the Customer's Relationship with a Continuous
Service Provider: The Role of Satisfaction', Marketing Science, vol. 17, pp. 45-65.
Bolton, R.N. and K.N. Lemon (1999), 'A Dynamic Model of Customers' Usage of Services: Usage as an
Antecedent and Consequence of Satisfaction', Journal of Marketing Research, vol. 36, pp. 171-186.
Boudjellaba, H., J.M. Dufour and R. Roy (1992), 'Testing Causality Between Two Vectors in Multivariate
Autoregressive Moving Average Model', Journal of the American Statistical Association, vol. 87, pp.
1082-1090.
Boulding, W., A. Kalra, R. Staelin and V.A. Zeithaml (1993), 'A Dynamic Process Model of Service Quality:
From Expectations to Behavioral Intentions', Journal ofMarketing Research, vol. 30, pp. 7-27.
Bowman, E.H. (1967), 'Consistency and Optimality in Managerial Decision Making' in Bowman, E.H. and
R.B. Fetter (eds.), Analysis for Production and Operations Management, Richard D. Irwin, Homewood,
Illinois.
Box, G.E.P. and D.R. Cox (1964), 'An Analysis of Transformations', Journal of the Royal Statistical Society,
Series B, vol. 26, pp. 211-252.
Box, G.E.P., G.M. Jenkins and G.C. Reinsel (1994), Time Series Analysis: Forecasting and Control,
Prentice-Hall, Englewood Cliffs, NJ.
Box, G.E.P. and G.C. Tiao (1975), 'Intervention Analysis with Applications to Economic and Environmental
Problems', Journal of the American Statistical Association, vol. 70, March, pp. 70-79.
Boyd, H.W. and W.F. Massy (1972), Marketing Management, Harcourt, Brace, Jovanovich, New York.
Bozdogan, H. (1987), 'Model Selection and Akaike's Information Criterion (AIC): The General Theory and its
Analytical Extensions', Psychometrika, vol. 52, pp. 345-370.
Brand, M.J. (1993), Effectiveness of the Industrial Marketing Mix: An Assessment Through Simulation of the
Organizational Buying Process, Unpublished Ph.D. thesis, University of Groningen, the Netherlands.
Brand, M.J. and P.S.H. Leeflang (1994), 'Research on Modeling Industrial Markets' in Laurent, G., G.L. Lilien
and B. Pras (eds.), Research Traditions in Marketing, Kluwer Publishers, Boston, pp. 231-261.
Briesch, R.A., P.K. Chintagunta and R.L. Matzkin (1997), 'Nonparametric and Semiparametric Models of
Brand Choice Behavior', Working Paper, Department of Marketing, University of Texas, Austin.
Broadbent, S. (1981), Advertising Works, Holt, Rinehart, and Winston, London.
Broadbent, S. (1983), Advertising Works 2, Holt, Rinehart, and Winston, London.
Brobst, R. and R. Gates (1977), 'Comments on Pooling Issues and Methods in Regression Analysis', Journal
of Marketing Research, vol. 14, pp. 598-606.
Brockhoff, K. (1967), 'A Test for the Product Life Cycle', Econometrica, vol. 35, pp. 472-484.
Brodie, R.J., A. Bonfrer and J. Cutler (1996), 'Do Managers Overreact to Each Others' Promotional Activity?
Further Empirical Evidence', International Journal of Research in Marketing, vol. 13, pp. 379-387.
Brodie, R.J., P.J. Danaher, V. Kumar and P.S.H. Leeflang (2000), 'Principles to Guide the Forecasting of
Market Share' in Armstrong, J.S. (ed.), The Forecasting Principles Handbook, forthcoming.
Brodie, R.J. and C.A. de Kluyver (1984), 'Attraction versus Linear and Multiplicative Market Share Models:
An Empirical Evaluation', Journal of Marketing Research, vol. 21, pp. 194-201.
Brodie, R.J. and C.A. de Kluyver (1987), 'A Comparison of the Short-term Forecasting Accuracy of
Econometric and Naive Extrapolation Models of Market Share', International Journal of Forecasting, vol.
3, pp. 423-437.
Bronnenberg, B.J. (1998), 'Advertising Frequency Decisions in a Discrete Markov Process Under a Budget
Constraint', Journal of Marketing Research, vol. 35, pp. 399-406.
Bronnenberg, B.J. and L. Wathieu (1996), 'Asymmetric Promotion Effects and Brand Positioning', Marketing
Science, vol. 15, pp. 379-394.
Brown, A. and A. Deaton (1972), 'Surveys in Applied Economics: Models of Consumer Behaviour', The
Economic Journal, vol. 82, pp. 1145-1236.
Brown, A.A., F.L. Hulswit and J.D. Ketelle (1956), 'A Study of Sales Operations', Operations Research, vol. 4,
pp. 296-308.
Brown, B. and O. Helmer (1964), Improving the Reliability of Estimates Obtained from a Consensus of
Experts, P-2986, The Rand Corporation, Santa Monica, California.
Brown, W.M. and W.T. Tucker (1961), 'Vanishing Shelf Space', Atlanta Economic Review, vol. 9, pp. 9-13.
Browne, G.J., S.P. Curley and P.G. Benson (1997), 'Evoking Information in Probability Assessment:
Knowledge Maps and Reasoning-Based Directed Questions', Management Science, vol. 43, pp. 1-14.
Bruggen, G.H. van (1993), The Effectiveness of Marketing Management Support Systems, Unpublished Ph.D.
thesis, Erasmus University, Rotterdam, the Netherlands.
Bruggen, G.H. van, A. Smidts and B. Wierenga (1998), 'Improving Decision Making by Means of a Marketing
Decision Support System', Management Science, vol. 44, pp. 645-658.
Bucklin, R.E. and S. Gupta (1992), 'Brand Choice, Purchase Incidence and Segmentation: An Integrated
Approach', Journal of Marketing Research, vol. 29, pp. 201-215.
Bucklin, R.E. and S. Gupta (2000), 'Commercial Use of UPC Scanner Data: Industry and Academic
Perspectives', Marketing Science, vol. 19, forthcoming.
Bucklin, R.E. and J.M. Lattin (1991), 'A Two-State Model of Purchase Incidence and Brand Choice',
Marketing Science, vol. 10, pp. 24-39.
Bucklin, R.E., D.R. Lehmann and J.D.C. Little (1998), 'From Decision Support to Decision Automation: A
2020 Vision', Marketing Letters, vol. 9, pp. 234-246.
Bult, J.R. (1993a), Target Selection for Direct Marketing, Unpublished Ph.D. thesis, University of Groningen,
the Netherlands.
Bult, J.R. (1993b), 'Semiparametric Versus Parametric Classification Models: An Application to Direct
Marketing', Journal of Marketing Research, vol. 30, pp. 380-390.
Bult, J.R., P.S.H. Leeflang and D.R. Wittink (1997), 'The Relative Performance of Bivariate Causality Tests in
Small Samples', European Journal of Operational Research, vol. 97, pp. 450-464.
Bult, J.R. and T.J. Wansbeek (1995), 'Optimal Selection for Direct Mail', Marketing Science, vol. 14, pp.
378-394.
Bult, J.R. and D.R. Wittink (1996), 'Estimating and Validating Asymmetric Heterogeneous Loss Functions
Applied to Health Care Fund Raising', International Journal of Research in Marketing, vol. 13, pp.
215-226.
Bultez, A.V. (1975), Competitive Strategy for Interrelated Markets, Unpublished Ph.D. thesis, Louvain
University, Belgium.
Bultez, A.V. (ed.) (1995), Scientific Report 1992-1995, CREER, Centre for Research on the Economic
Efficiency of Retailing, FUCAM, Facultés Universitaires Catholiques de Mons, Belgium.
Bultez, A.V., E. Gijsbrechts and P.A. Naert (1995), 'A Theorem on the Optimal Margin Mix', Zeitschrift für
Betriebswirtschaft, EH 4, December, pp. 151-174.
Bultez, A.V., E. Gijsbrechts, P.A. Naert and P. Vanden Abeele (1989), 'Asymmetric Cannibalism in Retail
Assortments', Journal of Retailing, vol. 65, pp. 153-192.
Bultez, A.V. and P.A. Naert (1973), 'Estimating Gravitational Marketing Share Models', Working Paper No.
73-36, European Institute for Advanced Studies in Management, Brussels, Belgium.
Bultez, A.V. and P.A. Naert (1975), 'Consistent Sum-Constrained Models', Journal of the American Statistical
Association, vol. 70, pp. 529-535.
Bultez, A.V. and P.A. Naert (1979), 'Does Lag Structure Really Matter in Optimizing Advertising
Expenditures?', Management Science, vol. 25, pp. 454-465.
Bultez, A.V. and P.A. Naert (1988a), 'When Does Lag Structure Really Matter... Indeed?', Management
Science, vol. 34, pp. 909-916.
Bultez, A.V. and P.A. Naert (1988b), 'SH.A.R.P.: Shelf Allocation for Retailers' Profit', Marketing Science,
vol. 7, pp. 211-231.
Bunn, D. and G. Wright (1991), 'Interaction of Judgment and Statistical Forecasting Methods: Issues and
Analysis', Management Science, vol. 37, pp. 501-518.
Burke, R.R. (1991), 'Reasoning with Empirical Marketing Knowledge', International Journal of Research in
Marketing, vol. 8, pp. 75-90.
Burke, R.R., A. Rangaswamy, Y. Wind and J. Eliashberg (1990), 'A Knowledge Based System for Advertising
Design', Marketing Science, vol. 9, pp. 212-229.
Buzzell, R.D. (1964), Mathematical Models and Marketing Management, Division of Research, Graduate
School of Business Administration, Harvard University, Boston.
Buzzell, R.D. and B.T. Gale (1987), The PIMS-Principles, The Free Press, New York.
Capon, N. and J. Hulbert (1975), 'Decision Systems Analysis in Industrial Marketing', Industrial Marketing
Management, vol. 4, pp. 143-160.
Carman, J.M. (1966), 'Brand Switching and Linear Learning Models', Journal of Advertising Research, vol. 6,
pp. 23-31.
Carpenter, G.S. (1987), 'Modeling Competitive Marketing Strategies: The Impact of Marketing-Mix
Relationships and Industry Structure', Marketing Science, vol. 6, pp. 208-221.
Carpenter, G.S., L.G. Cooper, D.M. Hanssens and D.F. Midgley (1988), 'Modelling Asymmetric Competition',
Marketing Science, vol. 7, pp. 393-412.
Carpenter, G.S. and D.R. Lehmann (1985), 'A Model of Marketing Mix, Brand Switching and Competition',
Journal of Marketing Research, vol. 22, pp. 318-329.
Cats-Baril, W.L. and G.P. Huber (1987), 'Decision Support Systems for Ill-structured Problems: An Empirical
Study', Decision Sciences, vol. 18, pp. 350-372.
Chakravarti, D., A. Mitchell and R. Staelin (1981), 'Judgment Based Marketing Decision Models: Problems
and Possible Solutions,' Journal of Marketing, vol. 45, Fall, pp. 13-23.
Channon, C. (1985), Advertising Works 3, Holt, Rinehart, and Winston, London.
Channon, C. (1987), Advertising Works 4, Cassell Educational Limited, London.
Chatfield, C., A.S.C. Ehrenberg and G.J. Goodhardt (1966), 'Progress on a Simplified Model of Stationary
Purchasing Behavior', Journal of the Royal Statistical Society, Series A, vol. 79, pp. 317-367.
Chatfield, C. and G.J. Goodhardt (1973), 'A Consumer Purchasing Model with Erlang Interpurchase Times',
Journal of the American Statistical Association, vol. 68, pp. 828-835.
Chatterjee, R. and J. Eliashberg (1990), 'The Innovation Diffusion Process in a Heterogeneous Population: A
Micromodelling Approach', Management Science, vol. 36, pp. 1057-1079.
Chen, Y., V. Kanetkar and D.L. Weiss (1994), 'Forecasting Market Share with Disaggregate or Pooled Data: A
Comparison of Attraction Models', International Journal ofForecasting, vol. 10, pp. 263-276.
Chen, M.J., K.G. Smith and C.M. Grimm (1992), 'Action Characteristics as Predictors of Competitive
Response', Management Science, vol. 38, pp. 439-455.
Chiang, J. (1991), 'A Simultaneous Approach to the Whether, What and How Much to Buy Questions',
Marketing Science, vol. 10, pp. 297-315.
Chiang, J., S. Chib and C. Narasimhan (1999), 'Markov Chain Monte Carlo and Models of Consideration Set
and Parameter Heterogeneity', Journal of Econometrics, vol. 89, pp. 223-248.
Chintagunta, P.K. (1992a), 'Estimating a Multinomial Probit Model of Brand Choice Using the Method of
Simulated Moments', Marketing Science, vol. 11, pp. 386-407.
Chintagunta, P.K. (1992b), 'Heterogeneity in Nested Logit Models: An Estimation Approach and Empirical
Results', International Journal of Research in Marketing, vol. 9, pp. 161-175.
Chintagunta, P.K. (1993a), 'Investigating Purchase Incidence, Brand Choice and Purchase Quantity Decisions
of Households', Marketing Science, vol. 12, pp. 184-208.
Chintagunta, P.K. (1993b), 'Investigating the Sensitivity of Equilibrium Profits to Advertising Dynamics and
Competitive Effects', Management Science, vol. 39, pp. 1146-1162.
Chintagunta, P.K. (1998), 'Inertia and Variety Seeking in a Model of Brand-Purchase Timing', Marketing
Science, vol. 17, pp. 253-270.
Chintagunta, P.K. (1999), 'Variety Seeking, Purchase Timing and the 'Lightning Bolt' Brand Choice Model',
Management Science, vol. 45, pp. 486-498.
Chintagunta, P.K., D.C. Jain and N.J. Vilcassim (1991), 'Investigating Heterogeneity in Brand Preferences in
Logit Models for Panel Data', Journal of Marketing Research, vol. 28, pp. 417-428.
Chintagunta, P.K. and V.R. Rao (1996), 'Pricing Strategies in a Dynamic Duopoly: A Differential Game
Model', Management Science, vol. 42, pp. 1501-1514.
Chintagunta, P.K. and N.J. Vilcassim (1994), 'Marketing Investment Decisions in a Dynamic Duopoly: A
Model and Empirical Analysis', International Journal of Research in Marketing, vol. 11, pp. 287-306.
Chow, G.C. (1960), 'Tests of Equality between Sets of Coefficients in Two Linear Regressions', Econometrica,
vol. 28, pp. 591-605.
Chow, G.C. (1983), Econometrics, McGraw-Hill, New York.
Christen, M., Sachin Gupta, J.C. Porter, R. Staelin and D.R. Wittink (1997), 'Using Market-Level Data to
Understand Promotion Effects in a Nonlinear Model', Journal of Marketing Research, vol. 34, pp.
322-334.
Chu, W. and P.S. Desai (1995), 'Channel Coordination Mechanisms for Customer Satisfaction', Marketing
Science, vol. 14, pp. 343-359.
Churchill, G.A. (1995), Marketing Research: Methodological Foundations, 6th ed., The Dryden Press, Fort
Worth.
Churchman, C.W. and A. Schainblatt (1965), 'The Researcher and the Manager: A Dialectic of
Implementation', Management Science, vol. 11, pp. B69-B87.
Clark, B.H. and D.B. Montgomery (1999), 'Managerial Identification of Competitors', Journal of Marketing,
vol. 63, July, pp. 67-83.
Clarke, D. (1983), 'SYNTEX Laboratories (A)', Harvard Business School, Case number 9-584-033.
Clarke, D.G. (1973), 'Sales-Advertising Cross-Elasticities and Advertising Competition', Journal of Marketing
Research, vol. 10, pp. 250-261.
Clarke, D.G. (1976), 'Econometric Measurement of the Duration of Advertising Effect on Sales', Journal of
Marketing Research, vol. 13, pp. 345-357.
Clements, K.W. and E.A. Selvanathan (1988), 'The Rotterdam Model and its Applications to Marketing',
Marketing Science, vol. 7, pp. 60-75.
Cleveland, W.S. (1979), 'Robust Locally Weighted Regression and Smoothing Scatterplots', Journal of the
American Statistical Association, vol. 74, pp. 829-836.
Cochrane, D. and G.H. Orcutt (1949), 'Application of Least Squares Regression to Relationships Containing
Autocorrelated Error Terms', Journal of the American Statistical Association, vol. 44, pp. 32-61.
Colombo, R.A. and D.G. Morrison (1989), 'A Brand Switching Model with Implications for Marketing
Strategies', Marketing Science, vol. 8, pp. 89-99.
Comanor, W.S. and T.A. Wilson (1974), Advertising and Market Power, Harvard University Press, Cambridge.
Cooil, B. and J.M. Devinney (1992), 'The Return to Advertising Expenditure', Marketing Letters, vol. 3, pp.
137-145.
Cooper, L.G. (1993), 'Market-Share Models' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks on Operations
Research and Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp. 259-314.
Cooper, L.G., D. Klapper and A. Inoue (1996), 'Competitive Component Analysis: A New Approach to
Calibrating Asymmetric Market-Share Models', Journal ofMarketing Research, vol. 33, pp. 224-238.
Cooper, L.G. and M. Nakanishi (1988), Market-Share Analysis: Evaluating Competitive Marketing
Effectiveness, Kluwer Academic Publishers, Boston.
Corstjens, M. and P. Doyle (1981), 'A Model for Optimizing Retail Space Allocations', Management Science,
vol. 27, pp. 822-833.
Corstjens, M. and D. Weinstein (1982), 'Optimal Strategic Business Units Portfolio Analysis' in Zoltners, A.A.
(ed.), Marketing Planning Models, North-Holland Publishing Company, Amsterdam, pp. 141-160.
Cotterill, R.W., W.P. Putsis and R. Dhar (1998), 'Assessing the Competitive Interaction Between Private Labels
BIBLIOGRAPHY 587
and National Brands', paper University of Connecticut I London Business School I Yale School of
Management.
Cournot, A. (1838), Recherches sur les Principes Mathématiques de la Théorie des Richesses, Paris.
Cowling, K. and J. Cubbin (1971), 'Price, Quality and Advertising Competition: An Econometric Investigation
of the United Kingdom Car Market', Economica, vol. 38, pp. 378-394.
Cox, D.F. and R.E. Good (1967), 'How to Build a Marketing Information System', Harvard Business Review,
vol. 45, May-June, pp. 145-154.
Cox, D.R. (1975), 'Partial Likelihood', Biometrika, vol. 62, pp. 269-276.
Cox, E., Jr. (1967), 'Product Life Cycles as Marketing Models', Journal of Marketing, vol. 57, October, pp.
47-59.
Cready, W.M. (1991), 'Premium Bundling', Economic Inquiry, vol. 29, pp. 173-179.
Crow, L.E., R.W. Olshavsky and J.O. Summers (1980), 'Industrial Buyers' Choice Strategies: A Protocol
Analysis', Journal of Marketing Research, vol. 17, pp. 34-44.
Curhan, R.C. (1972), 'The Relationship between Shelf Space and Unit Sales in Supermarkets', Journal of
Marketing Research, vol. 9, pp. 406-412.
Currim, I.S. (1982), 'Predictive Testing of Consumer Choice Models Not Subject to Independence of Irrelevant
Alternatives', Journal of Marketing Research, vol. 19, pp. 208-222.
Currim, I.S., C.B. Weinberg and D.R. Wittink (1981), 'Design of Subscription Programs for a Performing Arts
Series', Journal of Consumer Research, vol. 8, pp. 67-75.
Cyert, R.M. and J.G. March (1963), A Behavioral Theory of the Firm, Prentice-Hall, Englewood Cliffs, NJ.
Trade Show Effectiveness: A Cross-National Comparison', Journal of Marketing, vol. 61, October, pp.
55-64.
Dekimpe, M.G. and D.M. Hanssens (1995a), 'The Persistence of Marketing Effects on Sales', Marketing
Science, vol. 14, pp. 1-21.
Dekimpe, M.G. and D.M. Hanssens (1995b), 'Empirical Generalizations About Market Evolution and
Stationarity', Marketing Science, vol. 14, pp. G109-G121.
Dekimpe, M.G. and D.M. Hanssens (1999), 'Sustained Spending and Persistent Response: A New Look at
Long-Term Marketing Profitability', Journal of Marketing Research, forthcoming.
Dekimpe, M.G., D.M. Hanssens and J.M. Silva-Risso (1999), 'Long-Run Effects of Price Promotions in
Scanner Markets', Journal of Econometrics, vol. 89, pp. 269-291.
DeSarbo, W.S. and W.L. Cron (1988), 'A Maximum Likelihood Methodology for Clusterwise Linear
Regression', Journal of Classification, vol. 5, pp. 249-282.
Deshpande, R. and H. Gatignon (1994), 'Competitive Analysis', Marketing Letters, vol. 5, pp. 271-287.
Dhar, S.K., D.G. Morrison and J.S. Raju (1996), 'The Effect of Package Coupons on Brand Choice: An
Epilogue on Profits', Marketing Science, vol. 15, pp. 192-203.
Dhrymes, P.J. (1981), Distributed Lags, 2nd ed., North-Holland Publishing Company, Amsterdam.
Diamantopoulos, A. (1994), 'Modeling with LISREL: A Guide for the Uninitiated', Journal of Marketing
Management, vol. 10, pp. 105-136.
Dillon, W.R. and S. Gupta (1996), 'A Segment Level Model of Category Volume and Brand Choice',
Marketing Science, vol. 15, pp. 38-59.
Dockner, E. and S. Jørgensen (1988), 'Optimal Pricing Strategies for New Products in Dynamic Oligopolies',
Marketing Science, vol. 7, pp. 315-334.
Doktor, R.H. and W.F. Hamilton (1973), 'Cognitive Style and the Acceptance of Management Science
Recommendations', Management Science, vol. 19, pp. 884-894.
Dorfman, R. and P.O. Steiner (1954), 'Optimal Advertising and Optimal Quality', The American Economic
Review, vol. 44, pp. 826-836.
Doyle, P. (1989), 'Building Successful Brands: The Strategic Options', Journal of Marketing Management, vol.
5, pp. 77-95.
Doyle, P. and J. Saunders (1985), 'The Lead Effect of Marketing Decisions', Journal of Marketing Research,
vol. 22, pp. 54-65.
Doyle, P. and J. Saunders (1990), 'Multiproduct Advertising Budgeting', Marketing Science, vol. 9, pp.
97-113.
Duesenberry, J.S. (1949), Income, Saving and the Theory of Consumer Behavior, Harvard University Press,
Cambridge, Mass.
Duffy, M. (1996), 'Econometric Studies of Advertising, Advertising Restrictions and Cigarette Demand: A
Survey', International Journal of Advertising, vol. 15, pp. 1-23.
Duncan, W.J. (1974), 'The Researcher and the Manager: A Comparative View of the Need for Mutual
Understanding', Management Science, vol. 20, pp. 1157-1163.
Dunne, P.M. and H.I. Wolk (1977), 'Marketing Cost Analysis: A Modularized Contribution Approach',
Journal of Marketing, vol. 41, July, pp. 83-94.
Durbin, J. (1970), 'Testing for Serial Correlation in Least-Squares Regression when Some of the Regressors
are Lagged Dependent Variables', Econometrica, vol. 38, pp. 410-429.
Durbin, J. and G.S. Watson (1950), 'Testing for Serial Correlation in Least Squares Regression, I', Biometrika,
vol. 37, pp. 409-428.
Durbin, J. and G.S. Watson (1951), 'Testing for Serial Correlation in Least Squares Regression, II', Biometrika,
vol. 38, pp. 159-178.
Dyckman, Th. (1967), 'Management Implementation of Scientific Research: An Attitudinal Study',
Management Science, vol. 13, pp. B612-B619.
Easingwood, C.J. (1987), 'Early Product Life Cycle Forms for Infrequently Purchased Major Products',
International Journal ofResearch in Marketing, vol. 4, pp. 3-9.
East, R. and K. Hammond (1996), 'The Erosion of Repeat-Purchase Loyalty', Marketing Letters, vol. 7, pp.
163-171.
Edelman, F. (1965), 'Art and Science of Competitive Bidding', Harvard Business Review, vol. 43, July-August,
pp. 52-66.
Edwards, J.B. and G.H. Orcutt (1969), 'Should Estimation Prior to Aggregation be the Rule?', The Review of
Economics and Statistics, vol. 51, pp. 409-420.
Ehrenberg, A.S.C. (1959), 'The Pattern of Consumer Purchases', Applied Statistics, vol. 8, pp. 26-41.
Ehrenberg, A.S.C. (1965), 'An Appraisal of Markov Brand Switching Models', Journal of Marketing
Research, vol. 2, pp. 347-363.
Ehrenberg, A.S.C. (1970), 'Models of Fact: Examples from Marketing', Management Science, vol. 16, pp.
435-445.
Ehrenberg, A.S.C. (1972), Repeat Buying, Theory and Applications, North-Holland Publishing Company,
Amsterdam.
Ehrenberg, A.S.C. (1988), Repeat-buying: Facts, Theory and Data, Oxford University Press, New York.
Ehrenberg, A.S.C. (1990), 'A Hope for the Future of Statistics: MSoD', The American Statistician, vol. 44, pp.
195-196.
Ehrenberg, A.S.C. (1994), 'Theory or Well-based Results: Which Comes First?' in Laurent, G., G.L. Lilien and
B. Pras (eds.), Research Traditions in Marketing, Kluwer Academic Publishers, Boston, pp. 79-108.
Ehrenberg, A.S.C. (1995), 'Empirical Generalizations, Theory and Method', Marketing Science, vol. 14, pp.
G20-G28.
Eliashberg, J. and G.L. Lilien (1993), 'Mathematical Marketing Models: Some Historical Perspectives and
Future Projections' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks in Operations Research and
Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp. 3-23.
Eliashberg, J. and R. Steinberg (1993), 'Marketing-Production Joint Decision Making' in Eliashberg, J. and
G.L. Lilien (eds.), Handbooks in Operations Research and Management Science, vol. 5, Marketing,
North-Holland, Amsterdam, pp. 827-880.
Eliason, S.R. (1993), Maximum Likelihood Estimation: Logic and Practice, Sage Publications, Newbury Park,
London.
Ellis, D.M. (1966), 'Building up a Sequence of Optimum Media Schedules', Operational Research Quarterly,
vol. 17, pp. 413-424.
Elrod, T. and M.P. Keane (1995), 'A Factor Analytic Probit Model for Representing the Market Structure in
Panel Data', Journal of Marketing Research, vol. 32, pp. 1-16.
Engel, J.F., R.D. Blackwell and P.W. Miniard (1995), Consumer Behavior, 8th ed., The Dryden Press,
Fort Worth, Texas.
Engle, R.F. and C.W.J. Granger (1987), 'Cointegration and Error Correction: Representation, Estimation and
Testing', Econometrica, vol. 55, pp. 251-276.
Erdem, T. (1996), 'A Dynamic Analysis of Market Structure Based on Panel Data', Marketing Science, vol. 15,
pp. 359-378.
Erickson, G.M. (1981), 'Using Ridge Regression to Estimate Directly Lagged Effects in Marketing', Journal
of the American Statistical Association, vol. 76, pp. 766-773.
Erickson, G.M. (1991), Dynamic Models of Advertising Competition, Kluwer Academic Publishers, Boston.
Erickson, G.M. (1993), 'Offensive and Defensive Marketing: Closed-Loop Duopoly Strategies', Marketing
Letters, vol. 4, pp. 285-295.
Erickson, G.M. (1995), 'Advertising Strategies in a Dynamic Oligopoly', Journal of Marketing Research, vol.
32, pp. 233-237.
Erickson, G.M. (1997), 'Dynamic Conjectural Variations in a Lanchester Oligopoly', Management Science,
vol. 43, pp. 1603-1608.
Eskin, G.J. (1975), 'A Case for Test Market Experiments', Journal of Advertising Research, vol. 15, April, pp.
27-33.
Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York.
Fader, P.S. and B.G.S. Hardie (1996), 'Modeling Consumer Choice Among SKUs', Journal of Marketing
Research, vol. 33, pp. 442-452.
Fader, P.S. and J.M. Lattin (1993), 'Accounting for Heterogeneity and Nonstationarity in a Cross-Sectional
Model of Consumer Purchase Behaviour', Marketing Science, vol. 12, pp. 304-317.
Fader, P.S. and D.C. Schmittlein (1993), 'Excess Behavioral Loyalty for High Share Brands: Deviations from
The Dirichlet Model for Repeat Purchasing', Journal of Marketing Research, vol. 30, pp. 478-493.
Farley, J.U. and H.J. Leavitt (1968), 'A Model of the Distribution of Branded Personal Products in Jamaica',
Journal of Marketing Research, vol. 5, pp. 362-368.
Farley, J.U., D.R. Lehmann and A. Sawyer (1995), 'Empirical Marketing Generalization Using
Meta-Analysis', Marketing Science, vol. 14, pp. G36-G46.
Farley, J.U. and L.W. Ring (1970), 'An Empirical Test of the Howard-Sheth Model of Buyer Behaviour',
Journal of Marketing Research, vol. 7, pp. 427-438.
Farrar, D.E. and R.R. Glauber (1967), 'Multicollinearity in Regression Analysis: The Problem Revisited',
Review ofEconomics and Statistics, vol. 49, pp. 92-107.
Farris, P.W., J. Olver and C.A. de Kluyver (1989), 'The Relationship between Distribution and Market Share',
Marketing Science, vol. 8, pp. 107-128.
Farris, P.W., M.E. Parry and K.L. Ailawadi (1992), 'Structural Analysis of Models with Composite Dependent
Variables', Marketing Science, vol. 11, pp. 76-94.
Feichtinger, G., R.F. Hartl and S.P. Sethi (1994), 'Dynamic Optimal Control Models in Advertising: Recent
Developments', Management Science, vol. 40, pp. 195-226.
Feinberg, F.M. (1992), 'Pulsing Policies for Aggregate Advertising Models', Marketing Science, vol. 11, pp.
221-234.
Feldwick, P. (1990), Advertising Works 5, Cassell Educational Limited, London.
Feldwick, P. (1991), Advertising Works 6, NTC Publications, Henley-on-Thames.
Fiacco, A.V. and G.P. McCormick (1968), Nonlinear Programming: Sequential Unconstrained Minimization
Techniques, John Wiley & Sons, New York.
Fishbein, M. (1967), Attitude and the Prediction of Behavior: A Behavioral Theory Approach to the Relation
between Beliefs about an Object and the Attitude Towards the Object. Readings in Attitude Theory and
Measurement, John Wiley & Sons, New York.
Fishbein, M. and I. Ajzen (1976), Belief, Attitude, Intention and Behavior: An Introduction to Theory and
Research, Addison-Wesley Publishing Company, Reading, Mass.
Fisher, F.M. (1970), 'Test of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository
Note', Econometrica, vol. 38, pp. 361-366.
Fisher, R.A. (1950), Contributions to Mathematical Statistics, John Wiley & Sons, New York.
Fitzroy, P.T. (1976), Analytical Methods for Marketing Management, McGraw-Hill Book Company, London.
Fletcher, R. (1980), Practical Methods of Optimization, John Wiley & Sons, New York.
Fletcher, R. and M.J.D. Powell (1963), 'A Rapidly Convergent Descent Method for Minimization',
The Computer Journal, vol. 6, pp. 163-168.
Flinn, C. and J. Heckman (1983), 'The Likelihood Function for the Multi-State Multi-Episode Model' in
Advances in Econometrics vol. 2, JAI-Press, Greenwich, pp. 225-231.
Foekens, E.W. (1995), Scanner Data Based Marketing Modelling: Empirical Applications, Unpublished Ph.D.
thesis, University of Groningen, the Netherlands.
Foekens, E.W. and P.S.H. Leeflang (1992), 'Comparing Scanner Data with Traditional Store Audit Data',
Scandinavian Business Review, vol. 1, pp. 71-85.
Foekens, E.W., P.S.H. Leeflang and D.R. Wittink (1994), 'A Comparison and an Exploration of the Forecasting
Accuracy of a Loglinear Model at Different Levels of Aggregation', International Journal of Forecasting,
vol. 10, pp. 245-261.
Foekens, E.W., P.S.H. Leeflang and D.R. Wittink (1997), 'Hierarchical versus Other Market Share Models for
Markets with Many Items', International Journal of Research in Marketing, vol. 14, pp. 359-378.
Foekens, E.W., P.S.H. Leeflang and D.R. Wittink (1999), 'Varying Parameter Models to Accommodate
Dynamic Promotion Effects', Journal of Econometrics, vol. 89, pp. 249-268.
Fornell, C. and D.F. Larcker (1981), 'Evaluating Structural Equation Models with Unobservable Variables and
Measurement Error', Journal of Marketing Research, vol. 18, pp. 39-50.
Fornell, C. and R.T. Rust (1989), 'Incorporating Prior Theory in Covariance Structure Analysis: A Bayesian
Approach', Psychometrika, vol. 54, pp. 249-259.
Forrester, J.W. (1961), Industrial Dynamics, M.I.T. Press, Cambridge, Mass.
Forsythe, A.B. (1972), 'Robust Estimation of Straight Line Regression Coefficients by Minimizing p-th Power
Deviations', Technometrics, vol. 14, pp. 159-166.
Fourt, L.A. and J.W. Woodlock (1960), 'Early Predictions of Market Success for New Grocery Products',
Journal of Marketing, vol. 24, October, pp. 31-38.
Frankel, M.R. and L.R. Frankel (1977), 'Some Recent Developments in Sample Survey Design', Journal of
Marketing Research, vol. 14, pp. 280-293.
Franses, Ph.H. (1991), 'Primary Demand for Beer in the Netherlands: An Application of ARMAX Model
Specification', Journal of Marketing Research, vol. 28, pp. 240-245.
Franses, Ph.H. (1994), 'Modeling New Product Sales: An Application of Cointegration Analysis',
International Journal of Research in Marketing, vol. 11, pp. 491-502.
Franses, Ph.H. (1996), Periodicity and Stochastic Trends in Economic Time Series, Oxford University Press.
Franses, Ph.H. (1998), Time Series Models for Business and Economic Forecasting, Cambridge University
Press.
Franses, Ph.H., T. Kloek and A. Lucas (1999), 'Outlier Robust Analysis of Long-run Marketing Effects for
Weekly Scanning Data', Journal of Econometrics, vol. 89, pp. 293-315.
Frenk, J.B.G. and S. Zhang (1997), 'On Purchase Timing Models in Marketing', Report 9720/A, Econometric
Institute, Erasmus University, Rotterdam, the Netherlands.
Friedman, J.W. (1991), Game Theory with Applications to Economics, Oxford University Press, New York.
Friedman, L. (1958), 'Game Theory Models in the Allocation of Advertising Expenditures', Operations
Research, vol. 6, pp. 699-709.
Friedman, M. (1953), Essays in Positive Economics, University of Chicago Press, Chicago.
Friedman, R. (1982), 'Multicollinearity and Ridge Regression', Allgemeines Statistisches Archiv, vol. 66, pp.
120-128.
Fruchter, G.E. and S. Kalish (1997), 'Closed-Loop Advertising Strategies in a Duopoly', Management Science,
vol. 43, pp. 54-63.
Gasmi, F., J.J. Laffont and Q. Vuong (1992), 'Econometric Analysis of Collusive Behavior in a Soft-Drink
Market', Journal of Economics and Management Strategy, vol. 1, pp. 277-311.
Gatignon, H. (1984), 'Competition as a Moderator of the Effect of Advertising on Sales', Journal of Marketing
Research, vol. 21, pp. 387-398.
Gatignon, H. (1993), 'Marketing-Mix Models' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks in
Operations Research and Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp.
697-732.
Gatignon, H., E. Anderson and K. Helsen (1989), 'Competitive Reactions to Market Entry: Explaining
Interfirm Differences', Journal of Marketing Research, vol. 26, pp. 44-55.
Gatignon, H. and D.M. Hanssens (1987), 'Modeling Marketing Interactions with Application to Salesforce
Effectiveness', Journal of Marketing Research, vol. 24, pp. 247-257.
Gatignon, H., T.S. Robertson and A.J. Fein (1997), 'Incumbent Defense Strategies Against New Product
Entry', International Journal of Research in Marketing, vol. 14, pp. 163-176.
Gaver, K.M., D. Horsky and C. Narasimhan (1988), 'Invariant Estimators for Market Share Systems and Their
Finite Sample Behavior', Marketing Science, vol. 7, pp. 169-186.
Gensch, D.H., N. Aversa and S.P. Moore (1990), 'A Choice-Modeling Market Information System that Enabled
ABB Electric to Expand its Market Share', Interfaces, vol. 20, pp. 6-25.
Geweke, J., R. Meese and W. Dent (1983), 'Comparing Alternative Tests of Causality in Temporal Systems',
Journal of Econometrics, vol. 21, pp. 161-194.
Geyskens, I., J.B.E.M. Steenkamp and N. Kumar (1999), 'A Meta-Analysis of Satisfaction in Marketing
Channel Relationships', Journal of Marketing Research, vol. 36, pp. 223-238.
Ghosh, A., S.A. Neslin and R.W. Shoemaker (1984), 'A Comparison of Market Share Models and Estimation
Procedures', Journal of Marketing Research, vol. 21, pp. 202-210.
Gijsbrechts, E. and P.A. Naert (1984), 'Towards Hierarchical Linking of Marketing Resource Allocation to
Market Areas and Product Groups', International Journal of Research in Marketing, vol. 1, pp. 97-116.
Givon, M. and D. Horsky (1990), 'Untangling the Effects of Purchase Reinforcement and Advertising
Carryover', Marketing Science, vol. 9, pp. 171-187.
Golany, B., M. Kress and F.Y. Phillips (1986), 'Estimating Purchase Frequency Distributions with Incomplete
Data', International Journal of Research in Marketing, vol. 3, pp. 169-179.
Goldberger, A.S. (1998), Introductory Econometrics, Harvard University Press, Cambridge, MA.
Goldberger, A.S. and C.F. Manski (1995), 'Review Article: The Bell Curve by Herrnstein and Murray', Journal
of Economic Literature, vol. 33, pp. 762-776.
Goldfeld, S.M. and R.E. Quandt (1965), 'Some Tests for Homoscedasticity', Journal of the American
Statistical Association, vol. 60, pp. 539-547.
Goldfeld, S.M. and R.E. Quandt (1972), Nonlinear Methods in Econometrics, North-Holland Publishing
Company, Amsterdam.
Goldfeld, S.M. and R.E. Quandt (eds.) (1976), Studies in Nonlinear Estimation, Ballinger Publishing
Company, Cambridge, Mass.
Gönül, F. and M.Z. Shi (1998), 'Optimal Mailing of Catalogs: A New Methodology Using Estimable
Structural Dynamic Programming Models', Management Science, vol. 44, pp. 1249-1262.
Gönül, F. and K. Srinivasan (1993a), 'Modeling Multiple Sources of Heterogeneity in Multinomial Logit
Models: Methodological and Managerial Issues', Marketing Science, vol. 12, pp. 213-229.
Gönül, F. and K. Srinivasan (1993b), 'Consumer Purchase Behaviour in a Frequently Bought Product
Category: Estimation Issues and Managerial Insights From a Hazard Function Model with Heterogeneity',
Haaijer, R., M. Wedel, M. Vriens and T.J. Wansbeek (1998), 'Utility Covariances and Context Effects in
Conjoint MNP Models', Marketing Science, vol. 17, pp. 236-252.
Hagerty, M.R. and V. Srinivasan (1991), 'Comparing the Predictive Powers of Alternative Multiple Regression
Models', Psychometrika, vol. 56, March, pp. 77-85.
Hahn, M. and J.S. Hyun (1991), 'Advertising Cost Interactions and the Optimality of Pulsing', Management
Science, vol. 37, pp. 157-169.
Hair, J.F., R.E. Anderson, R.L. Tatham and W.C. Black (1995), Multivariate Data Analysis with Readings,
Prentice-Hall, London.
Hamilton, J.D. (1994), Time Series Analysis, Princeton University Press, Princeton, NJ.
Hammond, J.S. (1974), 'The Roles of the Manager and Management Scientist in Successful Implementation',
Sloan Management Review, vol. 16, pp. 1-24.
Hampton, J.M., P.G. Moore and H. Thomas (1973), 'Subjective Probability and Its Measurement',
Mimeographed Notes, London Graduate School of Business Studies.
Hanson, W. and R.K. Martin (1990), 'Optimal Price Bundling', Management Science, vol. 36, pp. 155-174.
Hanssens, D.M. (1980a), 'Bivariate Time Series Analysis of the Relationship between Advertising and Sales',
Applied Economics, vol. 12, pp. 329-340.
Hanssens, D.M. (1980b), 'Market Response, Competitive Behavior and Time Series Analysis', Journal of
Marketing Research, vol. 17, pp. 470-485.
Hanssens, D.M. and L.J. Parsons (1993), 'Econometric and Time-Series Market Response Models' in
Eliashberg, J. and G.L. Lilien (eds.), Handbooks in Operations Research and Management Science, vol. 5,
Marketing, North-Holland, Amsterdam, pp. 409-464.
Hanssens, D.M., L.J. Parsons and R.L. Schultz (1990), Market Response Models: Econometric and Time Series
Analysis, Kluwer Academic Publishers, Boston.
Harary, F. and B. Lipstein (1962), 'The Dynamics of Brand Loyalty: A Markovian Approach', Operations
Research, vol. 10, pp. 19-40.
Härdle, W. (1990), Applied Nonparametric Regression, Econometric Society Monographs, vol. 19, Cambridge
University Press, Cambridge.
Härdle, W. and O. Linton (1994), 'Applied Nonparametric Methods' in Engle, R.F. and D.L. McFadden (eds.),
Handbook of Econometrics, vol. 4, Elsevier Science B.V., Amsterdam.
Hartung, P.H. and J.L. Fisher (1965), 'Brand Switching and Mathematical Programming in Market Expansion',
Management Science, vol. 11, pp. 231-243.
Hashemzadeh, N. and P. Taylor (1988), 'Stock Prices, Money Supply, and Interest Rates, the Question of
Causality', Applied Economics, vol. 20, pp. 1603-1611.
Hastie, T.J. and R.J. Tibshirani (1990), Generalized Additive Models, Chapman and Hall, London.
Haugh, L.D. (1976), 'Checking the Independence of Two Covariance Stationary Time Series: A Univariate
Residual Cross Correlation Approach', Journal of the American Statistical Association, vol. 71, pp.
378-385.
Hauser, J.R. and D. Clausing (1988), 'The House of Quality', Harvard Business Review, vol. 66, May/June, pp.
63-73.
Hauser, J.R., D.I. Simester and B. Wernerfelt (1996), 'Internal Customers and Internal Suppliers', Journal of
Marketing Research, vol. 33, pp. 268-280.
Hauser, J.R. and B. Wernerfelt (1990), 'An Evaluation Cost Model of Consideration Sets', Journal of
Consumer Research, vol. 17, pp. 393-408.
Hausman, W.H. and D.B. Montgomery (1997), 'Market Driven Manufacturing', Journal of Market Focused
Management, vol. 2, pp. 27-47.
Haynes, B. and J.T. Rothe (1974), 'Competitive Bidding for Marketing Research Services: Fact or Fiction',
Journal of Marketing, vol. 38, July, pp. 69-71.
Heeler, R.M. and M.L. Ray (1972), 'Measure Validation in Marketing', Journal of Marketing Research, vol. 9,
pp. 361-370.
Heerde, H.J. van (1999), Models for Sales Promotion Effects Based on Store-level Scanner Data, Unpublished
Ph.D. thesis, University of Groningen, the Netherlands.
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1997), 'Semiparametric Analysis of the Deal Effect
Curve', Working paper, first version, Faculty of Economics, University of Groningen, the Netherlands.
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999a), 'Semiparametric Analysis of the Deal Effect
Curve', Working paper, Faculty of Economics, University of Groningen, the Netherlands.
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999b), 'Decomposing the Sales Effect of Promotions
with Store-Level Scanner Data', Working paper, Faculty of Economics, University of Groningen, the
Netherlands.
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999c), 'The Estimation of Pre- and Postpromotion Dips
with Store-Level Scanner Data', Journal of Marketing Research, forthcoming.
Helmer, 0. (1966), Social Technology, Basic Books, New York.
Helsen, K. and D.C. Schmittlein (1993), 'Analysing Duration Times in Marketing: Evidence for the
Effectiveness of Hazard Models', Marketing Science, vol. 11, pp. 395-414.
Hendry, D.F. (1989), PC-GIVE: An Interactive Econometric Modeling System, University of Oxford, Oxford.
Herniter, J.D. (1971), 'A Probabilistic Market Model of Purchasing Timing and Brand Selection', Management
Science, vol. 18, pp. 102-113.
Herniter, J.D. and R.A. Howard (1964), 'Stochastic Marketing Models' in Hertz, D.B. and R.T. Eddison (eds.),
Progress in Operations Research, John Wiley & Sons, New York, vol. 2, pp. 33-96.
Hess, J.E. (1968), 'Transfer Pricing in a Decentralized Firm', Management Science, pp. B310-B331.
Hildreth, C. and J.Y. Lu (1960), 'Demand Relations with Autocorrelated Disturbances', Michigan State
University, Agricultural Experiment Station, Technical Bulletin 276, East Lansing, Mich.
Hinich, M.J. and P.P. Talwar (1975), 'A Simple Method for Robust Regression', Journal of the American
Statistical Association, vol. 70, pp. 113-119.
Hoch, S.J., B. Kim, A.L. Montgomery and P.E. Rossi (1995), 'Determinants of Store-Level Price Elasticity',
Journal of Marketing Research, vol. 32, pp. 17-29.
Hocking, R.R. (1976), 'The Analysis and Selection of Variables in Linear Regression', Biometrics, vol. 32, pp.
1-49.
Hoekstra, J.C. (1987), Handelen van Heroïnegebruikers: Effecten van Beleidsmaatregelen, Unpublished Ph.D.
thesis, University of Groningen, the Netherlands.
Hoekstra, J.C., P.S.H. Leeflang and D.R. Wittink (1999), 'The Customer Concept', Journal of Market Focused
Management, vol. 4, pp. 43-75.
Hoerl, A.E. and R.W. Kennard (1970), 'Ridge Regression: Applications to Nonorthogonal Problems',
Technometrics, vol. 12, pp. 69-82.
Hofstede, F. ter, J.B.E.M. Steenkamp and M. Wedel (1999), 'International Market Segmentation Based on
Consumer-Product Relations', Journal of Marketing Research, vol. 36, pp. 1-17.
Hogarth, R. (1987), Judgment and Choice, 2nd ed., John Wiley & Sons, New York.
Hooley, G.J. and J. Saunders (1993), Competitive Positioning, Prentice-Hall, New York.
Horowitz, I. (1970), Decision Making and the Theory of the Firm, Holt, Rinehart and Winston, New York.
Horsky, D. (1977a), 'Market Share Response to Advertising: An Example of Theory Testing', Journal of
Marketing Research, vol. 14, pp. 10-21.
Horsky, D. (1977b), 'An Empirical Analysis of the Optimal Advertising Policy', Management Science, vol. 23,
pp. 1037-1049.
Horsky, D. and L.S. Simon (1983), 'Advertising and the Diffusion of New Products', Marketing Science, vol.
2, pp. 1-10.
Houston, F.S. (1977), 'An Econometric Analysis of Positioning', Journal of Business Administration, vol. 9,
pp. 1-12.
Houston, F.S. and D.L. Weiss (1974), 'An Analysis of Competitive Market Behavior', Journal of Marketing
Research, vol. 11, pp. 151-155.
Howard, J.A. and W.M. Morgenroth (1968), 'Information Processing Model of Executive Decisions',
Management Science, vol. 14, pp. 416-428.
Howard, J.A. and J.N. Sheth (1969), The Theory of Buyer Behavior, John Wiley & Sons, New York.
Howard, R.A. (1963), 'Stochastic Process Models of Consumer Behavior', Journal ofAdvertising Research,
vol. 3, pp. 35-42.
Huber, P.J. (1973), 'Robust Regression: Asymptotics, Conjectures and Monte Carlo', The Annals of Statistics,
vol. 1, pp. 799-821.
Hughes, G.D. (1973), Demand Analysis for Marketing Decisions, Richard D. Irwin, Homewood, Ill.
Hulbert, J.M. and M.E. Toy (1977), 'A Strategic Framework for Marketing Control', Journal of Marketing, vol.
41, April, pp. 12-20.
Huse, E. (1980), Organizational Development and Change, West Publishing, New York.
Huysmans, J.H. (1970a), 'The Effectiveness of the Cognitive Style Constraint in Implementing Operations
Research Proposals', Management Science, vol. 17, pp. 92-104.
Huysmans, J.H. (1970b), The Implementation of Operations Research, Wiley-Interscience, New York.
Intriligator, M.D., R.G. Bodkin and C. Hsiao (1996), Econometric Models, Techniques and Applications,
Prentice Hall, Upper Saddle River, NJ.
Iversen, G.R. (1984), Bayesian Statistical Inference, Sage Publications, London.
Iyer, G. (1998), 'Coordinating Channels Under Price and Nonprice Competition', Marketing Science, vol. 17,
pp. 338-355.
Jain, D.C. and N.J. Vilcassim (1991), 'Investigating Household Purchase Timing Decisions: A Conditional
Hazard Function Approach', Marketing Science, vol. 10, pp. 1-23.
Jain, D.C., N.J. Vilcassim and P.K. Chintagunta (1994), 'A Random-Coefficients Logit Brand-Choice Model
Applied to Panel Data', Journal of Business and Economic Statistics, vol. 12, pp. 317-328.
Jain, S.C. (1993), Marketing Planning & Strategy, 4th ed., South-Western Publishing Co., Cincinnati, Ohio.
Jamieson, L. and F.M. Bass (1989), 'Adjusted Stated Intention Measures to Predict Trial Purchase of New
Products: A Comparison of Models and Methods', Journal of Marketing Research, vol. 26, pp. 336-345.
Jedidi, K., C. F. Mela and S. Gupta (1999), 'Managing Advertising and Promotions for Long-Run Profitability',
Marketing Science, vol. 18, pp. 1-22.
Jeuland, A.P., F.M. Bass and G.P. Wright (1980), 'A Multibrand Stochastic Model Compounding
Heterogeneous Erlang Timing and Multinomial Choice Process', Operations Research, vol. 28, pp.
255-277.
Johansson, J.K. (1973), 'A Generalized Logistic Function with an Application to the Effect of Advertising',
Journal of the American Statistical Association, vol. 68, pp. 824-827.
Johansson, J.K. (1979), 'Advertising and the S-curve: A New Approach', Journal of Marketing Research, vol.
16, pp. 346-354.
Johnson, E.J. and J. Payne (1985), 'Effort and Accuracy in Choice', Management Science, vol. 31, April, pp.
395-415.
Johnson, E.J. and J.E. Russo (1994), 'Competitive Decision Making: Two and a Half Frames', Marketing
Letters, vol. 4, pp. 289-302.
Johnston, J. (1984), Econometric Methods, McGraw-Hill, New York.
Jones, J. (1986), What's in a Name?, Gower, Aldershot.
Jones, J.M. (1973), 'A Composite Heterogeneous Model for Brand Choice Behavior', Management Science,
vol. 19, pp. 499-509.
Jones, J.M. and J.T. Landwehr (1988), 'Removing Heterogeneity Bias from Logit Model Estimation',
Marketing Science, vol. 7, pp. 41-59.
Jöreskog, K.G. (1973), 'A General Method for Estimating a Linear Structural Equation System' in Goldberger,
A.S. and O.D. Duncan (eds.), Structural Equation Models in Mathematical Psychology, vol. 2, Seminar
Press, New York, pp. 85-112.
Jöreskog, K.G. (1978), 'Statistical Analysis of Covariance and Correlation Matrices', Psychometrika, vol. 43,
pp. 443-477.
Jöreskog, K.G. and D. Sörbom (1989), LISREL 7: A Guide to the Program and Applications, 2nd ed., SPSS,
Chicago.
Judge, G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.C. Lee (1985), The Theory and Practice of
Econometrics, 2nd ed., John Wiley & Sons, New York.
Juhl, H.J. and K. Kristensen (1989), 'Multiproduct Pricing: A Microeconomic Simplification', International
Journal ofResearch in Marketing, vol. 6, pp. 175-182.
Kadiyali, V. (1996), 'Entry, its Deterrence, and its Accommodation: A Study of the U.S. Photographic Film
Industry', Rand Journal of Economics, vol. 27, pp. 452-478.
Kadiyali, V., N.J. Vilcassim and P.K. Chintagunta (1996), 'Empirical Analysis of Competitive Product Line
Pricing Decisions: Lead, Follow or Move Together?', Journal of Business, vol. 69, pp. 459-488.
Kadiyali, V., N.J. Vilcassim and P.K. Chintagunta (1999), 'Product Line Extensions and Competitive Market
Interactions: An Empirical Analysis', Journal of Econometrics, vol. 89, pp. 339-363.
Kahn, B.E. (1998), 'Dynamic Relationships With Customers: High-Variety Strategies', Journal of the
Academy of Marketing Science, vol. 26, pp. 45-53.
Kaicker, A. and W.O. Bearden (1995), 'Component versus Bundle Pricing', Journal of Business Research, vol.
33, pp. 231-240.
Kalra, A., S. Rajiv and K. Srinivasan (1998), 'Response to Competitive Entry: A Rationale for Delayed
Defensive Reaction', Marketing Science, vol. 17, pp. 380-405.
Kalwani, M.U., R.J. Meyer and D.G. Morrison (1994), 'Benchmarks for Discrete Choice Models', Journal of
Marketing Research, vol. 31, pp. 65-75.
Kalwani, M.U. and D.G. Morrison (1977), 'A Parsimonious Description of the Hendry System', Management
Science, vol. 23, pp. 467-477.
Kalwani, M.U. and A.J. Silk (1982), 'On the Reliability and Predictive Validity of Purchase Intention Measures',
Marketing Science, vol. 1, pp. 243-286.
Kalwani, M.U. and C.K. Yim (1992), 'Consumer Price and Promotion Expectations: An Experimental Study',
Journal of Marketing Research, vol. 29, pp. 90-100.
596 BIBLIOGRAPHY
Kalyanam, K. and T.S. Shively (1998), 'Estimating Irregular Pricing Effects: A Stochastic Spline Regression
Approach', Journal of Marketing Research, vol. 35, pp. 16-29.
Kamakura, W.A. and S.K. Balasubramanian (1988), 'Long-term View of the Diffusion of Durables',
International Journal of Research in Marketing, vol. 5, pp. 1-13.
Kamakura, W.A., B. Kim and J. Lee (1996), 'Modelling Preference and Structural Heterogeneity in Consumer
Choice', Marketing Science, vol. 15, pp. 152-172.
Kamakura, W.A. and G.J. Russell (1989), 'A Probabilistic Choice Model for Market Segmentation and
Elasticity Structure', Journal of Marketing Research, vol. 26, pp. 379-390.
Kamakura, W.A. and G.J. Russell (1993), 'Measuring Brand Value with Scanner Data', International Journal
of Research in Marketing, vol. 10, pp. 9-22.
Kamakura, W.A. and R.K. Srivastava (1984), 'Predicting Choice Shares Under Conditions of Brand
Interdependence', Journal of Marketing Research, vol. 21, pp. 420-434.
Kamakura, W.A. and R.K. Srivastava (1986), 'An Ideal Point Probabilistic Choice Model for Heterogeneous
Preferences', Marketing Science, vol. 5, pp. 199-218.
Kamakura, W.A. and M. Wedel (1997), 'Statistical Data Fusion for Cross-Tabulation', Journal of Marketing
Research, vol. 34, pp. 485-498.
Kamakura, W.A., M. Wedel and J. Agrawal (1994), 'Concomitant Variable Latent Class Models for Conjoint
Analysis', International Journal of Research in Marketing, vol. 11, pp. 451-464.
Kanetkar, V., C.B. Weinberg and D.L. Weiss (1986), 'Estimating Parameters of the Autocorrelated Current
Effects Model from Temporally Aggregated Data', Journal of Marketing Research, vol. 23, pp. 379-386.
Kanetkar, V., C.B. Weinberg and D.L. Weiss (1992), 'Price Sensitivity and Television Advertising Exposures:
Some Empirical Findings', Marketing Science, vol. 11, pp. 359-371.
Kannan, P.K. and G.P. Wright (1991), 'Modelling and Testing Structured Markets: A Nested Logit Approach',
Marketing Science, vol. 10, pp. 58-82.
Kapteyn, A., S. van de Geer, H. van de Stadt and T.J. Wansbeek (1997), 'Interdependent Preferences: An
Econometric Analysis', Journal of Applied Econometrics, vol. 12, pp. 665-686.
Kapteyn, A., T.J. Wansbeek and J. Buyze (1980), 'The Dynamics of Preference Formation', Journal of
Economic Behavior and Organization, vol. 1, pp. 123-157.
Karmarkar, U.S. (1996), 'Integrative Research in Marketing and Operations Management', Journal of
Marketing Research, vol. 33, pp. 125-133.
Kass, G.V. (1976), Significance Testing in, and Some Extensions of, Automatic Interaction Detection, Doctoral
Dissertation, University of Witwatersrand, Johannesburg, South Africa.
Kaul, A. and D.R. Wittink (1995), 'Empirical Generalizations about the Impact of Advertising on Price
Sensitivity and Price', Marketing Science, vol. 14, pp. G151-G160.
Kealy, M.J., J.F. Dovidio and M.L. Rockel (1988), 'Accuracy in Valuation is a Matter of Degree', Land
Economics, vol. 64, pp. 158-171.
Kealy, M.J., M. Montgomery and J.F. Dovidio (1990), 'Reliability and Predictive Validity of Contingent
Values: Does the Nature of the Good Matter?', Journal of Environmental Economics and Management,
vol. 19, pp. 244-263.
Keller, K.L. (1993), 'Conceptualizing, Measuring, and Managing Customer-Based Brand Equity', Journal of
Marketing, vol. 57, January, pp. 1-22.
Kenkel, J.L. (1974), 'Some Small Sample Properties of Durbin's Test for Serial Correlation in Regression
Models Containing Lagged Dependent Variables', Econometrica, vol. 42, pp. 763-769.
Ketellapper, R.H. (1981), 'On Estimating a Consumption Function when Incomes are Subject to Measurement
Errors', Economics Letters, vol. 7, pp. 343-348.
Ketellapper, R.H. (1982), The Impact of Observational Errors on Parameter Estimation in Econometrics,
Unpublished Ph.D. thesis, University of Groningen, the Netherlands.
Kim, B. (1995), 'Incorporating Heterogeneity with Store-Level Aggregate Data', Marketing Letters, vol. 6, pp.
159-169.
Kim, N., E. Bridges and R.K. Srivastava (1999), 'A Simultaneous Model for Innovative Product Category Sales
Diffusion and Competitive Dynamics', International Journal of Research in Marketing, vol. 16, pp.
95-111.
Kim, S.Y. and R. Staelin (1999), 'Manufacturer Allowances and Retailer Pass-Through Rates in a Competitive
Environment', Marketing Science, vol. 18, pp. 59-76.
Kimball, G.E. (1957), 'Some Industrial Applications of Military Operations Research Methods', Operations
Research, vol. 5, pp. 201-204.
King, W.R. (1967), Quantitative Analysis for Marketing Management, McGraw-Hill Book Company, New
York.
Klein, L.R. (1962), An Introduction to Econometrics, Prentice-Hall, Englewood Cliffs, NJ.
Klein, L.R. and J.B. Lansing (1955), 'Decisions to Purchase Consumer Durable Goods', Journal of Marketing,
vol. 20, pp. 109-132.
Kmenta, J. (1971), Elements of Econometrics, Macmillan Publishing Co., New York.
Koerts, J. and A.P.J. Abrahamse (1969), On the Theory and Application of the General Linear Model,
Rotterdam University Press, Rotterdam.
Körösi, G., L. Mátyás and I.P. Székely (1992), Practical Econometrics, Avebury, Aldershot, England.
Kotler, Ph. (1971), Marketing Decision Making: A Model Building Approach, Holt, Rinehart and Winston,
New York.
Kotler, Ph. (1997), Marketing Management: Analysis, Planning, Implementation, and Control, 9th edition,
Prentice Hall, Upper Saddle River, NJ.
Koyck, L.M. (1954), Distributed Lags and Investment Analysis, North-Holland Publishing Company,
Amsterdam.
Kreps, D.M. and R. Wilson (1982), 'Reputation and Imperfect Information', Journal of Economic Theory, vol.
27, pp. 253-279.
Krishna, A. (1991), 'Effect of Dealing Patterns on Consumer Perceptions of Deal Frequency and Willingness
to Pay', Journal of Marketing Research, vol. 28, pp. 441-451.
Krishna, A. (1992), 'The Normative Impact of Consumer Price Expectations for Multiple Brands on Consumer
Purchase Behavior', Marketing Science, vol. 11, pp. 266-286.
Krishna, A. (1994), 'The Impact of Dealing Patterns on Purchase Behavior', Marketing Science, vol. 13, pp.
351-373.
Krishnamurthi, L. and S.P. Raj (1985), 'The Effect of Advertising on Consumer Price Sensitivity', Journal of
Marketing Research, vol. 22, pp. 119-129.
Krishnamurthi, L. and S.P. Raj (1988), 'A Model of Brand Choice and Purchase Quantity Price Sensitivities',
Marketing Science, vol. 7, pp. 1-20.
Krishnamurthi, L. and S.P. Raj (1991), 'An Empirical Analysis of the Relationship Between Brand Loyalty and
Consumer Price Elasticity', Marketing Science, vol. 10, pp. 172-183.
Krishnamurthi, L., S.P. Raj and R. Selvam (1990), 'Statistical and Managerial Issues in Cross-Sectional
Aggregation', Working paper, Northwestern University.
Krishnamurthi, L. and A. Rangaswamy (1987), 'The Equity Estimator for Marketing Research', Marketing
Science, vol. 6, pp. 336-357.
Krishnan, K.S. and S.K. Gupta (1967), 'Mathematical Models for a Duopolistic Market', Management
Science, vol. 13, pp. 568-583.
Krishnan, T.V. and H. Soni (1997), 'Guaranteed Profit Margins: A Demonstration of Retailer Power',
International Journal of Research in Marketing, vol. 14, pp. 35-56.
Kristensen, K. (1984), 'Hedonic Theory, Marketing Research, and the Analysis of Complex Goods',
International Journal of Research in Marketing, vol. 1, pp. 17-36.
Kuehn, A.A. (1961), 'A Model for Budgeting Advertising' in Bass, F.M. and R.D. Buzzell (eds.),
Mathematical Models and Methods in Marketing, Homewood, Ill., Richard D. Irwin, Inc., pp. 315-348.
Kuehn, A.A. (1962), 'Consumer Brand Choice as a Learning Process', Journal of Advertising Research, vol. 2,
pp. 10-17.
Kuehn, A.A. and M.J. Hamburger (1963), 'A Heuristic Program for Locating Warehouses', Management
Science, vol. 9, pp. 643-666.
Kumar, N., L.K. Scheer and J.B.E.M. Steenkamp (1995a), 'The Effects of Supplier Fairness on Vulnerable
Resellers', Journal of Marketing Research, vol. 32, pp. 54-65.
Kumar, N., L.K. Scheer and J.B.E.M. Steenkamp (1995b), 'The Effects of Perceived Interdependence on
Dealer Attitudes', Journal of Marketing Research, vol. 32, pp. 348-356.
Kumar, T.K. (1975), 'Multicollinearity in Regression Analysis', The Review of Economics and Statistics, vol.
57, pp. 365-366.
Kumar, V. (1994), 'Forecasting Performance of Market Share Models: An Assessment, Additional Insights,
and Guidelines', International Journal of Forecasting, vol. 10, pp. 295-312.
LaForge, R.W. and D.W. Cravens (1985), 'Empirical and Judgement-based Salesforce Decision Models: A
Comparative Analysis', Decision Sciences, vol. 16, pp. 177-195.
LaForge, R.W., C.W. Lamb jr., D.W. Cravens and W.C. Moncrief III (1989), 'Improving Judgement-Based
Salesforce Decision Model Applications', Journal of the Academy of Marketing Science, vol. 17, pp.
167-177.
Lal, R. and C. Narasimhan (1996), 'The Inverse Relationship Between Manufacturer and Retailer Margins: A
Theory', Marketing Science, vol. 15, pp. 113-131.
Lal, R. and R. Staelin (1986), 'Salesforce Compensation Plans in Environments with Asymmetric
Information', Marketing Science, vol. 5, pp. 179-198.
Lal, R. and J.M. Villas-Boas (1998), 'Price Promotions and Trade Deals with Multiproduct Retailers',
Management Science, vol. 44, pp. 935-949.
Lambin, J.J. (1969), 'Measuring the Profitability of Advertising: An Empirical Study', Journal of Industrial
Economics, vol. 17, pp. 86-103.
Lambin, J.J. (1970), Modèles et Programmes de Marketing, Presses Universitaires de France, Paris.
Lambin, J.J. (1972a), 'Is Gasoline Advertising Justified?', Journal of Business, vol. 45, pp. 585-619.
Lambin, J.J. (1972b), 'A Computer On-Line Marketing Mix Model', Journal of Marketing Research, vol. 9,
pp. 119-126.
Lambin, J.J. (1976), Advertising, Competition and Market Conduct in Oligopoly over Time, North-Holland
Publishing Company, Amsterdam.
Lambin, J.J., P.A. Naert and A.V. Bultez (1975), 'Optimal Marketing Behavior in Oligopoly', European
Economic Review, vol. 6, pp. 105-128.
Lancaster, K.M. (1984), 'Brand Advertising Competition and Industry Demand', Journal of Advertising, vol.
13 (4), pp. 19-24.
Larreche, J.C. (1974), Marketing Managers and Models: A Search for a Better Match, Unpublished Ph.D.
thesis, Stanford University.
Larreche, J.C. (1975), 'Marketing Managers and Models: A Search for a Better Match', Research Paper Series,
No. 157, INSEAD, Fontainebleau.
Larreche, J.C. and R. Moinpour (1983), 'Managerial Judgement in Marketing: The Concept of Expertise',
Journal of Marketing Research, vol. 20, pp. 110-121.
Larreche, J.C. and D.B. Montgomery (1977), 'A Framework for the Comparison of Marketing Models: A
Delphi Study', Journal of Marketing Research, vol. 14, pp. 487-498.
Larreche, J.C. and V. Srinivasan (1981), 'STRATPORT: A Decision Support System for Strategic Planning',
Journal of Marketing, vol. 45, no. 4, pp. 39-52.
Larreche, J.C. and V. Srinivasan (1982), 'STRATPORT: A Model for the Evaluation and Formulation of
Business Portfolio Strategies', Management Science, vol. 28, pp. 979-1001.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor (1986), 'The Accuracy of Combining Judgmental and
Statistical Forecasts', Management Science, vol. 32, pp. 1521-1532.
Lawrence, R.J. (1975), 'Consumer Brand Choice - A Random Walk?', Journal of Marketing Research, vol. 12,
pp. 314-324.
Leamer, E.E. (1978), Specification Searches: Ad Hoc Inference with Nonexperimental Data, John Wiley &
Sons, New York.
Lee, A.M. (1962), 'Decision Rules for Media Scheduling: Static Campaigns', Operational Research Quarterly,
vol. 13, pp. 229-241.
Lee, A.M. (1963), 'Decision Rules for Media Scheduling: Dynamic Campaigns', Operational Research
Quarterly, vol. 14, pp. 113-122.
Lee, A.M. and A.J. Burkart (1960), 'Some Optimization Problems in Advertising Media', Operational
Research Quarterly, vol. 11, pp. 113-122.
Lee, M. (1996), Methods of Moments and Semiparametric Econometrics for Limited Dependent Variable
Models, Springer-Verlag, New York.
Lee, P.M. (1997), Bayesian Statistics, John Wiley & Sons, New York.
Lee, T.C., G.G. Judge and A. Zellner (1970), Estimating the Parameters of the Markov Probability Model from
Aggregate Time Series Data, North-Holland Publishing Company, Amsterdam.
Leeflang, P.S.H. (1974), Mathematical Models in Marketing, a Survey, the Stage of Development, Some
Extensions and Applications, H.E. Stenfert Kroese, Leiden.
Leeflang, P.S.H. (1975), 'The Allocation of Shelf Space over Article Groups: A Portfolio Problem',
Proceedings ESOMAR-seminar of Product Range Policy in Retailing and Co-operation with
Manufacturers, Breukelen, The Netherlands, pp. 37-73.
Leeflang, P.S.H. (1976), 'Marktonderzoek en Marketingmodellen', Marktonderzoek en Consumentengedrag,
Jaarboek Nederlandse Vereniging van Marktonderzoekers, vol. 2, pp. 217-252.
Leeflang, P.S.H. (1977a), 'Organising Market Data for Decision Making through the Development of
Leeflang, P.S.H. and J.C. Reuyl (1995), 'Effects of Tobacco Advertising on Tobacco Consumption',
International Business Review, vol. 4, pp. 39-54.
Leeflang, P.S.H. and M. Wedel (1993), 'Information Based Decision Making in Pricing', Proceedings
ESOMAR/EMAC/AFM Symposium on Information Based Decision Making in Marketing, Paris, 17th-19th
November 1993.
Leeflang, P.S.H. and D.R. Wittink (1992), 'Diagnosing Competitive Reactions Using (Aggregated) Scanner
Data', International Journal of Research in Marketing, vol. 9, pp. 39-57.
Leeflang, P.S.H. and D.R. Wittink (1994), 'Diagnosing Competition: Developments and Findings' in Laurent,
G., G.L. Lilien and B. Pras (eds.), Research Traditions in Marketing, Kluwer Academic Publishers,
Boston, pp. 133-156.
Leeflang, P.S.H. and D.R. Wittink (1996), 'Competitive Reaction Versus Consumer Response: Do Managers
Overreact?', International Journal of Research in Marketing, vol. 13, pp. 103-119.
Leeflang, P.S.H. and D.R. Wittink (2000), 'Explaining Competitive Reaction Effects', Research Report, SOM,
Research Institute Systems, Organizations and Management, University of Groningen, the Netherlands,
forthcoming.
Lehmann, E.L. (1983), Theory of Point Estimation, John Wiley & Sons, New York.
Leigh, T.W. and A.J. Rethans (1984), 'A Script-theoretic Analysis of Industrial Purchasing Behavior', Journal
of Marketing, vol. 48, Fall, pp. 22-32.
Leone, R.P. (1983), 'Modeling Sales-Advertising Relationships: An Integrated Time-Series-Econometric
Approach', Journal of Marketing Research, vol. 20, pp. 291-295.
Leone, R.P. (1995), 'Generalizing What is Known About Temporal Aggregation and Advertising Carryover',
Marketing Science, vol. 14, pp. G141-G150.
Leone, R.P. and R.L. Schultz (1980), 'A Study of Marketing Generalizations', Journal of Marketing, vol. 44,
January, pp. 10-18.
Lilien, G.L. (1974a), 'A Modified Linear Learning Model of Buyer Behavior', Management Science, vol. 20,
pp. 1027-1036.
Lilien, G.L. (1974b), 'Application of a Modified Linear Learning Model of Buyer Behavior', Journal of
Marketing Research, vol. 11, pp. 279-285.
Lilien, G.L. (1975), 'Model Relativism: A Situational Approach to Model Building', Interfaces, vol. 5, pp.
11-18.
Lilien, G.L. (1979), 'Advisor 2: Modelling the Marketing Mix Decision for Industrial Products', Management
Science, vol. 25, pp. 191-204.
Lilien, G.L. (1994), 'Marketing Models: Past, Present and Future' in Laurent, G., G.L. Lilien and B. Pras
(eds.), Research Traditions in Marketing, Kluwer Academic Publishers, Boston, pp. 1-20.
Lilien, G.L. and Ph. Kotler (1983), Marketing Decision Making: A Model-Building Approach, Harper and
Row, London.
Lilien, G.L., Ph. Kotler and K.S. Moorthy (1992), Marketing Models, Prentice-Hall, Englewood Cliffs, NJ.
Lilien, G.L. and A. Rangaswamy (1998), Marketing Engineering, Addison-Wesley, Reading, Mass.
Lindsey, J.K. (1996), Parametric Statistical Inference, Clarendon Press, Oxford.
Little, J.D.C. (1966), 'A Model of Adaptive Control of Promotional Spending', Operations Research, vol. 14,
pp. 1075-1097.
Little, J.D.C. (1970), 'Models and Managers: The Concept of a Decision Calculus', Management Science, vol.
16, pp. B466-B485.
Little, J.D.C. (1975a), 'BRANDAID: A Marketing-Mix Model, Part 1: Structure', Operations Research, vol.
23, pp. 628-655.
Little, J.D.C. (1975b), 'BRANDAID: A Marketing-Mix Model, Part 2: Implementation, Calibration, and Case
Study', Operations Research, vol. 23, pp. 656-673.
Little, J.D.C. (1979), 'Aggregate Advertising Models: The State of the Art', Operations Research, vol. 27, pp.
629-667.
Little, J.D.C. (1998), 'Integrated Measures of Sales, Merchandising and Distribution', International Journal
of Research in Marketing, vol. 15, pp. 475-485.
Little, J.D.C. and L.M. Lodish (1969), 'A Media Planning Calculus', Operations Research, vol. 17, pp. 1-35.
Little, J.D.C. and L.M. Lodish (1981), 'Commentary on Judgment Based Marketing Decision Models',
Journal of Marketing, vol. 45, Fall, pp. 24-29.
Little, J.D.C., L.M. Lodish, J.R. Hauser and G.L. Urban (1994), 'Commentary' (on Hermann Simon's
'Marketing Science's Pilgrimage to the Ivory Tower') in Laurent, G., G.L. Lilien and B. Pras, Research
Traditions in Marketing, Kluwer Academic Publishers, Boston, pp. 44-51.
Lock, A. (1987), 'Integrating Group Judgments in Subjective Forecasts' in Wright, G. and P. Ayton (eds.),
Judgmental Forecasting, John Wiley & Sons, Great Britain, pp. 109-127.
Lodish, L.M. (1971), 'CALLPLAN: An Interactive Salesman's Call Planning System', Management Science,
vol. 18, pp. B25-B40.
Lodish, L.M. (1981), 'Experience with Decision Calculus Models and Decision Support Systems' in Schultz,
R.L. and A.A. Zoltners (eds.), Marketing Decision Models, North-Holland, New York, pp. 165-182.
Lodish, L.M. (1982), 'A Marketing Decision Support System for Retailers', Marketing Science, vol. 1, pp.
31-56.
Lodish, L.M., M.M. Abraham, S. Kalmenson, J. Livelsberger, B. Lubetkin, B. Richardson and M.E. Stevens
(1995a), 'How T.V. Advertising Works: A Meta Analysis of 389 Real World Split Cable T.V. Advertising
Experiments', Journal ofMarketing Research, vol. 32, pp. 125-139.
Lodish, L.M., M.M. Abraham, J. Livelsberger, B. Lubetkin, B. Richardson and M.E. Stevens (1995b), 'A
Summary of Fifty-five In-market Experimental Estimates of the Long-Term Effect of T.V. Advertising',
Marketing Science, vol. 14, pp. G133-G140.
Lodish, L.M., E. Curtis, M. Ness and M.K. Simpson (1988), 'Sales Force Sizing and Deployment Using a
Decision Calculus Model at Syntex Laboratories', Interfaces, vol. 18, pp. 5-20.
Lodish, L.M., D.B. Montgomery and F.E. Webster (1968), 'A Dynamic Sales Call Policy Model', Working
Paper 329-68, Sloan School of Management, M.I.T., Cambridge.
Logman, M. (1995), Intrafirm Marketing-Mix Relationships: Analysis of their Sources and Modeling
Implications, Unpublished Ph.D. thesis, University of Antwerp (UFSIA), Belgium.
Long, S. (1983), Covariance Structure Models, An Introduction to LISREL, Sage University Press, London.
Louviere, J.J. and D.A. Hensher (1983), 'Forecasting Consumer Demand for a Unique Cultural Event: An
Approach Based on an Integration of Probabilistic Discrete Choice Models and Experimental Design
Data', Journal of Consumer Research, vol. 10, pp. 348-361.
Louviere, J.J. and G. Woodworth (1983), 'Design and Analysis of Simulated Consumer Choice or Allocation
Experiments: An Approach Based on Aggregate Data', Journal of Marketing Research, vol. 20, pp.
350-367.
Lucas, H.C., M.J. Ginzberg and R.L. Schultz (1990), Information Systems Implementation: Testing a
Structural Model, Ablex, Norwood, NJ.
Lucas, R.E. (1976), 'Econometric Policy Evaluation: A Critique' in Brunner, K. and A.H. Metzler (eds.), The
Phillips Curve and Labor Markets, North-Holland, Amsterdam.
Luce, R.D. (1959), Individual Choice Behavior: A Theoretical Analysis, John Wiley & Sons, New York.
Luce, R.D. and H. Raiffa (1957), Games and Decisions, John Wiley & Sons, New York.
Luik, J.C. and M.J. Waterson (eds.), (1996), Advertising & Markets, NTC Publications, Oxfordshire, U.K.
Lütkepohl, H. (1989), 'Testing for Causation Between two Variables in Higher Dimensional VAR Models',
Working paper, University of Kiel, Germany.
MacLachlan, D.L. (1972), 'A Model of Intermediate Market Response', Journal of Marketing Research, vol. 9,
pp. 378-384.
Maddala, G.S. (1971), 'The Use of Variance Components Models in Pooling Cross Section and Time Series
Data', Econometrica, vol. 39, pp. 341-358.
Maffei, R.B. (1960), 'Brand Preferences and Simple Markov Processes', Operations Research, vol. 8, pp.
210-218.
Magat, W.A., J.M. McCann and R.C. Morey (1986), 'When Does Lag Structure Really Matter in Optimizing
Advertising Expenditures?', Management Science, vol. 32, pp. 182-193.
Magee, J.F. (1953), 'The Effect of Promotional Effort on Sales', Journal of the Operations Research Society of
America, vol. 1, pp. 64-74.
Magrath, A.J. (1988), Marketing Smarts, John Wiley & Sons, Chichester.
Mahajan, V., S.I. Bretschneider and J.W. Bradford (1980), 'Feedback Approaches to Modeling Structural
Shifts in Market Response', Journal of Marketing, vol. 44, Winter, pp. 71-80.
Mahajan, V., P.E. Green and S.M. Goldberg (1982), 'A Conjoint Model for Measuring Self- and Cross-Price
Demand Relationships', Journal of Marketing Research, vol. 19, pp. 334-342.
Mahajan, V., A.K. Jain and M. Bergier (1977), 'Parameter Estimation in Marketing Models in the Presence of
Multicollinearity', Journal of Marketing Research, vol. 14, pp. 586-591.
Mahajan, V. and E. Muller (1986), 'Advertising Pulsing Policies for Generating Awareness for New Products',
Marketing Science, vol. 5, pp. 89-106.
Mahajan, V. and E. Muller (1998), 'When Is It Worthwhile Targeting the Majority Instead of the Innovators in
a New Product Launch?', Journal of Marketing Research, vol. 35, pp. 488-495.
Mahajan, V., E. Muller and F.M. Bass (1990), 'New Product Diffusion Models in Marketing: A Review and
Directions for Research', Journal of Marketing, vol. 54, January, pp. 1-26.
Mahajan, V., E. Muller and F.M. Bass (1993), 'New Product Diffusion Models' in Eliashberg, J. and G.L.
Lilien (eds.), Handbooks in Operations Research and Management Science, vol. 5, Marketing,
North-Holland, Amsterdam, pp. 349-408.
Mahajan, V., E. Muller and S. Sharma (1983), 'An Approach to Repeat-Purchase Diffusion Analysis' in
Murphy, P.E. et al. (eds.), Educator's Conference Proceedings, American Marketing Association,
Chicago.
Mahajan, V. and R.A. Peterson (1985), Models for Innovation Diffusion, Sage Publications, Beverly Hills.
Mahajan, V., S. Sharma and Y. Wind (1984), 'Parameter Estimation in Marketing Models in the Presence of
Influential Response Data: Robust Regression and Applications', Journal of Marketing Research, vol. 21,
pp. 268-277.
Mahajan, V. and Y. Wind (1986), Innovation Diffusion Models of New Product Acceptance, Ballinger
Publishing Company, Cambridge, MA.
Mahajan, V. and Y. Wind (1988), 'New Product Forecasting Models: Directions for Research and
Implementation', International Journal of Forecasting, vol. 4, pp. 341-358.
Mahajan, V., Y. Wind and J.W. Bradford (1982), 'Stochastic Dominance Rules for Product Portfolio Analysis'
in Zoltners, A.A. (ed.), Marketing Planning Models, North-Holland Publishing Company, Amsterdam,
pp. 161-183.
Mahmoud, E. (1987), 'The Evaluation of Forecasts' in Makridakis, S. and S.C. Wheelwright (eds.), The
Handbook of Forecasting, John Wiley & Sons, New York, pp. 504-522.
Maines, L. (1996), 'An Experimental Examination of Subjective Forecast Combination', International Journal
of Forecasting, vol. 12, pp. 223-233.
Makridakis, S. (1989), 'Why Combining Works?', International Journal of Forecasting, vol. 5, pp. 601-603.
Malhotra, N.K. (1984), 'The Use of Linear Logit Models in Marketing Research', Journal of Marketing
Research, vol. 21, pp. 20-31.
Malhotra, N.K. (1996), Marketing Research, 2nd ed., Prentice-Hall International Editions, Upper Saddle River,
NJ.
Mann, D.H. (1975), 'Optimal Theoretic Advertising Stock Models: A Generalization Incorporating the Effects
of Delayed Response from Promotion Expenditures', Management Science, vol. 21, pp. 823-832.
Markus, M.L. (1983), 'Power, Politics and MIS Implementation', Communications of the ACM, vol. 26, pp.
430-444.
Marquardt, D.W. (1970), 'Generalized Inverses, Ridge Regression, Biased Linear Estimation, and Nonlinear
Estimation', Technometrics, vol. 12, pp. 591-612.
Marquardt, D.W. and R.D. Snee (1975), 'Ridge Regression in Practice', The American Statistician, vol. 29, pp.
3-19.
Marsh, H.W., J.R. Balla and R. McDonald (1988), 'Goodness-of-Fit Indexes in Confirmatory Factor Analysis:
The Effects of Sample Size', Psychological Bulletin, vol. 103, pp. 391-410.
Marshall, K.T. and R.M. Oliver (1995), Decision Making and Forecasting, McGraw-Hill.
Mason, Ch.H. and W.D. Perreault jr. (1991), 'Collinearity, Power and Interpretation of Multiple Regression
Analysis', Journal of Marketing Research, vol. 28, pp. 268-280.
Massy, W.F. (1965), 'Principal Component Regression in Exploratory Statistical Research', Journal of the
American Statistical Association, vol. 60, pp. 234-256.
Massy, W.F. (1968), 'Stochastic Models for Monitoring New-Product Introductions' in Bass, F.M., C.W. King
and E.A. Pessemier (eds.), Applications of the Sciences in Marketing Management, John Wiley & Sons,
Inc., New York, pp. 85-111.
Massy, W.F. (1971), 'Statistical Analysis of Relations between Variables' in Aaker, D.A. (ed.), Multivariate
Analysis in Marketing: Theory and Application, Wadsworth Publishing Company, Belmont, Cal.
Massy, W.F., D.B. Montgomery and D.G. Morrison (1970), Stochastic Models of Buying Behavior, M.I.T.
Press, Cambridge, Mass.
McCann, J.M. and J.P. Gallagher (1990), Expert Systems for Scanner Data Environments, Kluwer Academic
Publishers, Boston.
McCann, J.M., W.G. Lahti and J. Hill (1991), 'The Brand Manager's Assistant: A Knowledge-based System
Approach to Brand Management', International Journal of Research in Marketing, vol. 8, pp. 51-73.
McConnell, D. (1968), 'Repeat-Purchase Estimation and the Linear Learning Model', Journal of Marketing
Research, vol. 5, pp. 304-306.
McCullagh, P. and J.A. Nelder (1989), Generalized Linear Models, Chapman and Hall, New York.
McFadden, D. (1974), 'Conditional Logit Analysis of Qualitative Choice Behavior' in Zarembka, P. (ed.),
Frontiers in Econometrics, Academic Press, New York.
McFadden, D. (1981), 'Econometric Models of Probabilistic Choice' in Manski, C.F. and D. McFadden (eds.),
Structural Analysis of Discrete Data with Econometric Applications, MIT Press, Cambridge, Mass.
McFadden, D. (1986), 'The Choice Theory Approach to Market Research', Marketing Science, vol. 5, pp.
275-297.
McFadden, D. (1989), 'A Method of Simulated Moments for the Estimation of Discrete Response Models
Without Numerical Integration', Econometrica, vol. 57, pp. 995-1026.
McFadden, D. and F. Reid (1975), 'Aggregate Travel Demand Forecasting from Disaggregated Behavioral
Models', Transportation Research Record, 534, pp. 24-37.
McGee, V.E. and W.T. Carleton (1970), 'Piecewise Regression', Journal of the American Statistical
Association, vol. 65, pp. 1109-1124.
McGuire, T.W., J.U. Farley, R.E. Lucas jr. and W.L. Ring (1968), 'Estimation and Inference for Linear Models
in which Subsets of the Dependent Variable are Constrained', Journal of the American Statistical
Association, vol. 63, pp. 1201-1213.
McGuire, T.W. and D.L. Weiss (1976), 'Logically Consistent Market Share Models II', Journal of Marketing
Research, vol. 13, pp. 296-302.
McGuire, W.J. (1969), 'The Nature of Attitudes and Attitude Change' in Lindzey, G. and E. Aronson (eds.),
The Handbook of Social Psychology, 2nd ed., Addison-Wesley Publishing Company, Reading, Mass., pp.
136-313.
McIntyre, S.H. (1982), 'An Experimental Study of the Impact of Judgment-Based Marketing Models',
Management Science, vol. 28, pp. 17-33.
McIntyre, S.H., D.D. Achabal and C.M. Miller (1993), 'Applying Case-Based Reasoning to Forecasting Retail
Sales', Journal of Retailing, vol. 69, pp. 372-398.
McIntyre, S.H. and I.S. Currim (1982), 'Evaluating Judgment-Based Marketing Models: Multiple Measures,
Comparisons and Findings' in Zoltners, A.A. (ed.), Marketing Planning Models, vol. 18, North-Holland,
New York, pp. 185-207.
McLachlan, G.J. and K.E. Basford (1988), Mixture Models: Inference and Applications to Clustering, Marcel
Dekker, New York.
Mela, C.F., S. Gupta and D.R. Lehmann (1997), 'The Long-Term Impact of Promotion and Advertising on
Consumer Brand Choice', Journal of Marketing Research, vol. 34, pp. 248-261.
Mela, C.F., S. Gupta and K. Jedidi (1998), 'Assessing Long-term Promotional Influences on Market Structure',
International Journal of Research in Marketing, vol. 15, pp. 89-107.
Mesak, H.I. (1992), 'An Aggregate Advertising Pulsing Model with Wearout Effects', Marketing Science, vol.
11, pp. 310-326.
Metwally, M.M. (1980), 'Sales Response to Advertising of Eight Australian Products', Journal of Advertising
Research, vol. 20, pp. 59-64.
Metwally, M.M. (1992), 'Escalation Tendencies of Advertising', Oxford Bulletin of Economics and Statistics,
vol. 40, pp. 153-163.
Mills, H.D. (1961), 'A Study of Promotional Competition' in Bass, F.M. and R.D. Buzzell (eds.),
Mathematical Models and Methods in Marketing, Richard D. Irwin, Homewood, Ill., pp. 271-288.
Mitchell, A.A., J.E. Russo and D.R. Wittink (1991), 'Issues in the Development and Use of Expert Systems for
Marketing Decisions', International Journal of Research in Marketing, vol. 8, pp. 41-50.
Mizon, G.E. (1995), 'A Simple Message for Autocorrelation Correctors: Don't', Journal of Econometrics, vol.
69, pp. 267-288.
Monroe, K.B. and A.J. Della Bitta (1978), 'Models for Pricing Decisions', Journal of Marketing Research, vol.
15, pp. 413-428.
Montgomery, A.L. (1997), 'Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data',
Marketing Science, vol. 16, pp. 315-337.
Montgomery, D.B. (1969), 'A Stochastic Response Model with Application to Brand Choice', Management
Science, vol. 15, pp. 323-337.
Montgomery, D.B. (1973), 'The Outlook for M.I.S.', Journal of Advertising Research, vol. 13, pp. 5-11.
Montgomery, D.B. and A.J. Silk (1972), 'Estimating Dynamic Effects of Market Communications
Expenditures', Management Science, vol. 18, pp. B485-B501.
Montgomery, D.B., A.J. Silk and C.E. Zaragoza (1971), 'A Multiple-Product Sales Force Allocation Model',
Management Science, vol. 18, Part II, pp. P3-P24.
Montgomery, D.B. and G.L. Urban (1969), Management Science in Marketing, Prentice-Hall, Englewood
Cliffs, NJ.
Montgomery, D.B. and G.L. Urban (1970), 'Marketing Decision Information Systems: An Emerging View',
Journal of Marketing Research, vol. 7, pp. 226-234.
Moore, H.L. (1914), Economic Cycles: Their Law and Cause, MacMillan, New York.
Moore, W.L. and D.R. Lehmann (1989), 'A Paired Comparison Nested Logit Model of Individual Preference
Structure', Journal of Marketing Research, vol. 30, pp. 171-182.
Moorthy, K.S. (1984), 'Market Segmentation, Self Selection and Product Line Design', Marketing Science,
vol. 3, pp. 288-307.
Moorthy, K.S. (1985), 'Using Game Theory to Model Competition', Journal of Marketing Research, vol. 22,
pp. 262-282.
Moorthy, K.S. (1988), 'Strategic Decentralization in Channels', Marketing Science, vol. 7, pp. 335-355.
Moorthy, K.S. (1993), 'Competitive Marketing Strategies: Game-Theoretic Models' in Eliashberg, J. and G.L.
Lilien (eds.), Handbooks in Operations Research and Management Science, vol. 5, Marketing,
North-Holland, Amsterdam, pp. 143-192.
Moriarty, M. (1975), 'Cross-Sectional, Time-Series Issues in the Analysis of Marketing Decision Variables',
Journal of Marketing Research, vol. 12, pp. 142-150.
Morikawa, T. (1989), Incorporating Stated Preference Data in Travel Demand Analysis, Ph.D. thesis,
Department of Civil Engineering, MIT.
Morrison, D.G. (1966), 'Testing Brand Switching Models', Journal q{Marketing Research, vol. 3, pp.
401-409.
Morrison, D.G. (1979), 'Purchase Intentions and Purchase Behavior', Journal of Marketing, vol. 43, pp. 65-74.
Morrison, D.G. and D.C. Schmittlein (1988), 'Generalizing the NBD Model for Customer Purchases: What
Are the Implications and is it Worth the Effort?', Journal of Business and Economic Statistics, vol. 6, pp.
145-159.
Morwitz, V.G. and D.C. Schmittlein (1992), 'Using Segmentation to Improve Sales Forecasts Based on
Purchase Intent: Which 'Intenders' Actually Buy?', Journal of Marketing Research, vol. 29, pp. 391-405.
Nadaraya, E.A. (1964), 'On Estimating Regression', Theory of Probability and Its Applications, vol. 9, pp. 141-142.
Naert, P.A. (1972), 'Observations on Applying Marginal Analysis in Marketing: Part I', Journal of Business
Administration, vol. 4, Winter, pp. 49-67.
Naert, P.A. (1973), 'Observations on Applying Marginal Analysis in Marketing: Part II', Journal of Business
Administration, vol. 4, Spring, pp. 3-14.
Naert, P.A. (1974), 'Should Marketing Models be Robust?', paper presented at IBM Conference on the
Implementation of Marketing Models, Ottignies, Belgium.
Naert, P.A. (1975a), 'The Validation of Macro Models', Proceedings ESOMAR Seminar on Market Modeling
(Part II), Noordwijk aan-Zee, The Netherlands, pp. 17-30.
Naert, P.A. (1975b), 'Parameterization of Marketing Models' in Elliot, K. (ed.), Management Bibliographies &
Reviews, vol. 1, MCB Books, Bradford, pp. 125-149.
Naert, P.A. (1977), 'Some Cost-Benefit Considerations in Marketing Model Building' in Topritzhofer, E. (ed.),
Marketing - Neue Ergebnisse aus Forschung und Praxis, Gabler-Verlag, Wiesbaden, Germany.
Naert, P.A. and A.V. Bultez (1973), 'Logically Consistent Market Share Models', Journal of Marketing
Research, vol. 10, pp. 334-340.
Naert, P.A. and A.V. Bultez (1975), 'A Model of a Distribution Network Aggregate Performance',
Management Science, vol. 21, pp. 1102-1112.
Naert, P.A. and P.S.H. Leeflang (1978), Building Implementable Marketing Models, Martinus Nijhoff, Leiden.
Naert, P.A. and M. Weverbergh (1977), 'Multiplicative Models with Zero Entries in the Explanatory Variables',
Working Paper 76-22, Centre for Managerial Economics and Econometrics, UFSIA, University of
Antwerp.
Naert, P.A. and M. Weverbergh (1981a), 'On the Prediction Power of Market Share Attraction Models', Journal
of Marketing Research, vol. 18, pp. 146-153.
Naert, P.A. and M. Weverbergh (1981b), 'Subjective Versus Empirical Decision Models' in Schultz, R.L. and
A.A. Zoltners (eds.), Marketing Decision Models, North Holland, New York, pp. 99-123.
Naert, P.A. and M. Weverbergh (1985), 'Market Share Specification, Estimation and Validation: Toward
Reconciling Seemingly Divergent Views', Journal ofMarketing Research, vol. 22, pp. 453-461.
Naik, P.A., M.K. Mantrala and A.G. Sawyer (1998), 'Planning Media Schedules in the Presence of Dynamic
Advertising Quality', Marketing Science, vol. 17, pp. 214-235.
Nakanishi, M. (1972), 'Measurement of Sales Promotion Effect at the Retail Level: A New Approach',
Working Paper, Graduate School of Management, UCLA.
Nakanishi, M. and L.G. Cooper (1974), 'Parameter Estimation for a Multiplicative Competitive Interaction
Model - Least Squares Approach', Journal of Marketing Research, vol. 11, pp. 303-311.
Nakanishi, M. and L.G. Cooper (1982), 'Simplified Estimation Procedures for MCI Models', Marketing
Science, vol. 1, pp. 314-322.
Narasimhan, C. (1988), 'Competitive Promotional Strategies', Journal of Business, vol. 61, pp. 427-449.
Narasimhan, C. and S.K. Sen (1983), 'New Product Models for Test Marketing Data', Journal of Marketing,
vol. 47, Winter, pp. 11-24.
Nash, J. (1950), 'Equilibrium Points in n-Person Games', Proceedings of the National Academy of Sciences,
vol. 36, pp. 48-49.
Nelder, J.A. and R.W.M. Wedderburn (1972), 'Generalized Linear Models', Journal of the Royal Statistical
Society, Series A, vol. 135, pp. 370-384.
Nenning, M., E. Topritzhofer and U. Wagner (1979), 'Zur Kompatibilität alternativer kommerziell verfügbarer
Datenquellen für die Marktreaktionsmodellierung: Die Verwendung von Prewhitening-Filtern und
Kreuzspektralanalyse sowie ihre Konsequenzen für die Analyse betriebswirtschaftlicher Daten', Zeitschrift
für Betriebswirtschaft, vol. 49, pp. 281-297.
Nerlove, M. (1971), 'Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series
of Cross Sections', Econometrica, vol. 39, pp. 359-382.
Neslin, S.A., S.G. Powell and L.G. Schneider Stone (1995), 'The Effects of Retailer and Consumer Response
on Optimal Manufacturer Advertising and Trade Promotion Strategies', Management Science, vol. 41, pp.
749-766.
Neslin, S.A. and L.G. Schneider Stone (1996), 'Consumer Inventory Sensitivity and the Postpromotion Dip',
Marketing Letters, vol. 7, pp. 77-94.
Neslin, S.A. and R.W. Shoemaker (1983), 'A Model for Evaluating the Profitability of Coupon Promotions',
Marketing Science, vol. 2, pp. 361-388.
Newcomb, S. (1886), 'A Generalized Theory of the Combination of Observations so as to Obtain the Best
Result', American Journal of Mathematics, vol. 8, pp. 343-366.
Newell, A. and H.A. Simon (1972), Human Problem Solving, Prentice-Hall, Englewood Cliffs, NJ.
Nielsen Marketing Research (1988), Met het Oog op de Toekomst, ACNielsen (Nederland) B.V., Diemen.
Nillesen, J.P.H. (1992), Services and Advertising Effectiveness, Unpublished Ph.D. thesis, Groningen, the
Netherlands.
Nijkamp, W.G. (1993), New Product Macroflow Models - Specification and Analysis, Unpublished Ph.D.
thesis, University of Groningen, the Netherlands.
Nooteboom, B. (1989), 'Diffusion, Uncertainty and Firm Size', International Journal of Research in
Marketing, vol. 6, pp. 109-128.
Nordin, J.A. (1943), 'Spatial Allocation of Selling Expenses', Journal of Marketing, vol. 7, pp. 210-219.
Novak, T.P. (1993), 'Log-Linear Trees: Models of Market Structure in Brand Switching Data', Journal of
Marketing Research, vol. 30, pp. 267-287.
NZDH, New Zealand Department of Health, Toxic Substances Board (TSB) (1989), Health or Tobacco: an
End to Tobacco Advertising and Promotion.
Oczkowski, E. and M.A. Farrell (1998), 'Discriminating between Measurement Scales Using Non-Nested
Tests and Two-Stage Least Squares Estimators: The Case of Market Orientation', International Journal of
Research in Marketing, vol. 15, pp. 349-366.
Oliva, T.A., R.L. Oliver and I.C. MacMillan (1992), 'A Catastrophe Model for Developing Service Satisfaction
Strategies', Journal of Marketing, vol. 56, July, pp. 83-95.
Oliver, R.L. and W.S. DeSarbo (1988), 'Response Determinants in Satisfaction Judgments', Journal of
Consumer Research, vol. 14, pp. 495-507.
Padmanabhan, V. and I.P.L. Png (1997), 'Manufacturer's Returns Policies and Retail Competition', Marketing
Science, vol. 16, pp. 81-94.
Paich, M. and J.D. Sterman (1993), 'Boom, Bust and Failures to Learn in Experimental Markets', Management
Science, vol. 39, pp. 1439-1458.
Palda, K.S. (1964), The Measurement of Cumulative Advertising Effects, Prentice-Hall, Englewood Cliffs, NJ.
Pankratz, A. (1991), Forecasting with Dynamic Regression Models, John Wiley & Sons, New York.
Papatla, P. and L. Krishnamurthi (1996), 'Measuring the Dynamic Effects of Promotions on Brand Choice',
Journal of Marketing Research, vol. 33, pp. 20-35.
Parasuraman, A. and R.L. Day (1977), 'A Management-Oriented Model for Allocating Sales Effort', Journal
of Marketing Research, vol. 14, pp. 22-32.
Parente, F.J. and J.K. Anderson-Parente (1987), 'Delphi Inquiry Systems' in Wright, G. and P. Ayton (eds.),
Judgmental Forecasting, John Wiley & Sons, New York, pp. 129-156.
Parfitt, J.H. and B.J.K. Collins (1968), 'Use of Consumer Panels for Brand-Share Prediction', Journal of
Marketing Research, vol. 5, pp. 131-145.
Park, S. and M. Hahn (1998), 'Direct Estimation of Batsell and Polking's Model', Marketing Science, vol. 17,
pp. 170-178.
Parker, P. (1992), 'Price Elasticity Dynamics over the Adoption Lifecycle', Journal of Marketing Research,
vol. 29, pp. 358-367.
Parsons, L.J. (1975), 'The Product Life Cycle and Time-Varying Advertising Elasticities', Journal of
Marketing Research, vol. 12, pp. 476-480.
Parsons, L.J. and F.M. Bass (1971), 'Optimal Advertising Expenditure Implications of a Simultaneous
Equation Regression Analysis', Operations Research, vol. 19, pp. 822-831.
Parsons, L.J., E. Gijsbrechts, P.S.H. Leeflang and D.R. Wittink (1994), 'Marketing Science, Econometrics, and
Managerial Contributions' in Laurent, G., G.L. Lilien and B. Pras (eds.), Research Traditions in
Marketing, Kluwer Academic Publishers, Boston, pp. 52-78.
Parsons, L.J. and R.L. Schultz (1976), Marketing Models and Econometric Research, North-Holland
Publishing Company, Amsterdam.
Parsons, L.J. and P. Vanden Abeele (1981), 'Analysis of Sales Call Effectiveness', Journal of Marketing
Research, vol. 18, pp. 107-113.
Parzen, E. (1962), Stochastic Processes, Holden-Day, San Francisco.
Payne, J.W., J. Bettman and E.J. Johnson (1988), 'Adaptive Strategy Selection in Decision Making', Journal of
Experimental Psychology: Learning, Memory, and Cognition, vol. 14, July, pp. 534-552.
Pedrick, J.H. and F.S. Zufryden (1991), 'Evaluating the Impact of Advertising Media Plans: A Model of
Consumer Purchase Dynamics Using Single Source Data', Marketing Science, vol. 10, pp. 111-130.
Peterson, H. (1965), 'The Wizard Who Oversimplified: A Fable', The Quarterly Journal of Economics,
May, pp. 209-211.
Petty, R.E. and J.T. Cacioppo (1986), 'The Elaboration Likelihood Model of Persuasion' in Berkowitz, L. (ed.),
Advances in Experimental Social Psychology, vol. 19, Academic Press, New York, pp. 123-205.
Pham, M.T. and G.V. Johar (1996), 'Where's the Deal?: A Process Model of Source Identification in
Promotional Settings', Paper, Columbia University.
Phillips, L.D. (1987), 'On the Adequacy of Judgmental Forecasts' in Wright, G. and P. Ayton (eds.), Judgmental
Forecasting, John Wiley & Sons, New York, pp. 11-30.
Phillips, L.W., D.R. Chang and R.D. Buzzell (1983), 'Product Quality, Cost Position and Business
Performance: A Test of Some Key Hypotheses', Journal of Marketing, vol. 47, Spring, pp. 26-43.
Pierce, D.A. (1977), 'Relationships - and the Lack Thereof - between Economic Time Series, with Special
Reference to Money and Interest Rates', Journal of the American Statistical Association, vol. 72, pp.
11-22.
Pierce, D.A. and L.D. Haugh (1977), 'Causality in Temporal Systems', Journal of Econometrics, vol. 5, pp.
265-293.
Pindyck, R.S. and D.L. Rubinfeld (1991), Econometric Models & Economic Forecasts, McGraw-Hill, New
York.
Plat, F.W. (1988), Modelling for Markets: Applications of Advanced Models and Methods for Data Analysis,
Unpublished Ph.D. thesis, University of Groningen, the Netherlands.
Plat, F.W. and P.S.H. Leeflang (1988), 'Decomposing Sales Elasticities on Segmented Markets', International
Journal of Research in Marketing, vol. 5, pp. 303-315.
Ploeg, J. van der (1997), Instrumental Variable Estimation and Group-Asymptotics, SOM Theses on Systems,
Organisations and Management, Groningen, the Netherlands.
Popkowski Leszczyc, P.T.L. and R.C. Rao (1989), 'An Empirical Analysis of National and Local Advertising
Effects on Price Elasticity', Marketing Letters, vol. 1, pp. 149-160.
Popkowski Leszczyc, P.T.L. and H.J.P. Timmermans (1997), 'Store-Switching Behavior', Marketing Letters,
vol. 8, pp. 193-204.
Poulsen, C.S. (1990), 'Mixed Markov and Latent Markov Modeling Applied to Brand Choice Behavior',
International Journal of Research in Marketing, vol. 7, pp. 5-19.
Powell, J.L. (1994), 'Estimation of Semiparametric Models' in Engle, R.F. and D.L. McFadden (eds.),
Handbook of Econometrics, vol. 4, Elsevier Science B.V., Amsterdam.
Prais, S.J. and H.S. Houthakker (1955), The Analysis of Family Budgets, Cambridge University Press,
Cambridge.
Prasad, V.K., W.R. Casper and R.J. Schieffer (1984), 'Alternatives to the Traditional Retail Store Audit: A
Field Study', Journal of Marketing, vol. 48, pp. 54-61.
Prasad, V.K. and L.W. Ring (1976), 'Measuring Sales Effects of some Marketing-Mix Variables and their
Interactions', Journal of Marketing Research, vol. 13, pp. 391-396.
Pringle, L.G., R.D. Wilson and E.I. Brody (1982), 'NEWS: A Decision-Oriented Model for New Product
Analysis and Forecasting', Marketing Science, vol. 1, pp. 1-29.
Punj, G.H. and R. Staelin (1978), 'The Choice Process for Graduate Business Schools', Journal of Marketing
Research, vol. 15, pp. 588-598.
Putsis, W.P., S.K. Balasubramanian, E.H. Kaplan and S.K. Sen (1997), 'Mixing Behavior in Cross-Country
Diffusion', Marketing Science, vol. 16, pp. 354-369.
Putsis, W.P. and R. Dhar (1998), 'The Many Faces of Competition', Marketing Letters, vol. 9, pp. 269-284.
Putsis, W.P. and R. Dhar (1999), 'Category Expenditure, Promotion and Competitive Market Interactions: Can
Promotions Really Expand the Pie?', Paper, London Business School / Yale School of Management.
Putsis, W.P. and S.K. Sen (1999), 'Should NFL Blackouts be Banned?', Applied Economics, forthcoming.
Quandt, R.E. (1983), 'Computational Problems and Methods' in Griliches, Z. and M. Intriligator (eds.),
Handbook of Econometrics, vol. 1, North-Holland, Amsterdam.
Raiffa, H. and R. Schlaifer (1961), Applied Statistical Decision Theory, Colonial Press, Boston.
Raju, J.S. (1992), 'The Effect of Price Promotions on Variability in Product Category Sales', Marketing
Science, vol. 11, pp. 207-220.
Ramaswamy, V., W.S. DeSarbo, D.J. Reibstein and W.T. Robinson (1993), 'An Empirical Pooling Approach
For Estimating Marketing Mix Elasticities with PIMS Data', Marketing Science, vol. 12, pp. 103-124.
Ramsey, J.B. (1969), 'Tests for Specification Errors in Classical Linear Least Squares Regression Analysis',
Journal of the Royal Statistical Society, Series B, vol. 31, pp. 350-371.
Ramsey, J.B. (1974), 'Classical Model Selection Through Specification Tests' in Zarembka, P. (ed.), Frontiers
in Econometrics, Academic Press, New York, pp. 13-47.
Rangan, V.K. (1987), 'The Channel Design Decision: A Model and an Application', Marketing Science, vol. 6,
pp. 156-174.
Rangaswamy, A. (1993), 'Marketing Decision Models: From Linear Programs to Knowledge Based Systems'
in Eliashberg, J. and G.L. Lilien (eds.), Handbooks in Operations Research and Management Science,
vol. 5, Marketing, North-Holland, Amsterdam, pp. 733-771.
Rangaswamy, A., R.R. Burke and T.A. Oliva (1993), 'Brand Equity and the Extendibility of Brand Names',
International Journal of Research in Marketing, vol. 10, pp. 61-75.
Rangaswamy, A., J. Eliashberg, R.R. Burke and Y. Wind (1989), 'Developing Marketing Expert Systems: An
Application to International Negotiations', Journal of Marketing, vol. 53, October, pp. 24-39.
Rangaswamy, A., B.A. Harlam and L.M. Lodish (1991), 'INFER: An Expert System for Automatic Analysis of
Scanner Data', International Journal of Research in Marketing, vol. 8, pp. 29-40.
Rangaswamy, A. and L. Krishnamurthi (1991), 'Response Function Estimation Using the Equity Estimator',
Journal ofMarketing Research, vol. 28, pp. 72-83.
Rangaswamy, A. and L. Krishnamurthi (1995), 'Equity Estimation and Assessing Market Response: A
Rejoinder', Journal of Marketing Research, vol. 32, pp. 480-485.
Rangaswamy, A., P. Sinha and A.A. Zoltners (1990), 'An Integrated Model-based Approach to Sales Force
Structuring', Marketing Science, vol. 9, pp. 279-298.
Rao, V.R. (1984), 'Pricing Research in Marketing: The State of the Art', Journal of Business, vol. 57, pp.
S39-S60.
Rao, V.R. (1993), 'Pricing Models in Marketing' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks in
Operations Research and Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp.
517-552.
Rao, V.R. and E.W. McLaughlin (1989), 'Modeling the Decision to Add New Products by Channel
Intermediaries', Journal of Marketing, vol. 53, pp. 80-88.
Rao, V.R. and L.J. Thomas (1973), 'Dynamic Models for Sales Promotion Policies', Operational Research
Quarterly, vol. 24, pp. 403-417.
Rao, V.R., Y. Wind and W.S. DeSarbo (1988), 'A Customized Market Response Model: Development,
Estimation and Empirical Testing', Journal of the Academy of Marketing Science, vol. 16, pp. 128-140.
Reddy, S.K., J.E. Aronson and A. Stam (1998), 'SPOT: Scheduling Programs Optimally for Television',
Management Science, vol. 44, pp. 83-102.
Reibstein, D.J. and H. Gatignon (1984), 'Optimal Product Line Pricing: The Influence of Elasticities and Cross
Elasticities', Journal of Marketing Research, vol. 21, pp. 259-267.
Reinmuth, J.E. and D.R. Wittink (1974), 'Recursive Models for Forecasting Seasonal Processes', Journal of
Financial and Quantitative Analysis, September 1974, pp. 659-684.
Reuyl, J.C. (1982), On the Determination of Advertising Effectiveness: An Empirical Study of the German
Cigarette Market, H.E. Stenfert Kroese, Leiden.
Roberts, J.H. and J.M. Lattin (1991), 'Development and Testing of a Model of Consideration Set
Composition', Journal of Marketing Research, vol. 28, pp. 429-441.
Roberts, J.H. and G.L. Lilien (1993), 'Explanatory and Predictive Models of Consumer Behaviour' in
Eliashberg, J. and G.L. Lilien (eds.), Handbooks in Operations Research and Management Science, vol. 5,
Marketing, North-Holland, Amsterdam, pp. 27-82.
Robey, D. (1984), 'Conflict Models in Implementation Research' in Schultz, R.L. and M.J. Ginzberg (eds.),
Management Science Implementation, JAI Press, pp. 89-105.
Robinson, W.T. (1988), 'Marketing Mix Reactions to Entry', Marketing Science, vol. 7, pp. 368-385.
Rogers, E.M. (1962), Diffusion of Innovations, The Free Press, New York.
Rossi, P.E. and G.M. Allenby (1994), 'A Bayesian Approach to Estimating Household Parameters', Journal of
Marketing Research, vol. 30, pp. 171-182.
Rossi, P.E., R.E. McCulloch and G.M. Allenby (1996), 'The Value of Purchase History Data in Target
Marketing', Marketing Science, vol. 15, pp. 321-340.
Roy, A., D.M. Hanssens and J.S. Raju (1994), 'Competitive Pricing by a Price Leader', Management Science,
vol. 40, pp. 809-823.
Roy, R., P.K. Chintagunta and S. Haldar (1996), 'A Framework for Investigating Habits, 'The Hand of the
Past', and Heterogeneity in Dynamic Brand Choice', Marketing Science, vol. 15, pp. 280-299.
Russell, G.J. (1988), 'Recovering Measures of Advertising Carryover from Aggregate Data: The Role of the
Firm's Decision Behavior', Marketing Science, vol. 7, pp. 252-270.
Russell, G.J. and W.A. Kamakura (1994), 'Understanding Brand Competition Using Micro and Macro Scanner
Data', Journal of Marketing Research, vol. 31, pp. 289-303.
Rust, R.T. (1988), 'Flexible Regression', Journal of Marketing Research, vol. 25, pp. 10-24.
Rust, R.T., C. Lee and E. Valente, jr. (1995), 'Comparing Covariance Structure Models: A General
Methodology', International Journal of Research in Marketing, vol. 12, pp. 279-291.
Rust, R.T. and D.C. Schmittlein (1985), 'A Bayesian Cross-Validated Likelihood Method for Comparing
Alternative Specifications of Quantitative Models', Marketing Science, vol. 4, pp. 20-45.
Rust, R.T., D. Simester, R.J. Brodie and V. Nilikant (1995), 'Model Selection Criteria: An Investigation of
Relative Accuracy, Posterior Probabilities, and Combinations of Criteria', Management Science, vol. 41,
pp. 322-333.
Savage, L.J. (1954), The Foundations of Statistics, John Wiley & Sons, New York.
Scales, L.E. (1985), Introduction to Non-linear Optimization, MacMillan, London.
Scheer van der, H.R. (1998), Quantitative Approaches for Profit Maximization in Direct Marketing,
Unpublished Ph.D. thesis, University of Groningen, the Netherlands.
Scheer van der, H.R. and P.S.H. Leeflang (1997), 'Determining the Optimal Frequency of Direct Marketing
Activities for Frequently Purchased Consumer Goods', Research Report 97B45, SOM, Research Institute
Systems, Organisations and Management, University of Groningen, the Netherlands.
Schlaifer, R. (1969), Analysis of Decisions under Uncertainty, McGraw-Hill, New York.
Schmalensee, R.L. (1972), The Economics of Advertising, North-Holland Publishing Company, Amsterdam.
Schmittlein, D.C., L.G. Cooper and D.G. Morrison (1993), 'Truth in Concentration in the Land of (80/20)
Laws', Marketing Science, vol. 12, pp. 167-183.
Schmittlein, D.C., D.G. Morrison and R. Colombo (1987), 'Counting Your Customers: Who Are They and
What Will They Do Next?', Management Science, vol. 33, pp. 1-24.
Schmitz, J.D., G.D. Armstrong and J.D.C. Little (1990), 'Cover Story: Automated News Finding in Marketing'
in: Bolino, L. (ed.), DSS Transactions, TIMS College on Information Systems, Providence, Rhode Island, pp.
46-54.
Schoemaker, P.J.H. (1995), 'Scenario Planning: A Tool for Strategic Thinking', Sloan Management Review,
vol. 36, pp. 25-40.
Schultz, H. (1938), The Theory and Measurement of Demand, University of Chicago Press, Chicago.
Schultz, R.L. (1971), 'Market Measurement and Planning with a Simultaneous Equation Model', Journal of
Marketing Research, vol. 8, pp. 153-164.
Schultz, R.L., M.J. Ginzberg and H.C. Lucas (1984), 'A Structural Model of Implementation' in Schultz, R.L.
and M.J. Ginzberg (eds.), Management Science Implementation, JAI Press, Greenwich, CT, pp. 55-87.
Schultz, R.L. and M.D. Henry (1981), 'Implementing Decision Models' in Schultz, R.L. and A.A. Zoltners
(eds.), Marketing Decision Models, North-Holland, New York, pp. 275-296.
Schultz, R.L. and D.P. Slevin (1975), 'A Program of Research on Implementation' in Schultz, R.L. and D.P.
Slevin (eds.), Implementing Operations Research/Management Science, American Elsevier Publishing
Company, New York, pp. 31-51.
Schultz, R.L. and D.P. Slevin (1977), 'An Innovation Process Perspective of Implementation', Paper No. 601,
Krannert Graduate School of Management, Purdue University.
Schultz, R.L. and D.P. Slevin (1983), 'The Implementation Profile', Interfaces, vol. 13, pp. 87-92.
Schultz, R.L. and D.R. Wittink (1976), 'The Measurement of Industry Advertising Effects', Journal of
Marketing Research, vol. 13, pp. 71-75.
Schultz, R.L. and A.A. Zoltners (eds.) (1981), Marketing Decision Models, North-Holland, New York.
Schwarz, G. (1978), 'Estimating the Dimension of a Model', Annals of Statistics, vol. 6, pp. 461-464.
Sethi, S.P. (1977), 'Dynamic Optimal Control Models in Advertising: A Survey', SIAM Review, vol. 19, pp.
685-725.
Sethuraman, R., V. Srinivasan and D. Kim (1999), 'Asymmetric and Neighborhood Cross-Price Effects: Some
Empirical Generalizations', Marketing Science, vol. 18, pp. 23-42.
Shakun, M.F. (1966), 'A Dynamic Model for Competitive Marketing in Coupled Markets', Management
Science, vol. 12, pp. 525-530.
Shankar, V. (1997), 'Pioneers' Marketing Mix Reactions to Entry in Different Competitive Game Structures:
Theoretical Analysis and Empirical Illustration', Marketing Science, vol. 16, pp. 271-293.
Shapiro, S.S. and M.B. Wilk (1965), 'An Analysis of Variance Test for Normality', Biometrika, vol. 52, pp.
591-611.
Sharma, S. (1996), Applied Multivariate Techniques, John Wiley & Sons, New York.
Shepard, D. (1990), The New Direct Marketing: How to Implement a Profit-driven Database Marketing
Strategy, Business One: Irwin, Homewood, Ill.
Shocker, A.D., M. Ben-Akiva, B. Boccara and P. Nedungadi (1991), 'Consideration Set Influences on
Consumer Decision Making and Choice: Issues, Models and Suggestions', Marketing Letters, vol. 2, pp.
181-198.
Shocker, A.D. and W.G. Hall (1986), 'Pretest Market Models: A Critical Evaluation', Journal of Product
Innovation Management, vol. 15, pp. 171-191.
Shocker, A.D., D.W. Stewart and A.J. Zahorik (1990), 'Market Structure Analysis: Practice, Problems, and
Promise' in Day, G., B. Weitz and R. Wensley (eds.), The Interface of Marketing and Strategy, JAI Press,
Greenwich, CT, pp. 9-56.
Shoemaker, R.W. and L.G. Pringle (1980), 'Possible Biases in Parameter Estimation with Store Audit Data',
Journal of Marketing Research, vol. 17, pp. 91-96.
Shugan, S.M. (1987), 'Estimating Brand Positioning Maps Using Scanning Data', Journal of Marketing
Research, vol. 24, pp. 1-18.
Sichel, H.S. (1982), 'Repeat Buying and the Generalized Inverse Gaussian-Poisson Distribution', Applied
Statistics, vol. 31, pp. 193-204.
Siddarth, S., R.E. Bucklin and D.G. Morrison (1995), 'Making the Cut: Modeling and Analyzing Choice Set
Restriction in Scanner Panel Data', Journal of Marketing Research, vol. 32, pp. 255-266.
Sikkel, D. and A.W. Hoogendoorn (1995), 'Models for Monthly Penetrations with Incomplete Panel Data',
Statistica Neerlandica, vol. 49, pp. 378-391.
Silk, A.J. and G.L. Urban (1978), 'Pre-Test-Market Evaluation of New Packaged Goods: A Model and
Measurement Methodology', Journal of Marketing Research, vol. 15, pp. 171-191.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Monographs on Statistics and
Applied Probability, vol. 26, Chapman & Hall, London.
Simon, C.J. and M.W. Sullivan (1993), 'The Measurement and Determinants of Brand Equity: A Financial
Approach', Marketing Science, vol. 12, pp. 28-52.
Simon, H. (1984), 'Challenges and New Research Avenues in Marketing Science', International Journal of
Research in Marketing, vol. 1, pp. 249-261.
Simon, H. (1994), 'Marketing Science's Pilgrimage to the Ivory Tower' in Laurent, G., G.L. Lilien and B. Pras
(eds.), Research Traditions in Marketing, Kluwer Academic Publishers, Boston, pp. 27-43.
Simon, L.S. and M. Freimer (1970), Analytical Marketing, Harcourt, Brace & World, New York.
Sims, C.A. (1972), 'Money, Income and Causality', American Economic Review, vol. 62, pp. 540-552.
Sims, C.A. (1980), 'Macroeconomics and Reality', Econometrica, vol. 48, pp. 1-48.
Sinha, R.K. and M. Chandrashekaran (1992), 'A Split Hazard Model for Analysing the Diffusion of
Innovations', Journal of Marketing Research, vol. 29, pp. 116-127.
Sirohi, N. (1999), Essays on Bundling, Unpublished Ph.D. thesis, Cornell University.
Sirohi, N., E.W. McLaughlin and D.R. Wittink (1998), 'A Model of Consumer Perceptions and Store Loyalty
Intentions for a Supermarket Retailer', Journal of Retailing, vol. 74, pp. 223-245.
Skiera, B. and S. Albers (1998), 'COSTA: Contribution Optimizing Sales Territory Alignment', Marketing
Science, vol. 17, pp. 196-213.
Smee, C., M. Parsonage, R. Anderson and S. Duckworth (1992), Effect of Tobacco Advertising on Tobacco
Consumption: A Discussion Document Reviewing the Evidence, Economics & Operational Research
Division Department of Health, London.
Smith, L.H. (1967), 'Ranking Procedures and Subjective Probability Distribution', Management Science, vol.
14, pp. B236-B249.
Smith, S.A., S.H. McIntyre and D.D. Achabal (1994), 'A Two-stage Sales Forecasting Procedure Using
Discounted Least Squares', Journal of Marketing Research, vol. 31, pp. 44-65.
Smith, S.V., R.H. Brien and J.E. Stafford (1968), 'Marketing Information Systems: An Introductory Overview'
in Smith, S.V., R.H. Brien and J.E. Stafford (eds.), Readings in Marketing Information Systems, Houghton
Mifflin Company, Boston, pp. 1-14.
Solow, R.M. (1960), 'On a Family of Lag Distributions', Econometrica, vol. 28, pp. 393-406.
Spring, P., P.S.H. Leeflang and T.J. Wansbeek (1999), 'Simultaneous Target Selection and Offer Segmentation:
A Modeling Approach', Journal of Market-Focused Management, vol. 4, pp. 187-203.
Srinivasan, V. (1976), 'Decomposition of a Multi-Period Media Scheduling Model in Terms of Single Period
Equivalents', Management Science, vol. 23, pp. 349-360.
Srinivasan, V. and H.A. Weir (1988), 'A Direct Approach for Inferring Micro-Parameters of the Koyck
Advertising-Sales Relationship from Macro Data', Journal of Marketing Research, vol. 25, pp. 145-156.
Srivastava, R.K., M.I. Alpert and A.D. Shocker (1984), 'A Customer-oriented Approach for Determining
Market Structures', Journal of Marketing, vol. 48, 2, pp. 32-45.
Steenkamp, J.B.E.M. (1989), Product Quality, Van Gorcum, Assen.
Steenkamp, J.B.E.M. and H. Baumgartner (1998), 'Assessing Measurement Invariance in Cross-National
Consumer Research', Journal of Consumer Research, vol. 25, pp. 78-90.
Steenkamp, J.B.E.M. and M.G. Dekimpe (1997), 'The Power of Store Brands: Intrinsic Loyalty and
Conquesting Power', Onderzoeksrapport nr. 9706, Departement Toegepaste Economische
Wetenschappen, Katholieke Universiteit Leuven, België.
Steenkamp, J.B.E.M. and H.C.M. van Trijp (1991), 'The Use of LISREL in Validating Marketing Constructs',
International Journal of Research in Marketing, vol. 8, pp. 283-300.
Stewart, J. (1991), Econometrics, Philip Allan, Hemel Hempstead.
Styan, G.P.H. and H. Smith, Jr. (1964), 'Markov Chains Applied to Marketing', Journal of Marketing Research,
vol. 1, pp. 50-55.
Sunde, L. and R.J. Brodie (1993), 'Consumer Evaluations of Brand Extensions: Further Empirical Results',
International Journal of Research in Marketing, vol. 10, pp. 47-53.
Swamy, P.A.V.B. (1970), 'Efficient Inference in a Random Coefficient Regression Model', Econometrica, vol.
38, pp. 311-323.
Swamy, P.A.V.B. (1971), Statistical Inference in Random Coefficient Regression Models, Springer-Verlag, New
York.
Swanson, E.B. (1974), 'Management Information Systems: Appreciation and Involvement', Management
Science, vol. 21, pp. 178-188.
Talwar, P. (1974), 'Robust Estimation of Regression Parameters', Unpublished Ph.D. thesis, Carnegie-Mellon
University.
Taylor, C.J. (1963), 'Some Developments in the Theory and Application of Media Scheduling Methods',
Operations Research Quarterly, vol. 14, pp. 291-305.
Tellis, G.J. (1988a), 'Advertising Exposure, Brand Loyalty and Brand Purchase: A Two-Stage Model of
Choice', Journal of Marketing Research, vol. 25, pp. 134-144.
Tellis, G.J. (1988b), 'The Price Elasticity of Selective Demand: A Meta-Analysis of Econometric Models of
Sales', Journal of Marketing Research, vol. 25, pp. 331-341.
Tellis, G.J. and C.M. Crawford (1981), 'An Evolutionary Approach to Product Growth Theory', Journal of
Marketing, vol. 45, October, pp. 125-132.
Tellis, G.J. and C. Fornell (1988), 'The Relationship between Advertising and Product Quality over the Product
Life Cycle: A Contingency Theory', Journal of Marketing Research, vol. 25, pp. 64-71.
Tellis, G.J. and F.S. Zufryden (1995), 'Tackling the Retailer Decision Maze: Which Brands to Discount, How
Much, When and Why?', Marketing Science, vol. 14, pp. 271-299.
Telser, L.G. (1962a), 'The Demand for Branded Goods as Estimated from Consumer Panel Data', Review of
Economics and Statistics, vol. 44, pp. 300-324.
Telser, L.G. (1962b), 'Advertising and Cigarettes', Journal of Political Economy, vol. 70, pp. 471-499.
Telser, L.G. (1963), 'Least Squares Estimates of Transition Probabilities' in Measurement in Economics,
Stanford University Press, Stanford, pp. 270-292.
Teng, J.T. and G.L. Thompson (1983), 'Oligopoly Models for Optimal Advertising when Production Costs
Obey a Learning Curve', Management Science, vol. 29, pp. 1087-1101.
Theil, H. (1965a), 'The Analysis of Disturbances in Regression Analysis', Journal of the American Statistical
Association, vol. 60, pp. 1067-1079.
Theil, H. (1965b), Economic Forecasts and Policy, North-Holland Publishing Company, Amsterdam.
Theil, H. (1969), 'A Multinomial Extension of the Linear Logit Model', International Economic Review, vol.
10, pp. 251-259.
Theil, H. (1971), Principles of Econometrics, John Wiley & Sons, New York.
Theil, H. (1975), Theory and Measurement of Consumer Demand, vol. 1, North-Holland Publishing Company,
Amsterdam.
Theil, H. (1976), Theory and Measurement of Consumer Demand, vol. 2, North-Holland Publishing Company,
Amsterdam.
Theil, H. and A. Schweitzer (1961), 'The Best Quadratic Estimator of the Residual Variance in Regression
Analysis', Statistica Neerlandica, vol. 15, pp. 19-23.
Thomas, J.J. and K.F. Wallis (1971), 'Seasonal Variation in Regression Analysis', Journal of the Royal
Statistical Society, Series A, vol. 134, pp. 67-72.
Thursby, J.G. and P. Schmidt (1977), 'Some Properties of Tests for Specification Error in a Linear Regression
Model', Journal of the American Statistical Association, vol. 63, pp. 558-582.
Tinbergen, J. (1966), Economic Policy: Principles and Design, North-Holland Publishing Company, Amsterdam.
Titterington, D.M., A.F.M. Smith and U.E. Makov (1985), Statistical Analysis of Finite Mixture Distributions,
John Wiley & Sons, New York.
Todd, P. and I. Benbasat (1994), 'The Influence of Decision Aids on Choice Strategies: An Experimental
Analysis of the Role of Cognitive Effort', Organizational Behavior and Human Decision Processes, vol.
60, pp. 36-74.
Torgerson, W. (1959), Theory and Methods of Measurement, John Wiley & Sons, New York.
Tsay, R.S. and G.C. Tiao (1984), 'Consistent Estimates of Autoregressive Parameters and Extended Sample
Autocorrelation Function for Stationary and Nonstationary ARMA Models', Journal of the American
Statistical Association, vol. 79, pp. 84-96.
Tversky, A. (1972), 'Elimination by Aspects: A Theory of Choice', Psychological Review, vol. 79, pp.
281-299.
Uncles, M., A.S.C. Ehrenberg and K. Hammond (1995), 'Patterns of Buyer Behavior: Regularities, Models and
Extensions', Marketing Science, vol. 14, pp. G71-G78.
Urban, G.L. (1968), 'A New Product Analysis and Decision Model', Management Science, vol. 14, pp.
490-517.
Urban, G.L. (1969a), 'SPRINTER Mod. II: Basic New Product Analysis Model' in Morin, B.A. (ed.),
Proceedings of the National Conference of the American Marketing Association, pp. 139-150.
Urban, G.L. (1969b), 'A Mathematical Modeling Approach to Product Line Decisions', Journal of Marketing
Research, vol. 6, pp. 40-47.
Urban, G.L. (1970), 'SPRINTER Mod. III: A Model for the Analysis of New Frequently Purchased Consumer
Products', Operations Research, vol. 18, pp. 805-854.
612 BIBLIOGRAPHY
Urban, G.L. (1971), 'Advertising Budgeting and Geographical Allocation: A Decision Calculus Approach',
Working Paper 532-71 (revised), Alfred P. Sloan School of Management, M.I.T.
Urban, G.L. (1972), 'An Emerging Process of Building Models for Management Decision Makers', Working
Paper No. 591-72, Alfred P. Sloan School of Management, M.I.T.
Urban, G.L. (1974), 'Building Models for Decision Makers', Interfaces, vol. 4, pp. 1-11.
Urban, G.L. (1993), 'Pretest Market Forecasting' in Eliashberg, J. and G.L. Lilien (eds.), Handbooks in
Operations Research and Management Science, vol. 5, Marketing, North-Holland, Amsterdam, pp.
315-348.
Urban, G.L. and J.R. Hauser (1980), Design and Marketing of New Products, Prentice-Hall, Englewood Cliffs,
NJ.
Urban, G.L. and J.R. Hauser (1993), Design and Marketing of New Products, 2nd ed., Prentice-Hall, Englewood
Cliffs, NJ.
Urban, G.L., J.R. Hauser and J.H. Roberts (1990), 'Prelaunch Forecasting of New Automobiles: Models and
Implementation', Management Science, vol. 36, pp. 401-421.
Urban, G.L. and R. Karash (1971), 'Evolutionary Model Building', Journal of Marketing Research, vol. 8, pp.
62-66.
Urban, G.L. and M. Katz (1983), 'Pre-Test-Market Models: Validation and Managerial Implications', Journal
of Marketing Research, vol. 20, pp. 221-234.
US DHHS, US Department of Health and Human Services (1989), Reducing the Health Consequences of
Smoking: Twenty-five Years of Progress, a Report of the US Surgeon General, DHHS Publication (CDC),
89-8411.
Vanden Abeele, P. (1975), An Investigation of Errors in the Variables on the Estimation of Linear Models in a
Marketing Context, Unpublished Ph.D. thesis, Stanford University.
Vanden Abeele, P. and E. Gijsbrechts (1991), 'Modeling Aggregate Outcomes of Heterogeneous non-IIA
Choice', EMAC 1991 Annual Conference Proceedings, Michael Smurfit Graduate Business School,
University College, Dublin, pp. 484-508.
Vanden Abeele, P., E. Gijsbrechts and M. Vanhuele (1990), 'Specification and Empirical Evaluation of a
Cluster-Asymmetry Market Share Model', International Journal of Research in Marketing, vol. 7, pp.
223-247.
Vanden Bulte, C. and G. L. Lilien (1997), 'Bias and Systematic Change in the Parameter Estimates of
Macro-Level Diffusion Models', Marketing Science, vol. 16, pp. 338-353.
VanderWerf, P.A. and J.F. Mahan (1997), 'Meta Analysis of the Impact of Research Methods on Findings of
First-Mover Advantage', Management Science, vol. 43, pp. 1510-1519.
Vanhonacker, W.R. (1988), 'Estimating an Autoregressive Current Effects Model of Sales Response when
Observations are Aggregated over Time: Least Squares versus Maximum Likelihood', Journal of
Marketing Research, vol. 15, pp. 301-307.
Verhoef, P.C., P.H. Franses and J.C. Hoekstra (2000), 'The Impact of Satisfaction on the Breadth of the
Relationship with a Multi-Service Provider', Working Paper, Rotterdam Institute of Business Economic
Studies (RIBES).
Verhulp, J. (1982), The Commercial Optimum, Theory and Application, Unpublished Ph.D. thesis, Erasmus
University, Rotterdam.
Vidale, M.L. and H.B. Wolfe (1957), 'An Operations-Research Study of Sales Response to Advertising',
Operations Research, vol. 5, pp. 370-381.
Vilcassim, N.J. (1989), 'Extending the Rotterdam Model to Test Hierarchical Market Structures', Marketing
Science, vol. 8, pp. 181-190.
Vilcassim, N.J. and D.C. Jain (1991), 'Modeling Purchase-Timing and Brand-Switching Behavior
Incorporating Explanatory Variables and Unobserved Heterogeneity', Journal of Marketing Research, vol.
28, pp. 29-41.
Vilcassim, N.J., V. Kadiyali and P.K. Chintagunta (1999), 'Investigating Dynamic Multifirm Market
Interactions in Price and Advertising', Management Science, vol. 45, pp. 499-518.
Vuong, Q.H. (1989), 'Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses', Econometrica,
vol. 57, pp. 307-333.
Vyas, N. and A.G. Woodside (1984), 'An Inductive Model of Industrial Supplier Choice Processes', Journal of
Marketing, vol. 48, Winter, pp. 30-45.
Waarts, E., M. Carree and B. Wierenga (1991), 'Full-information Maximum Likelihood Estimation of Brand
Positioning Maps Using Supermarket Scanning Data', Journal of Marketing Research, vol. 28, pp.
483-490.
Wallace, T.D. (1972), 'Weaker Criteria and Tests for Linear Restrictions in Regression', Econometrica, vol. 40,
pp. 689-698.
Walters, R.G. (1991), 'Assessing the Impact of Retail Price Promotions on Product Substitution,
Complementary Purchase, and Interstore Sales Displacement', Journal of Marketing, vol. 55, April, pp.
17-28.
Wansbeek, T.W. and M. Wedel (1999), 'Marketing and Economics: Editor's Introduction', Journal of
Econometrics, vol. 89, pp. 1-14.
Watson, G.S. (1964), 'Smooth Regression Analysis', Sankhyā, Series A, vol. 26, pp. 359-372.
Webster, F.E. (1992), 'The Changing Role of Marketing in the Corporation', Journal of Marketing, vol. 56,
October, pp. 1-17.
Webster, F.E. (1994), Market Driven Management, John Wiley & Sons, New York.
Wedel, M. and W.S. DeSarbo (1994), 'A Review of Latent Class Regression Models and Their Applications' in
Bagozzi, R.P. (ed.), Advanced Methods for Marketing Research, Blackwell, Cambridge, pp. 353-388.
Wedel, M. and W.S. DeSarbo (1995), 'A Mixture Likelihood Approach for Generalized Linear Models',
Journal of Classification, vol. 12, pp. 1-35.
Wedel, M., W.S. DeSarbo, J.R. Bult and V. Ramaswamy (1993), 'A Latent Class Poisson Regression Model for
Heterogeneous Count Data', Journal of Applied Econometrics, vol. 8, pp. 397-411.
Wedel, M. and W.A. Kamakura (1998), Market Segmentation: Conceptual and Methodological Foundations,
Kluwer Academic Publishers, Boston.
Wedel, M., W.A. Kamakura, N. Arora, A.C. Bemmaor, J. Chiang, T. Elrod, R. Johnson, P. Lenk, S.A. Neslin
and C.S. Poulsen (1999), 'Discrete and Continuous Representations of Unobserved Heterogeneity in
Choice Modeling', Marketing Letters, vol. 10, pp. 219-232.
Wedel, M., W.A. Kamakura, W.S. DeSarbo and F. Ter Hofstede (1995), 'Implications for Asymmetry,
Nonproportionality, and Heterogeneity in Brand Switching from Piece-wise Exponential Mixture Hazard
Models', Journal of Marketing Research, vol. 32, pp. 457-463.
Wedel, M. and P.S.H. Leeflang (1998), 'A Model for the Effects of Psychological Pricing in Gabor-Granger
Price Studies', Journal of Economic Psychology, vol. 19, pp. 237-260.
Wedel, M., M. Vriens, T.H.A. Bijmolt, W. Krijnen and P.S.H. Leeflang (1998), 'Assessing the Effects of
Abstract Attributes and Brand Familiarity in Conjoint Choice Experiments', International Journal of
Research in Marketing, vol. 15, pp. 71-78.
Weerahandi, S. and S.R. Dalal (1992), 'A Choice Based Approach to the Diffusion of a Service: Forecasting
Fax Penetration by Market Segments', Marketing Science, vol. 11, pp. 39-53.
Weiss, D.L. and P.M. Windal (1980), 'Testing Cumulative Advertising Effects: A Comment on Methodology',
Journal of Marketing Research, vol. 17, pp. 371-378.
Weverbergh, M. (1976), 'Restrictions on Linear Sum-Constrained Models: A Generalization', Working Paper
No. 76-17, Centre for Managerial Economics and Econometrics, UFSIA, University of Antwerp.
Weverbergh, M. (1977), Competitive Bidding: Games, Decisions, and Cost Uncertainty, Unpublished Ph.D.
thesis, UFSIA, University of Antwerp, Belgium.
Weverbergh, M., Ph.A. Naert and A. Bultez (1977), 'Logically Consistent Market Share Models: A Further
Clarification', Working Paper No. 77-5, European Institute for Advanced Studies in Management,
Brussels, Belgium.
Wheat, R.D. and D.G. Morrison (1990), 'Estimating Purchase Regularity with Two Interpurchase Times',
Journal of Marketing Research, vol. 27, pp. 87-93.
White, H. (1980), 'A Heteroscedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
Heteroscedasticity', Econometrica, vol. 48, pp. 817-838.
Wierenga, B. (1974), An Investigation of Brand Choice Processes, Rotterdam University Press, Rotterdam.
Wierenga, B. (1978), 'A Least Squares Estimation Method for the Linear Learning Model', Journal of
Marketing Research, vol. 15, pp. 145-153.
Wierenga, B. and G.H. van Bruggen (1997), 'The Integration of Marketing Problem Solving Modes and
Marketing Management Support Systems', Journal of Marketing, vol. 61, July, pp. 21-37.
Wierenga, B. and G.H. van Bruggen (2000), Marketing Management Support Systems: Principles, Tools, and
Implementation, Kluwer Academic Publishers, Boston.
Wilde, D.J. and C.S. Beightler (1967), Foundations of Optimization, Prentice-Hall, Englewood Cliffs, NJ.
Wildt, A.R. (1993), 'Equity Estimation and Assessing Market Response', Journal of Marketing Research, vol.
30, pp. 437-451.
614 BIBLIOGRAPHY
Wildt, A.R. and R.S. Winer (1983), 'Modeling and Estimation in Changing Market Environments', Journal of
Business, vol. 56, pp. 365-388.
Wind, Y. (1981), 'Marketing Oriented Strategic Planning Models' in Schultz, R.L. and A.A. Zoltners (eds.),
Marketing Decision Models, North-Holland, New York, pp. 207-250.
Wind, Y. (1982), Product Policy: Concepts, Methods, and Strategy, Addison-Wesley Publishing Company,
Reading, Massachusetts.
Wind, Y., P.E. Green, D. Shifflet and M. Scarbrough (1989), 'Courtyard by Marriott: Designing a Hotel Facility
with Consumer-Based Marketing Models', Interfaces, vol. 19, pp. 25-47.
Wind, Y. and G.L. Lilien (1993), 'Marketing Strategy Models' in Eliashberg, J. and G.L. Lilien (eds.),
Handbooks in Operations Research and Management Science, vol. 5, Marketing, North-Holland,
Amsterdam, pp. 773-826.
Wind, Y., V. Mahajan and D.J. Swire (1983), 'An Empirical Comparison of Standardized Portfolio Models',
Journal ofMarketing, vol. 47, no. 2, pp. 89-99.
Winkler, R.L. (1967a), 'The Assessment of Prior Distributions in Bayesian Analysis', Journal of the American
Statistical Association, vol. 62, pp. 776-800.
Winkler, R.L. (1967b), 'The Quantification of Judgment: Some Methodological Suggestions', Journal of the
American Statistical Association, vol. 62, pp. 1105-1120.
Winkler, R.L. (1967c), 'The Quantification of Judgment: Some Experimental Results', Proceedings of the
American Statistical Association, pp. 386-395.
Winkler, R.L. (1968), 'The Consensus of Subjective Probability Distributions', Management Science, vol. 15,
pp. B61-B75.
Winkler, R.L. (1986), 'Expert Resolution', Management Science, vol. 32, pp. 298-303.
Winkler, R.L. (1987), 'Judgmental and Bayesian Forecasting' in Makridakis, S. and S.C. Wheelwright (eds.),
The Handbook ofForecasting, John Wiley & Sons, New York, pp. 248-265.
Wittink, D.R. (1977), 'Explaining Territorial Differences in the Relationship Between Marketing Variables',
Journal of Marketing Research, vol. 14, pp. 145-155.
Wittink, D.R. (1987), 'Causal Market Share Models in Marketing: Neither Forecasting nor Understanding?',
International Journal of Forecasting, vol. 3, pp. 445-448.
Wittink, D.R. (1988), The Application of Regression Analysis, Allyn and Bacon, Boston.
Wittink, D.R., M.J. Addona, W.J. Hawkes and J.C. Porter (1988), 'SCAN*PRO: The Estimation, Validation
and Use of Promotional Effects Based on Scanner Data', Internal paper, Cornell University.
Wittink, D.R. and S.K. Keil (2000), 'Continuous Conjoint Analysis' in Gustafsson, A., A. Herrman and F.
Huber (eds.), Conjoint Measurement: Methods and Applications, Prentice-Hall, Europe, forthcoming.
Wittink, D.R., M. Vriens and W. Burhenne (1994), 'Commercial Use of Conjoint Analysis in Europe: Results
and Critical Reflections', International Journal of Research in Marketing, vol. 11, pp. 41-52.
Wolf, F.M. (1986), Meta-Analysis: Quantitative Methods for Research Synthesis, Sage Publications, Newbury
Park, CA.
Wonnacott, R.J. and T.H. Wonnacott (1970), Econometrics, John Wiley & Sons, New York.
Wonnacott, T.H. and R.J. Wonnacott (1969), Introductory Statistics, John Wiley and Sons, New York.
Yi, Y. (1989), 'An Investigation of the Structure of Expectancy-Value Attitude and its Implications',
International Journal of Research in Marketing, vol. 6, pp. 71-84.
Yon, B. (1976), Le Comportement Marketing de l'Entreprise: Une Approche Econométrique, Dunod, Paris.
Zaltman, G., C.R.A. Pinson and R. Angelmar (1973), Metatheory and Consumer Research, Holt, Rinehart and
Winston, New York.
Zellner, A. (1962), 'An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for
Aggregation Bias', Journal of the American Statistical Association, vol. 57, pp. 348-368.
Zellner, A. (1971), An Introduction to Bayesian Inference in Econometrics, John Wiley & Sons, New York.
Zellner, A. (1979), 'Causality and Econometrics' in Brunner, K. and A.H. Meltzer (eds.), Three Aspects of
Policy and Policy-making: Knowledge, Data and Institutions, North-Holland, Amsterdam, pp. 9-54.
Zellner, A. and M.M. Geisel (1970), 'Analysis of Distributed Lag Models with Applications to Consumption
Function Estimation', Econometrica, vol. 38, pp. 865-888.
Zenor, M.J. (1994), 'The Profit Benefits of Category Management', Journal of Marketing Research, vol. 31,
pp. 202-213.
Zentler, A.P. and D. Ryde (1956), 'An Optimum Geographical Distribution of Publicity Expenditure in a
Private Organisation', Management Science, vol. 2, pp. 337-352.
BIBLIOGRAPHY 615
Zmud, R.W. (1979), 'Individual Differences and MIS Success: A Review of the Empirical Literature',
Management Science, vol. 25, pp. 966-979.
Zoltners, A.A. (1976), 'Integer Programming Models for Sales Territory Alignment to Maximize Profit',
Journal of Marketing Research, vol. 13, pp. 426-430.
Zoltners, A.A. (1981), 'Normative Marketing Models' in Schultz, R.L. and A.A. Zoltners (eds.), Marketing
Decision Models, North-Holland, New York, pp. 55-76.
Zoltners, A.A. (1982), Marketing Planning Models, North-Holland Publishing Company, Amsterdam.
Zoltners, A.A. and P. Sinha (1983), 'Sales Territory Alignment: A Review and Model', Management Science,
vol. 29, pp. 1237-1256.
Zufryden, F.S. (1981), 'A Logit-Markovian Model of Consumer Purchase Behaviour Based on Explanatory
Variables: Empirical Evaluation and Implications for Decision Making', Decision Sciences, vol. 12, pp.
645-660.
Zufryden, F.S. (1982), 'A General Model for Assessing New Product Marketing Decisions and Market
Performance', TIMS/Studies in the Management Sciences, vol. 18, pp. 63-82.
Zufryden, F.S. (1986), 'Multibrand Transition Probabilities as a Function of Explanatory Variables: Estimation
by a Least Squares Approach', Journal ofMarketing Research, vol. 23, pp. 177-183.
Zwart, P.S. (1983), Beslissingsprocessen van Detaillisten: een Toepassing in de Drogistenbranche [Decision
Processes of Retailers: An Application in the Drugstore Trade], Unpublished Ph.D. thesis, University of
Groningen, the Netherlands.
Author Index
Bult, J.R. and T.J. Wansbeek (1995), 552, 553
Bult, J.R. and D.R. Wittink (1996), 158, 552
Bultez, A.V. (1975), 149, 170
Bultez, A.V. (1995), 262
Bultez, A.V., E. Gijsbrechts and P.A. Naert (1995), 259
Bultez, A.V., E. Gijsbrechts, P.A. Naert and P. Vanden Abeele (1989), 260, 264, 290
Bultez, A.V. and P.A. Naert (1973), 321, 376, 513
Bultez, A.V. and P.A. Naert (1975), 178, 376
Bultez, A.V. and P.A. Naert (1979), 93, 119
Bultez, A.V. and P.A. Naert (1988a), 93
Bultez, A.V. and P.A. Naert (1988b), 262-264, 290
Bunn, D. and G. Wright (1991), 421
Burke, R.R. (1991), 306
Burke, R.R., A. Rangaswamy, Y. Wind and J. Eliashberg (1990), 306
Buzzell, R.D. (1964), 80
Buzzell, R.D. and B.T. Gale (1987), 382

Capon, N. and J. Hulbert (1975), 126
Carman, J.M. (1966), 240
Carpenter, G.S. (1987), 382
Carpenter, G.S., L.G. Cooper, D.M. Hanssens and D.F. Midgley (1988), 209, 288, 290, 297, 375
Carpenter, G.S. and D.R. Lehmann (1985), 241
Cats-Baril, W.L. and G.P. Huber (1987), 544
Chakravarti, D., A. Mitchell and R. Staelin (1981), 22, 420
Channon, C. (1985), 560
Channon, C. (1987), 560
Chatfield, C., A.S.C. Ehrenberg and G.J. Goodhardt (1966), 222
Chatfield, C. and G.J. Goodhardt (1973), 224, 226
Chatterjee, R. and J. Eliashberg (1990), 187
Chen, Y., V. Kanetkar and D.L. Weiss (1994), 135
Chen, M.J., K.G. Smith and C.M. Grimm (1992), 211
Chiang, J. (1991), 103
Chiang, J., S. Chib and C. Narasimhan (1999), 285
Chintagunta, P.K. (1992a), 243
Chintagunta, P.K. (1992b), 158
Chintagunta, P.K. (1993a), 103, 158, 541
Chintagunta, P.K. (1993b), 149
Chintagunta, P.K. (1998), 284
Chintagunta, P.K. (1999), 103
Chintagunta, P.K., D.C. Jain and N.J. Vilcassim (1991), 158, 163, 241, 272
Chintagunta, P.K. and V.R. Rao (1996), 216, 219
Chintagunta, P.K. and N.J. Vilcassim (1994), 216, 219
Chow, G.C. (1960), 365
Chow, G.C. (1983), 495
Christen, M., Sachin Gupta, J.C. Porter, R. Staelin and D.R. Wittink (1997), 135, 158, 170, 253, 273, 274, 277, 505, 556, 565
Chu, W. and P.S. Desai (1995), 219
Churchill, G.A. (1995), 534
Churchman, C.W. and A. Schainblatt (1965), 528, 531
Clark, B.H. and D.B. Montgomery (1999), 283
Clarke, D. (1983), 23
Clarke, D.G. (1973), 375
Clarke, D.G. (1976), 89, 279
Clements, K.W. and E.A. Selvanathan (1988), 167
Cleveland, W.S. (1979), 474
Cochrane, D. and G.H. Orcutt (1949), 373
Colombo, R.A. and D.G. Morrison (1989), 249
Comanor, W.S. and T.A. Wilson (1974), 75
Cooil, B. and J.M. Devinney (1992), 15
Cooper, L.G. (1993), 174
Cooper, L.G., D. Klapper and A. Inoue (1996), 175
Cooper, L.G. and M. Nakanishi (1988), 172-174, 176, 209, 314, 533
Corstjens, M. and P. Doyle (1981), 262, 264
Corstjens, M. and D. Weinstein (1982), 541
Cotterill, R.W., W.P. Putsis and R. Dhar (1998), 209
Cournot, A., 217
Cowling, K. and J. Cubbin (1971), 76, 381
Cox, D.E. and R.E. Good (1967), 306
Cox, D.R. (1975), 230
Cox, Jr, E. (1967), 131
Cready, W.M. (1991), 259
Crow, L.E., R.W. Olshavsky and J.O. Summers (1980), 126
Curhan, R.C. (1972), 262
Currim, I.S. (1982), 243
Currim, I.S., C.B. Weinberg and D.R. Wittink (1981), 106
Cyert, R.M. and J.G. March (1963), 126

Daganzo, C. (1979), 243
Dalkey, N.C. and O. Helmer (1962), 432
Damme, E. van (1987), 215
Danaher, P.J. (1994), 131
Danaher, P.J. and R.J. Brodie (1992), 135
Danaher, P.J. and T. Sharot (1994), 315
Davidson, R. and J.G. MacKinnon (1981), 494
Davidson, R. and J.G. MacKinnon (1993), 323
Day, G.S., A.D. Shocker and R.K. Srivastava (1979), 283, 286
Day, G.S. and R. Wensley (1988), 201
Dayton, M.C. and G.B. MacReady (1988), 456
De Finetti, B. (1964), 423
De Finetti, B. (1965), 428
Deal, K.R. (1975), 216
Dearden, J.A., G.L. Lilien and E. Yoon (1999), 541
Debreu, G. (1960), 175
Deighton, J., C.M. Henderson and S.A. Neslin (1994), 315
Dekimpe, M.G., P. François, S. Gopalakrishna, G.L. Lilien and C. Vanden Bulte (1997), 164
Dekimpe, M.G. and D.M. Hanssens (1995a), 459, 464, 469
Dekimpe, M.G. and D.M. Hanssens (1995b), 462, 464, 465, 469
Dekimpe, M.G. and D.M. Hanssens (1999), 566
Dekimpe, M.G., D.M. Hanssens and J.M. Silva-Risso (1999), 566
DeSarbo, W.S. and W.L. Cron (1988), 455, 456
Deshpande, R. and H. Gatignon (1994), 211
Dhar, S.K., D.G. Morrison and J.S. Raju (1996), 151
Dhrymes, P.J. (1981), 87, 91
Diamantopoulos, A. (1994), 441, 448
Dillon, W.R. and S. Gupta (1996), 226, 248, 249
Dockner, E. and S. Jørgensen (1988), 216
Doktor, R.H. and W.F. Hamilton (1973), 528
Dorfman, R. and P.O. Steiner (1954), 146, 155
Foekens, E.W., P.S.H. Leeflang and D.R. Wittink (1997), 131, 175, 209, 254, 287, 295-297, 320
Foekens, E.W., P.S.H. Leeflang and D.R. Wittink (1999), 153, 170, 473, 478, 491
Fornell, C. and D.F. Larcker (1981), 448
Fornell, C. and R.T. Rust (1989), 494
Forrester, J.W. (1961), 26, 30
Forsythe, A.B. (1972), 345
Fourt, L.A. and J.W. Woodlock (1960), 184
Frankel, M.R. and L.R. Frankel (1977), 311
Franses, Ph.H. (1991), 166, 468
Franses, Ph.H. (1994), 469
Franses, Ph.H. (1996), 458
Franses, Ph.H. (1998), 458
Franses, Ph.H., T. Kloek and A. Lucas (1999), 566
Frenk, J.B.G. and S. Zhang (1997), 226
Friedman, J.W. (1991), 215
Friedman, L. (1958), 152, 215
Friedman, M. (1953), 110
Friedman, R. (1982), 358
Fruchter, G.E. and S. Kalish (1997), 216

Gasmi, F., J.J. Laffont and Q. Vuong (1992), 217-219, 383
Gatignon, H. (1984), 205, 253
Gatignon, H. (1993), 75
Gatignon, H., E. Anderson and K. Helsen (1989), 201
Gatignon, H. and D.M. Hanssens (1987), 474
Gatignon, H., T.S. Robertson and A.J. Fein (1997), 201
Gaver, K.M., D. Horsky and C. Narasimhan (1988), 376
Gensch, D.H., N. Aversa and S.P. Moore (1990), 560
Geweke, J., R. Meese and W. Dent (1983), 497
Geyskens, I., J.B.E.M. Steenkamp and N. Kumar (1999), 567
Ghosh, A., S.A. Neslin and R. Shoemaker (1984), 119, 282, 285
Gijsbrechts, E. and P.A. Naert (1984), 45, 256, 257
Givon, M. and D. Horsky (1990), 244
Golany, B., M. Kress and F.Y. Phillips (1986), 223
Goldberger, A.S. (1998), 56, 323
Goldberger, A.S. and C.F. Manski (1995), 352
Goldfeld, S.M. and R.E. Quandt (1965), 336
Goldfeld, S.M. and R.E. Quandt (1972), 79, 384
Goldfeld, S.M. and R.E. Quandt (1976), 79
Gönül, F. and M.Z. Shi (1998), 552
Gönül, F. and K. Srinivasan (1993a), 158, 291
Gönül, F. and K. Srinivasan (1993b), 226, 229
Goodhardt, G.J., A.S.C. Ehrenberg and C. Chatfield (1984), 234, 246, 249
Gopalakrishna, S. and R. Chatterjee (1992), 150, 387, 388
Granger, C.W.J. (1969), 495
Grayson, C.J. (1967), 426
Green, P.E., J.D. Carroll and W.S. DeSarbo (1978), 352
Green, P.E. and V. Srinivasan (1990), 573
Green, P.E., D.S. Tull and G. Albaum (1988), 310
Greene, W.H. (1997), 56, 323, 335, 340, 354, 380, 381, 383, 385, 387, 430, 469, 490, 491
Griffin, A. (1992), 540
Griffin, A. and J.R. Hauser (1993), 541
Griliches, Z. (1967), 87, 96
Grover, R. and V. Srinivasan (1992), 249
Grunfeld, Y. and Z. Griliches (1960), 277
Guadagni, P.M. and J.D.C. Little (1983), 158, 159, 162, 241, 566
Gujarati, D.N. (1995), 323
Gupta, S. (1988), 103, 246, 254, 541, 566
Gupta, S. (1991), 226-229, 389, 391
Gupta, S. and L.G. Cooper (1992), 404
Gupta, S. and R. Loulou (1998), 219
Gupta, Sachin and P.K. Chintagunta (1994), 158
Gupta, Sachin, P.K. Chintagunta, A. Kaul and D.R. Wittink (1996), 10, 158, 160, 170, 270-272, 566
Gupta, Sunil (1994), 421, 431, 533
Gupta, S.K. and K.S. Krishnan (1967a), 215
Gupta, S.K. and K.S. Krishnan (1967b), 215
Gupta, U.G. and R.E. Clarke (1996), 433

Haaijer, R., M. Wedel, M. Vriens and T.J. Wansbeek (1998), 163, 243
Hagerty, M.R. and V. Srinivasan (1991), 502
Hahn, M. and J.S. Hyun (1991), 152
Hair, J.E., R.E. Anderson, R.L. Tatham and W.C. Black (1995), 441
Hamilton, J.D. (1994), 323
Hammond, J.S. (1974), 530
Hampton, J.M., P.G. Moore and H. Thomas (1973), 424
Hanson, W. and R.K. Martin (1990), 259
Hanssens, D.M. (1980a), 497
Hanssens, D.M. (1980b), 6, 205, 497
Hanssens, D.M. and L.J. Parsons (1993), 458
Hanssens, D.M., J.L. Parsons and R.L. Schultz (1990), 3
Hanssens, D.M., L.J. Parsons and R.L. Schultz (1990), 3, 31, 32, 50, 68, 85, 87, 98, 105, 149, 151, 195, 208, 215, 280, 310, 315, 376, 382, 458, 468, 485, 527, 529, 530, 532
Harary, F. and B. Lipstein (1962), 238
Härdle, W. (1990), 398, 400
Härdle, W. and O. Linton (1994), 397
Hastie, T.J. and R.J. Tibshirani (1990), 474
Hartung, P.H. and J.L. Fisher (1965), 244
Hashemzadeh, N. and P. Taylor (1988), 495
Haugh, L.D. (1976), 497
Hauser, J.R. and D. Clausing (1988), 540
Hauser, J.R., D.I. Simester and B. Wernerfelt (1996), 541
Hauser, J.R. and B. Wernerfelt (1990), 284, 286
Hausman, W.H. and D.B. Montgomery (1997), 539, 570
Haynes, B. and J.T. Rothe (1974), 549
Heeler, R.M. and M.L. Ray (1972), 479
Heerde, H.J. van (1999), 566
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1997), 508, 511
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999a), 170, 403, 405, 407, 510, 511, 568
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999b), 103, 254, 536, 568
Heerde, H.J. van, P.S.H. Leeflang and D.R. Wittink (1999c), 85, 94, 96, 99, 170, 475, 478
Helmer, O. (1966), 432
Helsen, K. and D.C. Schmittlein (1993), 226, 228-230
Hendry, D.F. (1989), 346
Herniter, J.D. (1971), 238
Herniter, J.D. and R.A. Howard (1964), 22
Hess, J.E. (1968), 107
Hildreth, C. and J.Y. Lu (1960), 373
Hinich, M.J. and P.P. Talwar (1975), 345
Hoch, S.J., B. Kim, A.L. Montgomery and P.E. Rossi (1995), 256, 284, 565
Hocking, R.R. (1976), 491
Hoekstra, J.C. (1987), 38
Hoekstra, J.C., P.S.H. Leeflang and D.R. Wittink (1999), 7, 539, 570, 571, 574
Hoerl, A.E. and R.W. Kennard (1970), 360
Hofstede, F. ter, J.B.E.M. Steenkamp and M. Wedel (1999), 9
Hogarth, R. (1987), 421
Hooley, G.J. and J. Saunders (1993), 286
Horowitz, I. (1970), 237
Horsky, D. (1977a), 79
Horsky, D. (1977b), 244
Horsky, D. and L.S. Simon (1983), 108
Houston, F.S. (1977), 382
Houston, F.S. and D.L. Weiss (1974), 97
Howard, J.A. and W.M. Morgenroth (1968), 14, 124, 125
Howard, J.A. and J.N. Sheth (1969), 43
Howard, R.A. (1963), 238
Huber, P.J. (1973), 345
Hughes, G.D. (1973), 157
Hulbert, J.M. and M.E. Toy (1977), 28, 541
Huse, E. (1980), 544
Huysmans, J.H. (1970a), 528
Huysmans, J.H. (1970b), 528

Intriligator, M.D., R.G. Bodkin and C. Hsiao (1996), 323
Iversen, G.R. (1984), 430, 435
Iyer, G. (1998), 9

Jain, D.C. and N.J. Vilcassim (1991), 226-229
Jain, D.C., N.J. Vilcassim and P.K. Chintagunta (1994), 272
Jain, S.C. (1993), 432, 436
Jamieson, L. and F.M. Bass (1989), 416
Jedidi, K., C.F. Mela and S. Gupta (1999), 464, 566
Jeuland, A.P., F.M. Bass and G.P. Wright (1980), 246
Johansson, J.K. (1973), 82
Johansson, J.K. (1979), 82
Johnson, E.J. and J. Payne (1985), 527
Johnson, E.J. and J.E. Russo (1994), 211
Johnston, J. (1984), 323
Jones, J. (1986), 86
Jones, J.M. (1973), 159, 238
Jones, J.M. and J.T. Landwehr (1988), 158, 244
Jöreskog, K.G. (1973), 441
Jöreskog, K.G. (1978), 441
Jöreskog, K.G. and D. Sörbom (1989), 43
Judge, G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.C. Lee (1985), 56, 60, 62, 79, 87, 93, 94, 323, 336, 338, 340, 342, 343, 345, 358, 370-372, 381, 387, 473, 495, 500
Juhl, H.J. and K. Kristensen (1989), 260

Kadiyali, V. (1996), 219, 383
Kadiyali, V., N.J. Vilcassim and P.K. Chintagunta (1996), 259
Kadiyali, V., N.J. Vilcassim and P.K. Chintagunta (1999), 109, 208
Kahn, B.E. (1998), 574
Kaicker, A. and W.O. Bearden (1995), 259
Kalra, A., S. Rajiv and K. Srinivasan (1998), 201
Kalwani, M.U., R.J. Meyer and D.G. Morrison (1994), 517
Kalwani, M.U. and D.G. Morrison (1977), 242
Kalwani, M.U. and A.J. Silk (1982), 416
Kalwani, M.U. and C.K. Yim (1992), 476
Kalyanam, K. and T.S. Shively (1998), 106, 398
Kamakura, W.A. and S.K. Balasubramanian (1988), 187
Kamakura, W.A., B. Kim and J. Lee (1996), 243
Kamakura, W.A. and G.J. Russell (1989), 158, 241
Kamakura, W.A. and G.J. Russell (1993), 303
Kamakura, W.A. and R.K. Srivastava (1984), 243
Kamakura, W.A. and R.K. Srivastava (1986), 243
Kamakura, W.A. and M. Wedel (1997), 302
Kamakura, W.A., M. Wedel and J. Agrawal (1994), 456
Kanetkar, V., C.B. Weinberg and D.L. Weiss (1986), 95, 281
Kanetkar, V., C.B. Weinberg and D.L. Weiss (1992), 75
Kannan, P.K. and G.P. Wright (1991), 243
Kapteyn, A., S. van de Geer, H. van de Stadt and T.J. Wansbeek (1997), 164
Kapteyn, A., T.J. Wansbeek and J. Buyze (1980), 164
Karmarkar, K.S. (1996), 540, 541
Kass, G.V. (1976), 552
Kaul, A. and D.R. Wittink (1995), 25, 31, 75, 109, 347
Kealy, M.J., J.F. Dovidio and M.L. Rockel (1988), 437
Kealy, M.J., M. Montgomery and J.F. Dovidio (1990), 437
Keller, K.L. (1993), 303
Kenkel, J.L. (1974), 341
Ketellapper, R.H. (1981), 449
Ketellapper, R.H. (1982), 449
Kim, B. (1995), 272
Kim, N., E. Bridges and R.K. Srivastava (1999), 187
Kim, S.Y. and R. Staelin (1999), 205, 219, 567
Kimball, G.E. (1957), 219
King, W.R. (1967), 3
Klein, L.R. (1962), 63
Klein, L.R. and J.B. Lansing (1955), 164
Kmenta, J. (1971), 371, 373, 375
Koerts, J. and A.P.J. Abrahamse (1969), 344, 349, 350
Korosi, G., L. Matyas and I.P. Szekely (1992), 345
Kotler, Ph. (1971), 3, 4
Kotler, Ph. (1997), 251
Koyck, L.M. (1954), 89
Kreps, D.M. and R. Wilson (1982), 211
Krishna, A. (1991), 478
Krishna, A. (1992), 99, 151, 196
Krishna, A. (1994), 151, 196
Krishnamurthi, L. and S.P. Raj (1985), 75
Krishnamurthi, L. and S.P. Raj (1988), 103, 226
Krishnamurthi, L. and S.P. Raj (1991), 226, 249
Krishnamurthi, L., S.P. Raj and R. Selvam (1990), 276, 277
Krishnamurthi, L. and A. Rangaswamy (1987), 360
Krishnan, K.S. and S.K. Gupta (1967), 215
Krishnan, T.V. and H. Soni (1997), 9, 219, 567
Kristensen, K. (1984), 345
Kuehn, A.A. (1961), 240
Kuehn, A.A. (1962), 240
Kuehn, A.A. and M.J. Hamburger (1963), 34
Kumar, N., L.K. Scheer and J.B.E.M. Steenkamp (1995a), 567
Kumar, N., L.K. Scheer and J.B.E.M. Steenkamp (1995b), 567
Kumar, T.K. (1975), 358
Kumar, V. (1994), 131, 135

LaForge, R.W. and D.W. Cravens (1985), 22
LaForge, R.W., C.W. Lamb jr., D.W. Cravens and W.C. Moncrief III (1989), 559
Lal, R. and C. Narasimhan (1996), 219
Lal, R. and R. Staelin (1986), 150
Lal, R. and J.M. Villas-Boas (1998), 9
Lambin, J.J. (1969), 144, 146
Lambin, J.J. (1970), 155
Lambin, J.J. (1972a), 91, 164
Lambin, J.J. (1972b), 145, 195, 435
Lambin, J.J. (1976), 31, 149, 166, 382, 485
Lambin, J.J., P.A. Naert and A.V. Bultez (1975), 113, 114, 148, 149, 155, 157, 170, 204, 382
Lancaster, K.M. (1984), 166, 382
Larreche, J.C. (1974), 107, 526
Larreche, J.C. (1975), 107, 526
Larreche, J.C. and R. Moinpour (1983), 433
Larreche, J.C. and D.B. Montgomery (1977), 433, 525
Larreche, J.C. and V. Srinivasan (1981), 541
Larreche, J.C. and V. Srinivasan (1982), 541
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor (1986), 428, 436
Lawrence, R.J. (1975), 240
Leamer, E.E. (1978), 304
Lee, A.M. (1962), 152
Lee, A.M. (1963), 152
Lee, A.M. and A.J. Burkart (1960), 152
Lee, M. (1996), 403
Lee, P.M. (1997), 430, 435
Lee, T.C., G.G. Judge and A. Zellner (1970), 244, 411
Leeflang, P.S.H. (1974), 3, 4, 11, 43, 106, 159, 225, 232, 238, 355, 375, 411
Leeflang, P.S.H. (1975), 73
Leeflang, P.S.H. (1976), 40, 131, 165
Leeflang, P.S.H. (1977a), 26, 40, 131, 157, 165
Leeflang, P.S.H. (1977b), 121, 315, 375
Leeflang, P.S.H. (1995), 22
Leeflang, P.S.H., K.J. Alsem and J.C. Reuyl (1991), 23, 438
Leeflang, P.S.H. and A. Boonstra (1982), 43, 240
Leeflang, P.S.H. and J.J. van Duyn (1982a), 267, 281, 361, 366
Leeflang, P.S.H. and J.J. van Duyn (1982b), 267, 281, 361, 363, 365-367
Leeflang, P.S.H. and J. Koerts (1973), 102
Leeflang, P.S.H. and J. Koerts (1974), 244, 315
Leeflang, P.S.H. and J. Koerts (1975), 106
Leeflang, P.S.H., G.M. Mijatovic and J. Saunders (1992), 86, 87, 91, 98, 488, 490, 491
Leeflang, P.S.H. and G.M. Mijatovic (1988), 97
Leeflang, P.S.H. and A.J. Olivier (1980), 62, 311
Leeflang, P.S.H. and A.J. Olivier (1982), 62, 311
Leeflang, P.S.H. and A.J. Olivier (1985), 62, 311, 566
Leeflang, P.S.H. and F.W. Plat (1984), 473
Leeflang, P.S.H. and F.W. Plat (1988), 62, 311, 474
Leeflang, P.S.H. and J.C. Reuyl (1979), 109, 376
Leeflang, P.S.H. and J.C. Reuyl (1983), 376
Leeflang, P.S.H. and J.C. Reuyl (1984a), 112, 113, 119, 283
Leeflang, P.S.H. and J.C. Reuyl (1984b), 376
Leeflang, P.S.H. and J.C. Reuyl (1985a), 90, 166
Leeflang, P.S.H. and J.C. Reuyl (1985b), 155, 204, 513
Leeflang, P.S.H. and J.C. Reuyl (1986), 268
Leeflang, P.S.H. and J.C. Reuyl (1995), 166, 409
Leeflang, P.S.H. and M. Wedel (1993), 106
Leeflang, P.S.H. and D.R. Wittink (1992), 201, 206-209, 497
Leeflang, P.S.H. and D.R. Wittink (1994), 201, 286
Leeflang, P.S.H. and D.R. Wittink (1996), 201, 209-211, 214, 215, 286
Leeflang, P.S.H. and D.R. Wittink (2000), 211
Lehmann, E.L. (1983), 416
Leigh, T.W. and A.J. Rethans (1984), 127
Leone, R.P. (1983), 468
Leone, R.P. (1995), 31, 90, 279, 280
Leone, R.P. and R.L. Schultz (1980), 31, 166, 485
Lilien, G.L. (1974a), 240
Lilien, G.L. (1974b), 240
Lilien, G.L. (1975), 11
Lilien, G.L. (1979), 126
Lilien, G.L. (1994), 540
Lilien, G.L. and P. Kotler (1983), 3, 86, 416, 417, 424
Lilien, G.L., P. Kotler and K.S. Moorthy (1992), 3, 15, 25, 38, 126, 157-159, 382, 549
Lilien, G.L. and A. Rangaswamy (1998), 11, 184-186, 193, 306, 415, 433
Lindsey, J.K. (1996), 393, 394
Little, J.D.C. (1966), 108
Little, J.D.C. (1970), 5, 53, 81, 101, 104, 108, 120, 417, 420, 556
Little, J.D.C. (1975a), 5, 101, 108, 420, 536
Little, J.D.C. (1975b), 5, 27, 79, 108, 420, 436, 533, 534
Little, J.D.C. (1979), 66
Little, J.D.C. (1998), 314, 316
Little, J.D.C. and L.M. Lodish (1969), 80, 152
Little, J.D.C. and L.M. Lodish (1981), 420
Little, J.D.C., L.M. Lodish, J.R. Hauser and G.L. Urban (1994), 557
Lock, A. (1987), 432
Lodish, L.M. (1971), 22, 150, 153, 420, 559
Lodish, L.M. (1981), 536
Lodish, L.M. (1982), 536
Lodish, L.M., M.M. Abraham, S. Kalmenson, J. Livelsberger, B. Lubetkin, B. Richardson and M.E. Stevens (1995a), 31, 315
Lodish, L.M., M.M. Abraham, J. Livelsberger, B. Lubetkin, B. Richardson and M.E. Stevens (1995b), 31, 90, 315, 485
Lodish, L.M., E. Curtis, M. Ness and M.K. Simpson (1988), 150, 559
Lodish, L.M., D.B. Montgomery and F.E. Webster (1968), 80
Logman, M. (1995), 75
Long, S. (1983), 441
Louviere, J.J. and D.A. Hensher (1983), 241
Louviere, J.J. and G. Woodworth (1983), 241, 303
Lucas, H.C., M.J. Ginzberg and R.L. Schultz (1990), 527
Lucas, R.E. (1976), 558
Luce, R.D. (1959), 175, 241
Luce, R.D. and H. Raiffa (1957), 215
Luik, J.C. and M.J. Waterson (1996), 166, 562
Mitchell, A.A., J.E. Russo and D.R. Wittink (1991), 305, 307, 546
Mizon, G.E. (1995), 341
Monroe, K.B. and A.J. Della Bitta (1978), 258, 259
Montgomery, A.L. (1997), 567
Montgomery, D.B. (1969), 238
Montgomery, D.B. (1973), 4
Montgomery, D.B. and A.J. Silk (1972), 91, 106
Montgomery, D.B., A.J. Silk and C.E. Zaragoza (1971), 153, 533
Montgomery, D.B. and G.L. Urban (1969), 3, 4, 15, 77, 104, 232, 252, 306
Montgomery, D.B. and G.L. Urban (1970), 306
Moore, H.L. (1914), 75
Moore, W.L. and D.R. Lehmann (1989), 243
Moorthy, K.S. (1984), 259
Moorthy, K.S. (1985), 215, 216
Moorthy, K.S. (1988), 219
Moorthy, K.S. (1993), 215, 216
Moriarty, M. (1975), 281
Morikawa, T. (1989), 303
Morrison, D.G. (1966), 159, 238
Morrison, D.G. (1979), 416
Morrison, D.G. and D.C. Schmittlein (1988), 223, 224, 226
Morwitz, V.G. and D.C. Schmittlein (1992), 416, 437
Nadaraya, E.A. (1964), 398
Naert, P.A. (1972), 106
Naert, P.A. (1973), 119, 144, 146
Naert, P.A. (1974), 110, 117
Naert, P.A. (1975a), 515
Naert, P.A. (1975b), 515, 517
Naert, P.A. (1977), 25
Naert, P.A. and A.V. Bultez (1973), 79, 109, 111, 112
Naert, P.A. and A.V. Bultez (1975), 43, 244, 387
Naert, P.A. and P.S.H. Leeflang (1978), 3
Naert, P.A. and M. Weverbergh (1977), 384, 385
Naert, P.A. and M. Weverbergh (1981a), 119
Naert, P.A. and M. Weverbergh (1981b), 420, 434
Naert, P.A. and M. Weverbergh (1985), 119, 282
Naik, P.A., M.K. Mantrala and A.G. Sawyer (1998), 152
Nakanishi, M. (1972), 172
Nakanishi, M. and L.G. Cooper (1974), 172, 376
Nakanishi, M. and L.G. Cooper (1982), 172, 177
Narasimhan, C. (1988), 151
Narasimhan, C. and S.K. Sen (1983), 189
Nash, J. (1950), 216
Nelder, J.A. and R.W.M. Wedderburn (1972), 454
Nenning, M., E. Topritzhofer and U. Wagner (1979), 311
Nerlove, M. (1971), 95, 364, 365
Neslin, S.A., S.G. Powell and L.G. Schneider Stone (1995), 152
Neslin, S.A. and L.G. Schneider Stone (1996), 96, 158, 475
Neslin, S.A. and R.W. Shoemaker (1983), 151, 195
Newcomb, S. (1886), 451
Newell, A. and H.A. Simon (1972), 126
Nielsen Marketing Research (1988), 312
Nillesen, J.P.H. (1992), 43
Nijkamp, W.G. (1993), 42, 182, 187, 189, 190
Nooteboom, B. (1989), 76, 187
Nordin, J.A. (1943), 153
Novak, T.P. (1993), 284
NZDH (1989), 408
Rangaswamy, A., B.A. Harlam and L.M. Lodish (1991), 307
Rangaswamy, A. and L. Krishnamurthi (1991), 360
Rangaswamy, A. and L. Krishnamurthi (1995), 360
Rangaswamy, A., P. Sinha and A.A. Zoltners (1990), 22, 150, 152
Rao, V.R. (1984), 259
Rao, V.R. (1993), 45, 151, 259
Rao, V.R. and E.W. McLaughlin (1989), 261
Rao, V.R. and L.J. Thomas (1973), 151
Rao, V.R., Y. Wind and W.S. DeSarbo (1988), 359
Reddy, S.K., J.E. Aronson and A. Stam (1998), 152
Reibstein, D.J. and H. Gatignon (1984), 259
Reinmuth, J.E. and D.R. Wittink (1974), 474
Reuyl, J.C. (1982), 109, 268, 269, 376
Roberts, J.H. and J.M. Lattin (1991), 284
Roberts, J.H. and G.L. Lilien (1993), 159, 198, 242
Robey, D. (1984), 532
Robinson, W.T. (1988), 201, 403
Rogers, R. (1962), 188
Rossi, P.E. and G.M. Allenby (1994), 158
Rossi, P.E., R.E. McCulloch and G.M. Allenby (1996), 163
Roy, A., D.M. Hanssens and J.S. Raju (1994), 217
Russell, G.J. (1988), 281
Russell, G.J. and W.A. Kamakura (1994), 278, 566
Rust, R.T. (1988), 400-402
Rust, R.T., C. Lee and E. Valente, jr. (1995), 491
Rust, R.T. and D.C. Schmittlein (1985), 494
Rust, R.T., D. Simester, R.J. Brodie and V. Nilikant (1995), 493
Savage, L.J. (1954), 423
Scales, L.E. (1985), 392
Scheer van der, H.R. (1998), 552
Scheer van der, H.R. and P.S.H. Leeflang (1997), 103
Schlaifer, R. (1969), 435
Schmalensee, R.L. (1972), 15, 112
Schmittlein, D.C., L.G. Cooper and D.G. Morrison (1993), 224
Schmittlein, D.C., D.G. Morrison and R. Colombo (1987), 226
Schmitz, J.D., G.D. Armstrong and J.D.C. Little (1990), 307
Schoemaker, P.J.H. (1995), 436
Schultz, H. (1938), 75
Schultz, R.L. (1971), 203, 382, 383
Schultz, R.L., M.J. Ginzberg and H.C. Lucas (1984), 527, 532
Schultz, R.L. and M.D. Henry (1981), 525, 531, 534, 537
Schultz, R.L. and D.P. Slevin (1975), 525
Schultz, R.L. and D.P. Slevin (1977), 537
Schultz, R.L. and D.P. Slevin (1983), 527, 532
Schultz, R.L. and D.R. Wittink (1976), 40
Schultz, R.L. and A.A. Zoltners (1981), 3
Schwarz, G. (1978), 396, 493
Sethi, S.P. (1977), 151
Sethuraman, R., V. Srinivasan and D. Kim (1999), 175
Shakun, M.F. (1966), 215
Shankar, V. (1997), 201
Shapiro, S.S. and M.B. Wilk (1965), 344
Sharma, S. (1996), 43, 198, 360, 441, 445, 448
Shepard, D. (1990), 552
Shocker, A.D., M. Ben-Akiva, B. Boccara and P. Nedungadi (1991), 285
Shocker, A.D. and W.G. Hall (1986), 189
Shocker, A.D., D.W. Stewart and A.J. Zahorik (1990), 562
Shoemaker, R.W. and L.G. Pringle (1980), 62, 311
Shugan, S.M. (1987), 286
Sichel, H.S. (1982), 224
Siddarth, S., R.E. Bucklin and D.G. Morrison (1995), 285, 291
Sikkel, D. and A.W. Hoogendoorn (1995), 224-226
Silk, A.J. and G.L. Urban (1978), 192, 193
Silverman, B.W. (1986), 400
Simon, C.J. and M.W. Sullivan (1993), 303
Simon, H. (1984), 556
Simon, H. (1994), 556, 557, 559
Simon, L.S. and M. Freimer (1970), 3, 232
Sims, C.A. (1972), 496
Sims, C.A. (1980), 558
Sinha, R.K. and M. Chandrashekaran (1992), 231
Sirohi, N. (1999), 259
Sirohi, N., E.W. McLaughlin and D.R. Wittink (1998), 9
Skiera, B. and S. Albers (1998), 153
Smee, C., M. Parsonage, R. Anderson and S. Duckworth (1992), 408, 409
Smith, L.H. (1967), 424
Smith, S.A., S.H. McIntyre and D.D. Achabal (1994), 474
Smith, S.V., R.H. Brien and J.E. Stafford (1968), 306
Solow, R.M. (1960), 92
Spring, P., P.S.H. Leeflang and T.J. Wansbeek (1999), 552
Srinivasan, V. (1976), 152
Srinivasan, V. and H.A. Weir (1988), 79, 281
Srivastava, R.K., M.I. Alpert and A.D. Shocker (1984), 286
Steenkamp, J.B.E.M. (1989), 315
Steenkamp, J.B.E.M. and H. Baumgartner (1998), 449
Steenkamp, J.B.E.M. and M.G. Dekimpe (1997), 175
Steenkamp, J.B.E.M. and H.C.M. van Trijp (1991), 450
Stewart, J. (1991), 56, 93, 344
Styan, G.P.H. and H. Smith jr. (1964), 238
Sunde, L. and R.J. Brodie (1993), 303
Swamy, P.A.V.B. (1970), 363
Swamy, P.A.V.B. (1971), 363
Swanson, E.B. (1974), 527
Talwar, P. (1974), 345
Taylor, C.J. (1963), 152
Tellis, G.J. (1988a), 226
Tellis, G.J. (1988b), 31, 110, 435, 485
Tellis, G.J. and C.M. Crawford (1981), 131
Tellis, G.J. and C. Fornell (1988), 382
Tellis, G.J. and F.S. Zufryden (1995), 153, 158
Telser, L.G. (1962a), 238, 244
Telser, L.G. (1962b), 237, 244, 315
Telser, L.G. (1963), 237, 411
Teng, J.T. and G.L. Thompson (1983), 216
Theil, H. (1965), 507
Theil, H. (1969), 160
Theil, H. (1971), 63, 108, 335, 340, 349, 350, 369
Theil, H. (1975), 167
Theil, H. (1976), 167
Theil, H. and A. Schweitzer (1961), 373
Thomas, J.J. and K.F. Wallis (1971), 343
Thursby, J.G. and P. Schmidt (1977), 333
Tinbergen, J. (1966), 10
Titterington, D.M., A.F.M. Smith and U.E. Makov (1985), 452
Todd, P. and I. Benbasat (1994), 531, 546
Torgerson, W. (1959), 61
Tsay, R.S. and G.C. Tiao (1984), 462
Tversky, A. (1972), 285
Uncles, M., A.S.C. Ehrenberg and K. Hammond (1995), 29, 234, 235
Urban, G.L. (1968), 189
Urban, G.L. (1969a), 42, 189
Urban, G.L. (1969b), 259
Urban, G.L. (1970), 42, 189
Urban, G.L. (1971), 152
Urban, G.L. (1972), 534
Urban, G.L. (1974), 26, 51, 534, 537
Urban, G.L. (1993), 130, 188, 189, 192, 193
Urban, G.L. and J.R. Hauser (1980), 241, 242, 516
Urban, G.L. and J.R. Hauser (1993), 517
Urban, G.L., J.R. Hauser and J.H. Roberts (1990), 188
Urban, G.L. and R. Karash (1971), 101, 105, 189, 534, 535
Urban, G.L. and M. Katz (1983), 192, 517
US DHHS (1989), 408
Vanden Abeele, P. (1975), 449
Vanden Abeele, P. and E. Gijsbrechts (1991), 291
Vanden Abeele, P., E. Gijsbrechts and M. Vanhuele (1990), 288, 290
Vanden Bulte, C. and G.L. Lilien (1997), 186
Vander Werf, P.A. and J.F. Mahan (1997), 32
Vanhonacker, W.R. (1988), 281
Verhoef, P.C., P.H. Franses and J.C. Hoekstra (2000), 573
Verhulp, J. (1982), 31
Vidale, M.L. and H.B. Wolfe (1957), 219
Vilcassim, N.J. (1989), 167
Vilcassim, N.J. and D.C. Jain (1991), 228-230, 238
Vilcassim, N.J., V. Kadiyali and P.K. Chintagunta (1999), 208, 219
Vuong, Q.H. (1989), 494
Vyas, N. and A.G. Woodside (1984), 126
Waarts, E., M. Carree and B. Wierenga (1991), 286
Wallace, T.D. (1972), 362
Walters, R.G. (1991), 254, 255
Wansbeek, T.W. and M. Wedel (1999), 6
Watson, G.S. (1964), 398
Webster, F.E. (1992), 539
Webster, F.E. (1994), 539, 574
Wedel, M. and W.S. DeSarbo (1994), 453
Wedel, M. and W.S. DeSarbo (1995), 452, 453
Wedel, M., W.S. DeSarbo, J.R. Bult and V. Ramaswamy (1993), 224, 226, 552
Wedel, M. and W.A. Kamakura (1998), 158, 163, 451, 453, 456, 457
Wedel, M., W.A. Kamakura, N. Arora, A.C. Bemmaor, J. Chiang, T. Elrod, R. Johnson, P. Lenk, S.A. Neslin and C.S. Poulsen (1999), 162
Wedel, M., W.A. Kamakura, W.S. DeSarbo and F. Ter Hofstede (1995), 226, 229-231, 238
Wedel, M. and P.S.H. Leeflang (1998), 106, 557
Wedel, M., M. Vriens, T.H.A. Bijmolt, W. Krijnen and P.S.H. Leeflang (1998), 283
Weerahandi, S. and S.R. Dalal (1992), 437
Weiss, D.L. and P.M. Windal (1980), 96, 490
Weverbergh, M. (1976), 112
Weverbergh, M. (1977), 25
Weverbergh, M., Ph.A. Naert and A. Bultez (1977), 109
Wheat, R.D. and D.G. Morrison (1990), 224
White, H. (1980), 336
Wierenga, B. (1974), 232, 233, 238, 240
Wierenga, B. (1978), 240
Wierenga, B. and G.H. van Bruggen (1997), 7, 307, 308, 527
Wierenga, B. and G.H. van Bruggen (2000), 305, 307
Wilde, D.J. and C.S. Beightler (1967), 387
Wildt, A.R. (1993), 360
Wildt, A.R. and R.S. Winer (1983), 474
Wind, Y. (1981), 537, 541
Wind, Y. (1982), 541
Wind, Y., P.E. Green, D. Shifflet and M. Scarbrough (1989), 559
Wind, Y. and G.L. Lilien (1993), 537-539, 541
Wind, Y., V. Mahajan and D.J. Swire (1983), 258
Winkler, R.L. (1967a), 424, 427, 429
Winkler, R.L. (1967b), 427
Winkler, R.L. (1967c), 427
Winkler, R.L. (1968), 428, 429
Winkler, R.L. (1986), 428
Winkler, R.L. (1987), 432
Wittink, D.R. (1977), 75, 267, 361, 363, 368, 375, 475
Wittink, D.R. (1987), 135
Wittink, D.R. (1988), 56, 118, 338, 340, 350, 352, 354, 358, 481, 502, 506
Wittink, D.R., M.J. Addona, W.J. Hawkes and J.C. Porter (1988), 61, 78, 114, 168, 508, 536
Wittink, D.R. and S.K. Keil (2000), 573
Wittink, D.R., M. Vriens and W. Burhenne (1994), 573
Wold, F.M. (1986), 31
Wonnacott, R.J. and T.H. Wonnacott (1970), 79, 354
Wonnacott, T.H. and R.J. Wonnacott (1969), 354
Yi, Y. (1989), 198
Yon, B. (1976), 87
Zaltman, G., C.R.A. Pinson and R. Angelmar (1973), 479
Zellner, A. (1962), 374, 375
Zellner, A. (1971), 435
Zellner, A. (1979), 495
Zellner, A. and M.M. Geisel (1970), 97
Zenor, M.J. (1994), 259-261
Zentler, A.P. and D. Ryde (1956), 152
Zmud, R.W. (1979), 527, 528
Zoltners, A.A. (1976), 153
Zoltners, A.A. (1981), 151
Zoltners, A.A. (1982), 537
Zoltners, A.A. and P. Sinha (1983), 153
Zufryden, F.S. (1981), 244-246
Zufryden, F.S. (1982), 244
Zufryden, F.S. (1986), 244
Zwart, P.S. (1983), 38
Subject Index