
TFS-17819; No of Pages 16

Technological Forecasting & Social Change xxx (2013) xxx–xxx


Pre-launch new product demand forecasting using the Bass model: A statistical and machine learning-based approach
Hakyeon Lee a, Sang Gook Kim b, Hyun-woo Park b, Pilsung Kang a,⁎
a Department of Industrial and Systems Engineering, Seoul National University of Science and Technology, 172 Gongreung 2-dong, Nowon-gu, Seoul 139-746, South Korea
b Korea Institute of Science and Technology Information (KISTI), 66 Hoegi-ro, Dongdaemun-gu, Seoul 130-741, South Korea

Article history: Received 23 April 2013; Received in revised form 12 August 2013; Accepted 17 August 2013; Available online xxxx

Keywords: Pre-launch forecasting; Bass model; Multivariate linear regression; Machine learning; Ensemble

Abstract

This study proposes a novel approach to the pre-launch forecasting of new product demand based on the Bass model and statistical and machine learning algorithms. The Bass model is used to explain the diffusion process of products while statistical and machine learning algorithms are employed to predict two Bass model parameters prior to launch. Initially, two types of databases (DBs) are constructed: a product attribute DB and a product diffusion DB. Taking the former as inputs and the latter as outputs, single prediction models are developed using six regression algorithms, on the basis of which an ensemble prediction model is constructed in order to enhance predictive power. The experimental validation shows that most single prediction models outperform the conventional analogical method and that the ensemble model improves prediction accuracy further. Based on the developed models, an illustrative example of 3D TV is provided.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Increasing market uncertainty is making forecasting new product demand more difficult than ever, while shorter product lifecycles are forcing managers to produce new product demand forecasts more frequently. Correct forecasts are a key determinant of the success of new products, but accurate forecasting can be challenging. Forecasting demand for a mature-stage product is unproblematic since enough historical sales data are available. However, forecasting the future sales of a new product that has a short history or no history at all is complicated. Indeed, as Bass et al. [1] note, "the most important forecast is the forecast prior to launch," as it drives important pre-launch decisions such as capital equipment purchases and capacity planning [2]. Overestimating demand can result in excess inventories, while underestimation may incur significant opportunity costs and reduce market share.

Despite its importance, however, little progress has been made in pre-launch forecasting in the literature. Product demand forecasting has mainly been examined in the context of innovation diffusion. The diffusion of an innovation is "the process by which an innovation is communicated through certain channels over time among the members of a social system" [3]. Since its introduction to marketing studies in the 1960s, a variety of models have been developed to empirically model the diffusion of innovations [4]. The main thread of the diffusion models has been based on the pioneering work of Bass [5]. The assumption underlying the Bass model is that an innovation is spread through two types of communication channels: mass media (external influence) and word-of-mouth (internal influence). Over the past 40 years, the Bass model has thus enjoyed a number of applications because it has a relatively high explanatory power despite its simple structure [6].

Pre-launch forecasting also has been investigated mainly based on the Bass model. Previous studies of pre-launch forecasting with the Bass model have struggled against how to estimate the model parameters of a new product when little or no historical sales data are available. Several methods

⁎ Corresponding author. Tel.: +82 2 970 7279; fax: +82 2 974 2849. E-mail address: [email protected] (P. Kang).

0040-1625/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.techfore.2013.08.020

Please cite this article as: H. Lee, et al., Pre-launch new product demand forecasting using the Bass model: A statistical and machine
learning-based approach, Technol. Forecast. Soc. Change (2013), http://dx.doi.org/10.1016/j.techfore.2013.08.020

have been developed, which can be categorized into three types: Bayesian approaches, subjective approaches, and analogical approaches. The Bayesian approach starts with pre-launch forecasts and updates them as additional data become available. Various methods for updating the parameter estimates or the forecasts have been proposed [7–10]. However, this approach centers on how to update forecasts after launch and still calls for initial pre-launch forecasts to be updated as data become available. The subjective approach produces parameter estimates through an algebraic estimation procedure on the basis of managerial judgments of tangible information such as the time and level of peak sales [11] and the sum of the coefficients of the external and internal influences [12]. The drawback of the subjective approach is that obtaining accurate judgments is as difficult as estimating accurate parameters [13]. Finally, the analogical approach, called "guessing by analogy," has prevailed in the literature. It assumes that a new product will have a diffusion pattern similar to those of its analogous products over time [1]. Under this approach, the parameter estimates of a new product are obtained by taking a weighted sum of the parameters of analogous products, with the weights derived by establishing similarities between the new product and several analogous products [14].

Although the analogical approach has been applied widely [1,15–19], it has two main limitations. First, there are no clear guidelines for how to select benchmarks even though the estimated parameters are highly dependent on the analogous products under consideration. Second, the similarities are established by expert judgments that are inherently subjective. A promising solution to these problems is using the historical empirical relationship between the parameters and attributes of analogous products. Rogers [3] emphasizes that the attributes of an innovation are important variables in explaining its rate of adoption. Once this relationship is identified, the parameters of a new product can be estimated from its characteristics [20]. Although identifying the relationship between diffusion parameters and product attributes can serve as a reliable basis for the pre-launch forecasting of a new product, little research has been carried out thus far in this direction and no systematic approach has yet been developed.

The tenet of this study is that a statistical and machine learning-based approach can overcome the limitations of the conventional analogical approach to pre-launch forecasting. The goal of statistical and machine learning is to discover intrinsic, sometimes unanticipated, relationships between variables with the help of high computational power [21,22]. A typical procedure of an inductive statistical and machine learning approach is as follows. The first step is to set up the model structure by defining a learning task, configuring input–output variables, seeking appropriate algorithms, and selecting proper performance criteria. The second step is to collect sufficient real-world examples, which are then divided into training and test data sets. In the third step, the employed learning algorithms are optimized on the basis of the training data set. Finally, the best model is identified on the basis of the test data set, using the predetermined performance criteria. In this study, the learning task is defined as predicting the parameters of the Bass model prior to launch, while input and output variables are configured as product attributes and diffusion characteristics, respectively. Various regression algorithms, such as multivariate linear regression, support vector regression, and Gaussian process regression, as well as an ensemble model of these, are used in building the prediction models. Finally, their performances are evaluated using the mean absolute error (MAE) and the root mean squared error (RMSE).

Pre-launch product demand forecasting on the basis of statistical and machine learning algorithms has several advantages over conventional analogical methods. First, a reliable relationship between the attributes and diffusion characteristics of existing products can be found, which in turn enables new product demand forecasts to be based solely upon these attributes without any human manipulation. In other words, analogous products are automatically selected and their contributions to forecasting systematically determined by the prediction model; although such selection and determination processes are not easily understood by humans, they are mathematically sound and analytically tractable. Therefore, forecasting is no longer dependent on the subjective judgments of human experts, but becomes an objective outcome obtained by the combination of learning algorithms and product data. Second, since statistical and machine learning algorithms are designed for interpolation as well as extrapolation, forecasting accuracy can be improved. The parameter values predicted by conventional analogical methods are bounded by the current maximum and minimum estimates of reference products. If a new type of diffusion pattern emerged, whose parameter values were far from the current boundary, conventional analogical methods would not properly reflect this eventuality. Statistical and machine learning-based approaches, by contrast, can detect such a change from the product's own attributes and incorporate it into the prediction. In light of the foregoing, this study proposes a new approach to the pre-launch forecasting of new product demand, which utilizes the Bass diffusion model and statistical and machine learning-based regression algorithms. In addition, we boost prediction accuracy by constructing an ensemble of individual prediction models.

The remainder of this paper is organized as follows. Section 2 reviews previous studies of pre-launch forecasting with the Bass model. Section 3 demonstrates the proposed framework including the product and diffusion database (DB) design, single prediction model development, and ensemble model construction. Section 4 validates the single and ensemble prediction models and provides an illustrative case study on the basis of the best single and ensemble models. Finally, the conclusion and limitations of this paper are presented alongside future research directions in Section 5.

2. Bass model and pre-launch forecasting

The Bass model assumes that a technological innovation is spread by two types of influences: externally by the mass media and internally by word-of-mouth. It can be derived from a hazard function that represents the probability that an adoption occurs at time t given that it has not yet occurred:

h(t) = f(t) / (1 − F(t)) = p + qF(t)    (1)

where f(t) is the density function of time to adoption, F(t) is the cumulative proportion of adopters at time t, and p and q


are the coefficients of external and internal influences, respectively. Eq. (1) can thus be rewritten as

f(t) = dF(t)/dt = [p + qF(t)][1 − F(t)].    (2)

Because the number of adopters at time t, n(t), can be obtained by multiplying f(t) by the potential market size, m, rearranging Eq. (2) yields

n(t) = dN(t)/dt = [p + (q/m)N(t)][m − N(t)].    (3)

The first term, p[m − N(t)], represents adoptions owing to external influences, while the second term, (q/m)N(t)[m − N(t)], stands for adoptions owing to internal influences. In a pure innovation scenario where only external influences exist (i.e., q = 0), the Bass model reduces to an exponential function. If p is zero, however, it is equivalent to a logistic model, assumed to be driven by only imitative processes, namely a pure imitation situation. Solving Eq. (2) produces the following closed-form solution:

F(t) = (1 − e^(−(p+q)t)) / (1 + (q/p)e^(−(p+q)t)).    (4)

The cumulative number of adopters at time t, N(t), can then be written as

N(t) = m[(1 − e^(−(p+q)t)) / (1 + (q/p)e^(−(p+q)t))].    (5)

The three parameters of the Bass model (m, p, and q) can usually be estimated through conventional estimation procedures such as ordinary least squares (OLS) [5], maximum likelihood estimation (MLE) [23], and nonlinear least squares (NLS) [24]. However, these methods can be applied only when enough sales data are available. Previous studies demonstrate that stable and robust parameter estimates for the Bass model can be obtained only if the data under consideration include the peak of the noncumulative adoption curve [24,25]. When insufficient data are available, such as early in the product life cycle or prior to launch, these conventional estimation methods cannot produce reliable parameter estimates. Nevertheless, as discussed earlier, sales forecasting is even more crucial when little or no data are available. Estimates of the diffusion parameters early in the diffusion process or prior to launch are extremely valuable for managerial decision-making such as capital equipment purchases, production planning, and marketing strategy [26].

Time-varying estimation procedures that rely on Bayesian updating have been introduced by several researchers to cope with estimation when little or no data are available. These procedures start with forecasts prior to launch and update those forecasts whenever an additional record becomes available. Lilien et al. [7] propose a Bayesian approach whereby new product sales prior to market entry are predicted by considering the forecasts of a previously introduced similar product and then updated once data are available using Bayesian regression. Rao and Yamada [8] expand the work of Lilien et al. [7] to show that incorporating priors and the use of perceptual data can improve forecasting performance. Lenk and Rao [9] further suggest the Hierarchical Bayes procedure that explicitly considers between-product and within-product variations in establishing initial estimates for the new product. An adaptive procedure combined with Bayesian updating, called the augmented Kalman filter with continuous state and discrete observations, is also proposed by Xie et al. [10]. While the aforementioned approaches employ prior information derived from the diffusion of previously introduced products deemed most similar, Sultan et al. [27] develop initial parameter estimates by conducting a meta-analysis of 213 sets of parameters from 15 articles. Lee et al. [28] utilize consumer reservation price data in order to construct prior distributions. Regardless of the vehicle used for obtaining initial forecasts, however, time-varying estimation approaches mainly focus on updating after launch, which still requires initial pre-launch forecasts to be updated as data become available.

Several approaches to producing pre-launch forecasts have been proposed in previous studies and can be classified into two types: subjective approaches and analogical approaches. In terms of subjective approaches, Mahajan and Sharma [11] propose an algebraic estimation procedure that requires three pieces of information: 1) market potential (m), 2) time of peak sales (t*), and 3) level of peak sales (n*). Once these values have been obtained, p and q can be inferred. However, estimating both the time and level of peak sales is another difficult task. In addition, as Bass [29] notes, these are the key outputs intended to be forecast from the observed data using the diffusion model; therefore, if one could guess such items, there would be no need to estimate the diffusion curve. A similar procedure suggested by Lawrence and Lawton [12] also involves obtaining three pieces of managerial information: 1) market potential (m), 2) number of adoptions in the first period (n(1)), and 3) an estimate of the sum of the coefficients p and q (p + q). A similar problem occurs here; estimating p + q is also difficult. Although general guidelines suggest a value of 0.5 for consumer goods and 0.66 for commercial goods, such generalizations fail to mirror the idiosyncratic characteristics of particular products. In fact, Lawrence and Lawton [12] suggest that prior sales histories of similar products may produce better prior parameter estimates.

In this spirit, the analogical approach has been widely employed for pre-launch forecasting. This approach assumes that a new product will behave as analogous products do. As mentioned before, several previous studies using the Bayesian approach utilize the diffusion information of existing similar products as priors [7–10]. A more systematic approach on the basis of consumer choice theory is proposed by Thomas [14]. In this approach, the parameters of a new product are estimated by taking a weighted sum of the parameters of analogous products. These weights can be obtained by establishing the similarities between the new product and several analogous products in terms of five dimensions: environmental situation, market structure, buyer behavior, marketing mix strategy, and innovation characteristics. This "guessing by analogy" approach has often been adopted for the pre-launch forecasting of various types of products and services [1,16–19]. However, there are no clear standards for the selection of benchmark products. Analogous products are often chosen simply because


they are recent or easily recalled [30]. Although Ilonen et al. [31] employ the self-organizing map for automatically identifying analogies, their approach is centered on selecting analogous countries rather than analogous products. Bayus [15] suggests a grouping procedure based on hierarchical clustering with factor analysis to generate priors on the basis of various products, but the parameter estimates obtained in this way still depend highly on which analogous products are selected. Moreover, similarities are usually identified by expert judgments or consumer opinions, which are inherently subjective.

Using a historical relationship between the parameters and attributes of analogous products can be a promising solution to this problem. Nonetheless, few studies have been conducted in this direction. Sultan et al. [27] propose a meta-analysis model utilizing ANOVAs with four types of attributes. However, these attributes are related to the characteristics of the research itself, such as model specification, estimation method, and data reuse, while product-related attributes only include type of innovation and geographic effect. Similarly, Gatignon et al. [32] develop a cross-national econometric model of innovation diffusion, but its included variables are associated with country-level patterns of social communication such as cosmopolitanism, mobility, and sex roles; again, no product-related attributes are considered. Finally, Srivastava et al. [33] suggest a multi-attribute model for forecasting the adoption of investment alternatives that includes five attributes of investment products. Due to the high correlation between these attributes, however, only two attributes, information costs and likelihood of loss of principal, are employed to forecast the acceptance of a potential investment alternative. Further, the attributes considered in this study are too industry-specific to be generalized to other products and services. In summary, although establishing the historical relationship between diffusion parameters and product attributes can be considered promising for pre-launch new product forecasting, no systematic approach has thus far been proposed in the literature.

3. Methodology

3.1. Overall procedure

The present study proposes a statistical and machine learning approach to the pre-launch forecasting of new product demand on the basis of the Bass model. Fig. 1 depicts the overall procedure of the proposed approach. Two types of DBs are required to develop the prediction models: a product diffusion DB and a product attribute DB. The product diffusion DB includes the Bass diffusion parameters of existing products obtained by applying the NLS technique to historical sales data on each product. To construct the product attribute DB, various attributes that can explain the diffusion characteristics of a product need to be selected and defined. The values of the attributes of each product are measured through expert judgments. By taking the product attribute DB as inputs and the product diffusion DB as targets, single prediction models are developed to identify the relationship between diffusion parameters and product attributes. Six statistical and machine learning-based regression algorithms are utilized. To improve prediction performance, ensemble prediction models combining these single models are also constructed. Comparing prediction performance among the developed models produces the best-fit models that will be utilized for the case study. A performance comparison with the results of the simple analogical approach is also provided to uphold the validity of the proposed approach. Finally, the working of the developed models is demonstrated with the help of a simple illustrative example of 3D TV.

3.2. Data

As many products as possible for which previous sales data are available should be collated in the DBs in order to improve the reliability of the prediction models. One of the major data sources was the CE Historical Data provided by the Consumer Electronics Association. This data set contains US sales data on more than 60 categories of electronic products from their market introductions to the present. In total, 21 US products whose historical sales data were presented in recent papers and reports were included in the DBs. In addition, Korean sales data on over 40 products were gathered from relevant associations. Since small numbers of observations may produce unreliable estimates, all products with fewer than 12 sales records were then excluded from the data set, which resulted in 87 products remaining.

In the next step, the diffusion parameters of each of these 87 products were estimated by fitting their historical data to the Bass model in order to construct the product diffusion DB. Although the Bass model is specified by three parameters, the potential market size (m), the coefficient of external influence (p), and the coefficient of internal influence (q), the product diffusion DB only includes p and q. Most previous studies of pre-launch forecasting center on estimating the two parameters of communication effects, p and q, while separately estimating m from market research [1,9,10,14,15,18]. Similarly, Lawrence and Lawton [12] and Mahajan and Sharma [11] also utilize the potential market size as an input for the subjective algebraic procedures rather than as an output of the procedure, because the market size is likely to be affected by marketing efforts and various environmental factors rather than by product characteristics [34]. Consequently, our prediction model also only includes estimates of p and q. These two parameters of the 87 products were estimated using the NLS procedure, with average values of p and q found to be 0.0087 and 0.3273, respectively. The parameter estimates included in the product diffusion DB were then employed as the target variables in the prediction models.

The next step was to construct the product attribute DB for the 87 products. Firstly, product attribute variables affecting the diffusion patterns needed to be identified. An extensive literature review of the drivers and determinants of innovation diffusion and a series of discussions with senior marketing managers in practice led us to derive 17 attribute variables for constructing the product attribute DB. Table 1 presents these 17 variables with their abbreviations and measurement scales. These variables can be grouped into four categories: industry, market, technology, and use. Four variables (IC, TCS, TIE, and DN) were valued on nominal scales, while the other variables were measured on a five-point Likert scale from very low (1) to very high (5). The values of the attribute variables for each of the 87 products were then measured by industry experts, including marketing managers and engineers, and used as input variables in the prediction models.
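The NLS estimation step described above can be sketched as follows. This is an illustrative sketch only, assuming SciPy is available; the sales history is synthetic (generated from known parameters, not the paper's 87-product data), and m is treated as known, as in the paper's setup.

```python
import numpy as np
from scipy.optimize import curve_fit

def bass_cum(t, p, q):
    # Cumulative adoption fraction F(t) from the closed-form Bass solution.
    e = np.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# Synthetic cumulative sales history generated from known parameters.
m = 1_000_000                      # assumed (known) market potential
t = np.arange(1, 15)
true_sales = m * bass_cum(t, 0.008, 0.32)
rng = np.random.default_rng(0)
sales = true_sales * (1 + rng.normal(0, 0.01, t.size))  # 1% noise

# Nonlinear least squares fit of p and q with m held fixed.
(p_hat, q_hat), _ = curve_fit(lambda t, p, q: m * bass_cum(t, p, q),
                              t, sales, p0=(0.01, 0.3), bounds=(1e-6, 2.0))
```

With a reasonable starting point `p0`, the fit recovers parameters close to the generating values; with real, short, pre-peak histories the estimates are far less stable, which is precisely the instability the paper cites [24,25].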


Fig. 1. Overall procedure.
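The overall procedure in Fig. 1 can be sketched end-to-end as follows. Everything in this sketch is a hypothetical stand-in, not the paper's implementation: `ConstantModel` substitutes for the six trained regression algorithms, and the attribute vector, parameter values, and market size are assumed for illustration.

```python
import numpy as np

class ConstantModel:
    """Hypothetical stand-in for a trained regression model (MLR, k-NN, etc.)."""
    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]

def forecast_new_product(attributes, models, m, horizon=20):
    """Predict (p, q) from a product's attribute vector, then generate the
    cumulative Bass adoption curve N(t) = m * F(t) for an assumed market size m."""
    p = models["p"].predict([attributes])[0]
    q = models["q"].predict([attributes])[0]
    t = np.arange(1, horizon + 1)
    e = np.exp(-(p + q) * t)
    N = m * (1 - e) / (1 + (q / p) * e)   # closed-form Bass solution
    return p, q, N

# Hypothetical attribute vector (Likert scores and dummies) and stub models.
models = {"p": ConstantModel(0.01), "q": ConstantModel(0.3)}
p, q, N = forecast_new_product([3, 4, 2, 1, 0], models, m=1_000_000)
```

The design point is that once the attribute-to-parameter mapping is learned from existing products, a pre-launch forecast requires only the new product's attribute vector.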

3.3. Experimental design

3.3.1. Data preprocessing and variable selection

As explained earlier, some products were removed owing to the limited number of historical sales records. Another issue in the modeling process was the existence of outliers whose estimates were noticeably different. Fig. 2 shows the estimated Bass model parameters of the 87 products. This figure clearly shows that certain products have significantly large p or q values compared with others. Since such outliers would degrade prediction accuracy, seven products whose estimates were beyond two standard deviations from the mean were also removed. Consequently, 80 products were finally used to build the prediction models, with the average estimates of p and q decreasing to 0.0063 and 0.2783, respectively. The parameter estimates of each of these 80 products are presented in Appendix A.

Because most of the regression algorithms employed in this study can only handle numerical variables, the four nominal variables (IC, TCS, TIE, and DN) were transformed into binary variables using the 1-of-C coding method. In this method, C binary dummy variables are created for a nominal variable with C categories; for each dummy variable, 1 is assigned if the original value falls in the corresponding category, and 0 otherwise (see Fig. 3). Once the variable transformation had been completed, the number of variables increased from 17 to 24.

Some of these 24 candidate input variables were dispensable because they had little effect on the prediction. Thus, using stepwise linear regression, we selected only the crucial variables before training the prediction models. This stepwise selection process began with the single most relevant input variable, and the following two procedures were then conducted alternately until all significant variables had been identified: (1) among the candidate variables, the one that most improves accuracy is added to the selected variable set, from which (2) the one that contributes least to accuracy is removed.

3.3.2. Regression algorithms

Six statistical and machine learning-based regression algorithms were employed to build single prediction models, namely multivariate linear regression (MLR), k-nearest neighbor regression (k-NN), artificial neural network (ANN), support vector regression (SVR), classification and regression tree (CART), and Gaussian process regression (GPR).

MLR [35] fits the functional relationship between multiple input variables and the target variable of the given data in the form of a linear equation. Let y_i denote the target value (p or q) of the ith product, while x_ij denotes the jth input variable of the ith product. Then, the MLR equation of d predictors with n training instances can be written as

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_d x_id,  for i = 1, 2, ⋯, n.    (6)
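The 1-of-C coding step described in Section 3.3.1 can be sketched as follows. The helper function is a generic illustration (not the paper's code); the category labels are the IC codes from Table 1.

```python
def one_of_c(value, categories):
    """Encode a nominal value as C binary dummy variables (1-of-C coding)."""
    return [1 if value == c else 0 for c in categories]

# IC has three categories: E (electrical), G (electronic), M (mechanical),
# so the single nominal variable expands into three binary variables.
ic_categories = ["E", "G", "M"]
encoded = one_of_c("G", ic_categories)   # exactly one dummy is set to 1
```

Applied to IC (3 categories), TCS (4), TIE (2), and DN (2), this expansion turns 4 nominal variables into 11 dummies, which is consistent with the variable count growing from 17 to 24 (13 Likert variables + 11 dummies).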


Table 1
Product attribute variables.

Dimension | Variable | Abbreviation | Scale
Industry | Industry classification | IC | Nominal (E: electrical, G: electronic, M: mechanical)
Industry | Types of competitive structure | TCS | Nominal (P: perfect competition, M: monopoly, O: oligopoly, N: monopolistic competition)
Industry | Necessity of complementary goods | NCG | Five-point Likert
Industry | Availability of substitute goods | ASG | Five-point Likert
Market | Level of price | LP | Five-point Likert
Market | Types of income elasticity | TIE | Nominal (N: necessities, L: luxuries)
Market | Number of potential customers | NPC | Five-point Likert
Market | Number of suppliers | NS | Five-point Likert
Market | Diversity of distribution channels | DDC | Five-point Likert
Technology | Degree of newness | DN | Nominal (R: radical, I: incremental)
Technology | Rate of technological change | RTC | Five-point Likert
Technology | Easiness of imitation | EI | Five-point Likert
Technology | Degree of functional variety | DFV | Five-point Likert
Use | Necessity of learning | NL | Five-point Likert
Use | Frequency of use | FU | Five-point Likert
Use | Duration of use | DU | Five-point Likert
Use | Necessity of repurchase | NR | Five-point Likert

In matrix form,

y = Xβ,  y = (y_1, ⋯, y_n)^T,  X = [1 x_11 ⋯ x_1d; ⋮ ⋮ ⋮ ⋮; 1 x_n1 ⋯ x_nd],  β = (β_0, β_1, ⋯, β_d)^T.    (7)

The regression coefficients β can be obtained by minimizing the squared residual error between the targets (y) and the predictions (ŷ), as shown in Eq. (8), using the ordinary least squares method:

E = (1/2)Σ_{i=1}^{n} e_i^2 = (1/2)(y − ŷ)^T(y − ŷ) = (1/2)(y − Xβ)^T(y − Xβ);
∂E/∂β = X^T y − X^T Xβ = 0,  β = (X^T X)^{−1} X^T y.    (8)

k-NN [36] is the most popular case-based reasoning algorithm. Since it does not require a separate training procedure, it has been employed in various domains where rapid and frequent model updates are required [37–39]. k-NN predicts a new instance on the basis of its similarity to its neighbors. Once a test instance x_t is given, k-NN first searches for the k most similar instances in the reference data set using a certain distance metric and allocates weights to them under the principle that the greater the similarity, the greater the weight. Then, the prediction is made according to the weighted average of the target values and assigned weights of the selected neighbors as follows:

ŷ_t = Σ_{j∈NN(x_t)} w_j y_j,    (9)

Fig. 2. The estimated Bass model parameters (x-axis: product index, y-axis: estimated value).
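The closed-form OLS solution of Eq. (8), β = (XᵀX)⁻¹Xᵀy, can be verified numerically. The sketch below uses made-up noiseless data (an assumption for illustration, not the paper's product data) and recovers a known coefficient vector; `solve` is used instead of an explicit matrix inverse for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X_raw = rng.normal(size=(n, d))

# Known coefficients [beta0, beta1, beta2, beta3]; beta0 is the intercept.
beta_true = np.array([2.0, 0.5, -1.0, 3.0])

# Prepend the column of 1s for the intercept, matching Eq. (7)'s design matrix.
X = np.hstack([np.ones((n, 1)), X_raw])
y = X @ beta_true

# Normal equations of Eq. (8): (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Because the data are noiseless, the estimate matches the true coefficients up to floating-point error; with noisy targets it would instead minimize the squared residual E.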


Fig. 3. An example of 1-of-C coding.

where \(NN(\mathbf{x}_t)\) and \(w_j\) denote the index set of the k nearest neighbors of \(\mathbf{x}_t\) and the weight assigned to the jth nearest neighbor, respectively. In k-NN regression, two user-specified parameters must be declared prior to prediction: the number of nearest neighbors (k) and the weight allocation method.

ANN [21] is one of the most widely used nonparametric regression algorithms in many application domains owing to its ability to capture nonlinear relationships between the input and output variables [40–42]. A three-layer feed-forward neural network is used in our experiments. In an ANN, the target is expressed as a combination of input values, activation functions, and weights as follows:

\[
\hat{y}_i = \sum_{q=1}^{h} w_q^{(2)}\, g\!\left(\sum_{p=1}^{d} w_{qp}^{(1)} x_{ip}\right), \quad \text{for } i = 1, 2, \cdots, n, \tag{10}
\]

where \(w_q^{(2)}\), \(w_{qp}^{(1)}\), and \(g(\cdot)\) are the weight connecting the qth hidden node to the output node, the weight connecting the pth input node to the qth hidden node, and the activation function, respectively. Training an ANN is equivalent to finding the optimal weights that minimize the loss function shown in Eq. (11):

\[
L = \frac{1}{2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2. \tag{11}
\]

Since there exists no explicit solution for an ANN, iterative optimization algorithms such as gradient descent [43], Newton's method [44], and the Levenberg–Marquardt algorithm [45] are used to optimize the weights.

In contrast to ANN, which is based on empirical risk minimization (ERM), SVR [46] is based on the structural risk minimization (SRM) principle. SVR fits the regression equation \(\hat{y} = \mathbf{w}^{T}\mathbf{x} + b\) under the constraint that as many training instances as possible lie inside the \(\varepsilon\)-tube:

\[
\min_{\mathbf{w}} \ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right),
\]
\[
\text{s.t.} \quad y_i - \mathbf{w}^{T}\mathbf{x}_i - b \le \varepsilon + \xi_i, \ \ \xi_i \ge 0; \qquad
\mathbf{w}^{T}\mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^{*}, \ \ \xi_i^{*} \ge 0. \tag{12}
\]

C in Eq. (12) controls the trade-off between the flatness of the function and the error of the training samples lying outside the \(\varepsilon\)-tube. The Lagrangian formulation is derived by incorporating the constraints through nonnegative multipliers:

\[
\min \ \frac{1}{2}\|\mathbf{w}\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)
- \sum_{i=1}^{n} \alpha_i\left(\varepsilon + \xi_i - y_i + \mathbf{w}^{T}\mathbf{x}_i + b\right)
- \sum_{i=1}^{n} \alpha_i^{*}\left(\varepsilon + \xi_i^{*} + y_i - \mathbf{w}^{T}\mathbf{x}_i - b\right)
- \sum_{i=1}^{n}\left(\eta_i \xi_i + \eta_i^{*} \xi_i^{*}\right),
\]
\[
\text{s.t.} \quad \alpha_i, \alpha_i^{*}, \eta_i, \eta_i^{*} \ge 0. \tag{13}
\]

By taking the derivatives with respect to the primal variables, the optimality conditions for the above Lagrangian are obtained; in turn, the Wolfe dual problem is derived by substituting these conditions into the primal problem:

\[
\max \ -\frac{1}{2}\sum_{i,j=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)\left(\alpha_j - \alpha_j^{*}\right)\mathbf{x}_i^{T}\mathbf{x}_j
- \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^{*}\right)
+ \sum_{i=1}^{n} y_i\left(\alpha_i - \alpha_i^{*}\right),
\]
\[
\text{s.t.} \quad \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le C. \tag{14}
\]

SVR enables nonlinear fitting using a mapping function \(\phi(\mathbf{x})\) that transforms the data from a low-dimensional input space to a high-dimensional feature space. Since only inner products between input vectors are required during optimization, a kernel trick \(K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)\) is employed to compute the inner product in the feature space without an explicit mapping function.

CART [47] is another approach to nonlinear regression, in which the entire input space is recursively partitioned into small chunks until the members of each can be fit by a simple model. Beginning with all instances in a single parent node, the splitting criterion is determined that maximizes the information gain, usually measured by the decrease in the sum of squared errors before and after the split. The training instances are then assigned to one of the child nodes in accordance with their variable status and the splitting criterion. Each child node in turn becomes a new parent node, and the same partitioning procedure is conducted recursively until the full tree is constructed, i.e., until no more information gain can be achieved by splitting.

Another key procedure in training CART is pruning. Since the full tree constructed by CART divides the input space into a large number of tiny segments, it is usually exposed to an over-fitting risk; it would degrade the generalization ability if the model memorized all the characteristics of the data, some of which are unnecessary, e.g., noise or outliers. The role of pruning is to merge adjacent leaf nodes if the prediction accuracy is improved, or at least unchanged, on a validation data set that is not used in the tree construction. Once pruning has been completed, CART predicts a new instance by averaging the target values in the leaf node to which the new instance belongs.

GPR [48] begins with the Bayesian approach to MLR and extends its expressiveness by adopting kernel tricks. In GPR, the target y is expressed as a linear combination of the inputs with Gaussian noise as follows:

\[
y = f(\mathbf{x}) + \varepsilon, \qquad f(\mathbf{x}) = \mathbf{x}^{T}\mathbf{w}, \tag{15}
\]

assuming that the noise follows an independent, identically distributed (i.i.d.) Gaussian distribution with zero mean and variance \(\sigma^2\),

\[
\varepsilon \sim N\left(0, \sigma^{2}\right). \tag{16}
\]

The likelihood, which is the probability density of the given data and parameters, can be directly obtained as

\[
p(\mathbf{y}\,|\,\mathbf{X},\mathbf{w}) = \prod_{i=1}^{n} p(y_i\,|\,\mathbf{x}_i,\mathbf{w})
= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\left(y_i-\mathbf{x}_i^{T}\mathbf{w}\right)^{2}}{2\sigma^{2}}\right)
= \frac{1}{\left(2\pi\sigma^{2}\right)^{n/2}}\exp\!\left(-\frac{\left|\mathbf{y}-\mathbf{X}^{T}\mathbf{w}\right|^{2}}{2\sigma^{2}}\right)
= N\left(\mathbf{X}^{T}\mathbf{w}, \sigma^{2}\mathbf{I}\right), \tag{17}
\]

where the columns of \(\mathbf{X}\) are the training instances. As a prior distribution over the weights, a zero-mean Gaussian with covariance matrix \(\Sigma_p\) is generally used:

\[
\mathbf{w} \sim N\left(\mathbf{0}, \Sigma_p\right). \tag{18}
\]

Inference in GPR is based on the posterior distribution over the weights, obtained by applying Bayes' theorem:

\[
p(\mathbf{w}\,|\,\mathbf{y},\mathbf{X}) = \frac{p(\mathbf{y}\,|\,\mathbf{X},\mathbf{w})\,p(\mathbf{w})}{p(\mathbf{y}\,|\,\mathbf{X})}
\propto p(\mathbf{y}\,|\,\mathbf{X},\mathbf{w})\,p(\mathbf{w})
\sim N\!\left(\frac{1}{\sigma^{2}}A^{-1}\mathbf{X}\mathbf{y},\ A^{-1}\right), \tag{19}
\]

where \(A = \frac{1}{\sigma^{2}}\mathbf{X}\mathbf{X}^{T} + \Sigma_p^{-1}\). The predictive distribution of \(f_t\) at a test instance \(\mathbf{x}_t\) is then obtained by averaging the output of all possible linear models with respect to the Gaussian posterior:

\[
p(f_t\,|\,\mathbf{x}_t,\mathbf{X},\mathbf{y}) = \int f(\mathbf{x}_t\,|\,\mathbf{w})\,p(\mathbf{w}\,|\,\mathbf{X},\mathbf{y})\,d\mathbf{w}
= N\!\left(\frac{1}{\sigma^{2}}\mathbf{x}_t^{T}A^{-1}\mathbf{X}\mathbf{y},\ \mathbf{x}_t^{T}A^{-1}\mathbf{x}_t\right). \tag{20}
\]

As in SVR, GPR can fit a nonlinear relationship by introducing a mapping function \(\phi(\mathbf{x})\) to project the data from a low-dimensional space to a higher-dimensional feature space and using kernel tricks to compute inner products in the feature space without an explicit form of \(\phi(\mathbf{x})\).

3.3.3. Ensemble model

After training the single prediction models, ensemble prediction models [21,49] are constructed in order to enhance the predictive power. Fig. 4 shows the general structure of an ensemble model. A number of variations are possible according to the diversity of algorithms or parameters; the experts can consist of different learning algorithms, an identical learning algorithm with different parameters, or a combination of the two. The prediction outcome of an ensemble is formulated by aggregating the outputs of every expert. In our experiments, the best ensemble model was identified among all possible combinations of regression algorithms and subsequently used in a comparative validation analysis as well as in the case study.

Fig. 4. The structure of an ensemble of individual forecasting models.

3.3.4. Validation method and performance measures

As a benchmark against which to verify our proposed prediction models, a conventional analogical prediction model was also constructed. In the analogical model, the two parameters of the Bass model are predicted as follows:

\[
p = \sum_{j=1}^{m}\sum_{i=1}^{n} w_i\, x_{ij}\, p_j, \tag{21}
\]
\[
q = \sum_{j=1}^{m}\sum_{i=1}^{n} w_i\, x_{ij}\, q_j, \tag{22}
\]

where m is the number of analogous products, n is the number of dimensions for evaluating similarities, \(w_i\) is the importance weight of dimension i, and \(x_{ij}\) is the similarity weight between the product being evaluated and product j in terms of dimension i. Further, \(p_j\) and \(q_j\) are the coefficients of external and internal influence of product j, respectively. We employed the same four dimensions as those used for defining the product attributes: industry, market, technology, and use. The importance weights of the dimensions were first determined using the pairwise comparison method [50]. For each of the 80 products, three to five similar products were selected, and their similarity weights were assigned by the experts so that the sum of the weights of the analogous products for each dimension equaled 1.
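The analogical benchmark of Eqs. (21) and (22) is a similarity-weighted average over the analogous products; a sketch with hypothetical weights (not the paper's expert judgments):

```python
import numpy as np

# Hypothetical setup: n = 4 dimensions, m = 3 analogous products.
w = np.array([0.4, 0.3, 0.2, 0.1])          # importance weight of each dimension
x = np.array([[0.5, 0.3, 0.2],              # x[i, j]: similarity to product j on
              [0.4, 0.4, 0.2],              # dimension i; each row sums to 1
              [0.6, 0.2, 0.2],              # over the m analogous products
              [0.3, 0.5, 0.2]])
p_analog = np.array([0.010, 0.005, 0.002])  # p of the analogous products

# Eq. (21): p = sum_j sum_i w_i * x_ij * p_j  (Eq. (22) is identical with q)
p_hat = float(np.sum(w[:, None] * x * p_analog[None, :]))
```

Because the dimension weights and each row of similarities sum to 1, the effective weights form a convex combination, so the predicted p always lies between the smallest and largest \(p_j\) of the analogues.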


The leave-one-out validation method was adopted to verify the prediction models. If a large amount of data is available, k-fold cross-validation with k = 5 or 10 is generally used; the entire data set is divided into k subgroups, and every subgroup is set aside in turn for validation while the model is trained on the remaining (k − 1) subgroups. The prediction outcomes of the subgroups are then integrated to measure the overall performance of the model. If only a relatively small amount of data is available, by contrast, leave-one-out validation is more appropriate, as it increases k to the total number of instances in order to secure as many training instances as possible. Because the 80 products used here are considered insufficient for k-fold cross-validation, leave-one-out validation is the better choice in our experiment.

The prediction models were then evaluated in terms of the mean absolute error (MAE) and the root mean squared error (RMSE),

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \tag{23}
\]

where y is the target value (the estimated p or q of the Bass diffusion model) and \(\hat{y}\) is the predicted value derived by the model. The lower the performance measures, the better the prediction model.

4. Results

4.1. Variable selection

The input variables selected by the stepwise linear regression are presented in Table 2. Among the 24 candidate input variables, only seven were identified as being crucial for the prediction of p and q at a significance level of 0.05. Three input variables (DDC, DN_R, and EI) are positively related to p, while the other four (NCG, ASG, LP, and NR) are negatively related. By contrast, four input variables (IC_O, NCG, DDC, and DN_I) are positively related to q, while the other three (RTC, EI, and NL) are negatively related. It was also found that three input variables (NCG, DDC, and EI) are selected for predicting both p and q. Note that only DDC has positive coefficients for both targets; more diversified distribution channels thus give rise to greater internal and external influence on product demand. However, NCG and EI show reversed coefficient signs when the target changes: a higher necessity of complementary goods interrupts the external but promotes the internal influence, while a higher ease of imitation promotes the external but interrupts the internal influence. In addition, the demand creation led by the mass media is encouraged by a higher DN_R but discouraged by a higher ASG, LP, or NR, while the demand creation led by word-of-mouth is encouraged by a higher IC_O or DN_I but discouraged by a higher RTC or NL.

Table 2
The selected input variables for p and q by stepwise linear regression (α = 0.05; the variables in bold face are selected for both p and q).

p                                                      q
Variable  Coefficient  t-statistic  p-value            Variable  Coefficient  t-statistic  p-value
NCG       −0.000786    −2.20        0.031              NCG        0.036543     3.82        <0.001
DDC        0.001988     5.40        <0.001             DDC        0.037047     3.09        0.003
EI         0.000911     2.71        0.008              EI        −0.04629     −5.09        <0.001
ASG       −0.000992    −2.79        0.007              IC_O       0.071268     2.85        0.006
LP        −0.000934    −2.51        0.014              DN_I       0.077644     3.19        0.002
DN_R       0.002237     2.65        0.010              RTC       −0.02975     −2.86        0.005
NR        −0.001546    −4.05        <0.001             NL        −0.02753     −2.86        0.006

4.2. Forecasting performances: single models

The performances of each regression algorithm in predicting p and q are summarized in Table 3. Because the two performance measures show few differences, we focus on the MAE in the presented analysis.

Table 3
The prediction performances of each regression algorithm for p and q (the bold face numbers denote the lowest error among the algorithms).

Algorithm      p                          q
               MAE       RMSE             MAE       RMSE
MLR            0.0026    0.0031           0.0688    0.0898
k-NN           0.0029    0.0041           0.0709    0.1018
ANN            0.0036    0.0045           0.1428    0.1813
SVR            0.0027    0.0036           0.0961    0.1301
CART           0.0054    0.0079           0.1230    0.1723
GPR            0.0026    0.0034           0.0790    0.1053
Analogy        0.0053    0.0073           0.1337    0.1907

MLR is found to be the best of the single prediction models, followed by SVR and GPR in the case of p and by k-NN and GPR in the case of q. Its MAE for p, 0.0026, is the lowest among all the algorithms and is less than 42% of the average of the estimates. For the prediction of q, MLR is even more accurate (MAE = 0.0688), at no greater than 25% of the average of the estimates. It is worth noting that ANN and CART provide inferior prediction performances; the MAE of CART for p and the MAE of ANN for q are almost twice as large as those of MLR. Unlike SVR and GPR, both are based on the empirical risk minimization (ERM) scheme, under which good generalization is guaranteed only when a sufficient number of training instances is provided. This requirement is unlikely to be met with only 80 products; therefore, ANN and CART result in lower prediction accuracies.

Turning to the comparison between the statistical and machine learning-based algorithms and the conventional analogical method, the former are superior to the latter with only a few exceptions, namely CART for p and ANN for q. Setting these exceptions aside, the analogical method is at least 47% worse than the others at predicting p and at least 8.7% worse at predicting q. Compared with the best single algorithm, the MAE of the analogical method is almost twice that of MLR for both p and q. In other words, in the best-case scenario, the prediction methodology proposed in this study has twice the predictive power of the conventional analogical method.

Fig. 5 shows the distribution and box plot of the residuals (\(y - \hat{y}\)) produced by each single prediction model. A good prediction model should meet two qualifications: (1) the average of the residuals should be as close to zero as possible and (2) the dispersion of the residuals should be as narrow as possible. Given these conditions, we find that the conventional analogical method is the poorest model: not only does it have the widest residual dispersion, but its average is also below zero for both p and q. As previously noted, ANN and CART are inferior to the other statistical and machine learning-based models, since their dispersions are almost as wide as that of the analogical method. This confirms that MLR produces the most desirable outcomes in that its prediction residuals are the most narrowly distributed, with an average value close to zero. The box plot implies that k-NN, SVR, and GPR seem better than MLR at predicting q, because their inter-quartile ranges are smaller than that of MLR. However, their MAEs and RMSEs are spoiled by a few products for which they fail to make proper predictions, in contrast to MLR.

Fig. 5. The distribution and the box plot of the prediction residuals for each single prediction model: (a) residuals for p; (b) residuals for q.

4.3. Forecasting performances: ensemble model

Of the 57 possible combinations of regression algorithms for constructing ensemble models, the union of MLR and GPR was found to be the most accurate. Table 4 compares the prediction performances of the best ensemble model with those of the analogical method and MLR. Constructing an ensemble model greatly decreased the prediction errors for both p and q; almost 90% of the MAE and RMSE of the analogical method disappeared in both cases. Compared with the best single model (i.e., MLR), the ensemble prediction model still provided a striking improvement; over 75% of the prediction errors were eliminated for both p and q regardless of the performance measure.

Table 4
The prediction performance of the ensemble model compared to the analogical method and MLR (the bold face numbers denote the lowest error among the algorithms).

Algorithm      p                          q
               MAE       RMSE             MAE       RMSE
MLR            0.0026    0.0031           0.0688    0.0898
Ensemble       0.0006    0.0007           0.0180    0.0223
Analogy        0.0053    0.0073           0.1337    0.1907

Fig. 6 depicts the estimated Bass model parameters (targets, x-axis) against the prediction outcomes (y-axis) derived by the three models: the analogical method, the best single model (i.e., MLR), and the best ensemble model (i.e., MLR and GPR). The straight line in the figures represents the ideal case, where the predicted outcomes equal their actual targets. Thus, the closer the points approach the line, the better the prediction model. In this respect, the analogical method is the most inferior of the three models, because the points in Fig. 6(a) and (b) appear to be almost randomly distributed. In addition, its prediction coverage is much narrower than that of the actual targets: its predicted p values are generally located between 0 and 0.01, although their target values range from 0 to 0.03, and its predicted q values are mostly located between 0 and 0.3, even though their target values range from 0 to 0.7. The randomness and narrow coverage of these predicted outcomes are resolved by MLR (Fig. 6(c) and (d)). However, two other issues arise from the MLR results: (1) MLR tends to over-predict the estimates beyond a certain extent (p > 0.02 and q > 0.05) and (2) the prediction outcomes of some products are negative. Fig. 6(e) and (f) shows that the ensemble model can overcome the problems caused by MLR: there are no over-predicted values, and negative outcomes appear in only two products when predicting p.

Fig. 6. The predicted against target values of the analogical method (a: p, b: q), the best single model, MLR (c: p, d: q), and the ensemble model (e: p, f: q).

The experimental results in this study can be summarized as follows. First, the product attributes configured herein are shown to be valid, as they lead the regression algorithms to accurate prediction results. Second, statistical and machine learning-based regression algorithms result in a higher predictive power compared with the conventional analogical model. Among the regression algorithms, the best prediction model was MLR, not only because its prediction error rate was the lowest but also because its residual errors were the most compactly distributed. Lastly, the ensemble model significantly enhanced the prediction accuracy of the single regression algorithms. In


addition, it resolved the practical issues caused by the other approaches, such as the low correlation between the actual and predicted estimates, over-prediction beyond a certain extent, and the generation of negative outcomes.

5. Illustrative example: demand forecasting of 3D TV

For the purpose of illustration, demand for 3D TV in the North American market is forecasted in this section. 3D TV is a next-generation display that conveys depth perception to the viewer by employing techniques such as stereoscopic display, multi-view display, 2D-plus-depth, or any other form of 3D display. As interest in 3D, which started with the movie Avatar, has exploded, major TV manufacturers such as Samsung, LG, Sony, and Panasonic have aimed to capture market share by launching various types of 3D TVs. Although Samsung launched the first commercial 3D TV in 2008, 2010 was considered the breakthrough year for the technology, as over six million 3D TVs were sold that year [51]. Because only three years of data points are available at this stage, reliable parameter estimates of the Bass model cannot be produced using NLS estimation. We therefore apply the developed prediction models to forecast 3D TV demand.

First, the attribute values of 3D TV for the seven significant variables were measured by expert judgment. These experts included engineers and marketing managers of one of the leading display manufacturers as well as professors majoring in electronic engineering, particularly display devices. The measured values of 3D TV are presented in Table 5. The best single regression model (MLR) and the best ensemble model (MLR and GPR) were then employed to predict the parameters of 3D TV.

Table 5
Attribute values of 3D TV.

Variable                                     Value
Industry classification (IC)                 I
Necessity of complementary goods (NCG)       4
Availability of substitute goods (ASG)       2
Level of price (LP)                          2
Diversity of distribution channels (DDC)     4
Degree of newness (DN)                       I
Rate of technological change (RTC)           2
Easiness of imitation (EI)                   2
Necessity of learning (NL)                   2
Necessity of repurchase (NR)                 3

Inputting the obtained attribute values of 3D TV into the two models produced two sets of Bass diffusion parameters, as shown in Table 6. The estimates obtained from the MLR model were found to be slightly higher than those from the ensemble model for both p and q. Once p and q had been obtained, important points in the product lifecycle, such as takeoff (T1) and peak (T*), could also be straightforwardly predicted [52]. The times to takeoff and peak in the MLR model were found to slightly precede those in the ensemble model.

Table 6
Predicted parameters of 3D TV.

Model      Parameter estimates      Life cycle
           p        q               Takeoff               Peak
MLR        0.0087   0.5732          4.9 (Late in 2014)    7.2 (Early in 2017)
Ensemble   0.0091   0.5525          5.0 (Early in 2015)   7.3 (Early in 2017)

Next, the potential market size (m) was estimated in order to forecast annual demand. This illustrative example limits its scope to the North American market. Although previous studies have surveyed consumers' purchase intentions to estimate the potential market size, this study instead uses the total number of households in North America as a proxy for the potential market size, because this example aims to illustrate how to apply the developed prediction models to estimate the parameters of the two types of communication effects.

Demand for 3D TV in North America over 15 years (from 2010 to 2024) was then forecasted by combining the total number of North American households in 2010 (128 million) with the two predicted parameters. Fig. 7(a) and (b) depicts the annual and cumulative demand patterns derived from the ensemble model. Comparing the forecasts with real sales data in the first years may uphold the validity of the prediction models. 3D TV shipments in the North American market totaled 8.4 million units in 2012, up from 4.5 million units in 2011. The forecast cumulative sales from the ensemble model are 4.2 million in 2011 and 8.5 million in 2012, which are very similar to the real sales, although the annual sales in 2010 are somewhat overestimated.

Fig. 7. Demand forecasts of 3D TV in the North American market from the ensemble model: (a) annual demand; (b) cumulative demand.

6. Conclusions

This study proposed a statistical and machine learning algorithm-based approach to pre-launch product demand forecasting on the basis of the Bass model. Taking the product attribute DB as inputs and the product diffusion DB as outputs, single prediction models were developed using the six regression algorithms, on the basis of which an ensemble prediction model was constructed to enhance predictive power. It was shown that most single prediction models outperformed the conventional analogical method and that the ensemble model improved prediction accuracy further. An illustrative example of 3D TV was also provided to demonstrate how the developed models can be used in practice.

This study contributes to the field of pre-launch forecasting by proposing a new approach that utilizes statistical and machine learning-based regression algorithms. Despite the importance of the pre-launch forecasting of new products, conventional approaches such as subjective and analogical methods fail to produce objective estimates of the diffusion parameters. Adopting statistical and machine learning-based regression algorithms, by contrast, can reliably portray the relationship between the attributes and diffusion characteristics of existing products, which, in turn, enables forecasting new product demand solely on the basis of product attributes (i.e., without human manipulation) and fosters effective pre-launch decision-making.

However, the prerequisite for benefiting from the proposed approach is maintaining a relevant and sufficient DB of existing products. Although the primary purpose of this study was to propose a new approach, the data used for constructing the prediction models herein may not be enough to take full advantage of the statistical and machine learning algorithms. In addition, as we accumulated as many products as possible in the product DB, the nations and industries into which the products were diffused varied, implying that the validity of the parameter estimates may be questionable. Moreover, the performance of the proposed approach relies heavily on the product DB used to construct the prediction models. The more homogeneous the products included in the product DB, the higher the forecasting accuracy. The product DB should also be regularly updated. Furthermore, the variables selected as product attributes are by no means exhaustive or fixed. Although this paper employed 17 variables, they may not be sufficient to explain the idiosyncratic diffusion characteristics of various products. The significant variables can also vary depending on the country or industry context. Future exploratory studies may help identify the context-specific factors that affect the diffusion of products.

Acknowledgments

This work was supported by the Korea Institute of Science and Technology Information (KISTI) and National Research Foundation of Korea (NRF) grants funded by the Korea government (MEST) (No. 2011-0012759 and 2011-0021893).
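The Bass model arithmetic behind the Section 5 forecasts can be sketched in a few lines, using the ensemble estimates from Table 6 (p = 0.0091, q = 0.5525, m = 128 million households); this is a minimal reproduction under those parameters, not the authors' code, and it places 2011 at t = 2 years after launch:

```python
import numpy as np

# Bass diffusion forecast from the ensemble estimates in Table 6.
p, q, m = 0.0091, 0.5525, 128.0            # m in millions of households

def bass_cumulative(t, p, q, m):
    """Cumulative adopters at time t (years since launch) under the Bass model."""
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

t = np.arange(0, 16)                       # 2010-2024 forecast horizon
cumulative = bass_cumulative(t, p, q, m)
annual = np.diff(cumulative)               # annual demand, as in Fig. 7(a)

t_peak = np.log(q / p) / (p + q)           # closed-form time of peak annual sales
print(round(t_peak, 1))                    # 7.3, matching Table 6
```

The implied cumulative sales are about 4.2 million at t = 2 and 8.5 million at t = 3, which reproduces the 2011 and 2012 figures quoted in Section 5.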


Appendix A. The Bass model parameters estimated by the NLS and predicted by the MLR, ensemble, and analogy

Product                                  p                                         q
                                         NLS       MLR       Ensemble  Analogy     NLS       MLR       Ensemble  Analogy

PC printers 0.000007 −0.001303 0.000686 0.001749 0.173776 0.108571 0.170087 0.131227


Optical cable 0.000027 0.000547 0.000448 0.010440 0.611847 0.907479 0.606509 0.516968
Portable and transportable navigation 0.000114 −0.001895 0.000784 0.010841 0.887896 1.269737 0.809062 1.419575
Press machine 0.000189 0.002099 0.000781 0.001110 0.443570 0.381084 0.461186 0.068896
Automobile 0.000232 −0.002607 0.001240 0.001880 0.218231 0.268223 0.212583 0.115398
Cellular phones 0.000344 −0.003137 0.001410 0.002719 0.267173 0.169772 0.254270 0.134534
Typewriter 0.000409 −0.006499 0.001476 0.009295 0.488950 0.626315 0.517113 0.059778
DBS satellite 0.000504 −0.002351 0.001054 0.012096 0.253918 0.374968 0.246854 0.665767
Internal-combustion engine 0.000535 0.000255 0.001818 0.001076 0.107675 0.082482 0.115944 0.102486
Mobile telephone 0.000627 0.004263 −0.000144 0.002708 0.431015 0.462085 0.415481 0.127981
Cellular phone registration 0.000728 0.000772 0.000627 0.003380 0.700340 0.965365 0.653765 0.146509
Elevator 0.000760 −0.002713 0.000390 0.001053 0.227349 0.173853 0.223963 0.090518
Voice recorder 0.000854 −0.007097 −0.000815 0.002800 0.245686 0.168662 0.263859 0.108042
Digital TV sets & monitors 0.000967 0.003219 0.001373 0.005509 0.500333 0.607216 0.487432 0.252469
Portable MP3 players 0.001064 −0.000071 0.000146 0.007446 0.641945 0.777833 0.624977 0.248487
Dishwashers 0.001072 −0.004083 0.001253 0.000605 0.141863 0.150857 0.131015 0.073823
Telephone 0.001105 −0.001669 −0.000292 0.004705 0.266719 0.133493 0.292473 0.235467
Color TV 0.001163 −0.000091 0.000668 0.005498 0.268695 0.327285 0.249017 0.265833
Plasma DTV 0.001324 −0.000342 0.001189 0.004391 0.593112 0.792773 0.559591 0.197693
Domain registration 0.001368 0.003230 0.001705 0.003413 0.411505 0.506795 0.436337 0.182340
Modems/fax modems 0.001444 −0.003999 0.001252 0.004012 0.178130 0.040046 0.200587 0.207144
Electric fan 0.001511 −0.000289 0.001301 0.001717 0.184559 0.123643 0.182164 0.089883
Washing machine 0.001511 0.000714 0.001722 0.000429 0.184559 0.236248 0.222586 0.056745
Food disposers 0.001584 −0.003468 0.001945 0.001339 0.108291 0.006477 0.099216 0.047727
Digital cameras 0.001629 −0.000039 0.001815 0.005770 0.440367 0.403553 0.422049 0.258400
Analog color TV 0.001719 −0.004620 0.001604 0.005466 0.162791 0.115477 0.198342 0.271943
Compact audio systems 0.001974 −0.003861 0.001454 0.007952 0.229504 0.178163 0.240487 0.219916
Total CD players 0.001996 −0.004940 0.001404 0.005514 0.238066 0.230657 0.251809 0.216656
Air conditioners 0.002452 0.006253 0.002589 0.001058 0.128405 0.017830 0.109323 0.129191
VCR decks with stereo 0.002496 −0.000429 0.003139 0.007533 0.297619 0.385133 0.275455 0.189849
Electronic copier 0.002498 0.000672 0.002770 0.000005 0.187467 0.242066 0.175729 0.121643
Dehumidifiers 0.002582 0.002038 0.002972 0.001259 0.138786 0.073962 0.124480 0.045287
Vending machine 0.002653 0.000863 0.003208 0.008398 0.317252 0.430895 0.284182 0.281935
Record player 0.002683 −0.004661 0.002052 0.002731 0.241972 0.203099 0.260761 0.108182
Cordless telephones 0.002694 0.001564 0.001547 0.003105 0.203386 0.042198 0.199541 0.165024
Clothes dryers 0.002750 0.004099 0.002247 0.001246 0.101752 −0.041971 0.105615 0.048250
VCR decks 0.002831 −0.001168 0.002363 0.008520 0.164984 0.197098 0.158947 0.252340
Key phone 0.003406 −0.002080 0.003095 0.003895 0.029691 −0.214969 0.043558 0.216050
Electric rice cooker 0.003449 0.003296 0.004149 0.001190 0.098339 −0.006931 0.071583 0.048523
Portable CD equipment 0.003593 −0.002656 0.001768 0.007193 0.264427 0.318750 0.291797 0.286239
Black-and-white television 0.003599 0.000551 0.004717 0.005358 0.321698 0.475157 0.325070 0.262775
Bed coverings 0.003612 0.004822 0.003640 0.001053 0.128234 0.010993 0.119144 0.090518
Electric cultivator 0.003760 0.004883 0.004183 0.000116 0.230795 0.145375 0.212446 0.109116
Crane for construction 0.003911 0.005385 0.005165 0.000738 0.168819 0.204768 0.217062 0.096372
Analog projection TV 0.004006 0.005599 0.003467 0.005334 0.256251 0.344262 0.272023 0.266551
Aftermarket PC monitors 0.004006 0.002498 0.004135 0.004267 0.256251 0.267027 0.240701 0.213241
Video tape recorder & player 0.004076 0.004364 0.003950 0.005118 0.318898 0.427691 0.279698 0.290792
Lawn mowers 0.004344 0.009071 0.003879 0.001118 0.129472 −0.057270 0.139290 0.046032
Refrigerator 0.004395 0.003688 0.004310 0.016532 0.121771 −0.108044 0.138332 0.214533
Analog TV/VCR combinations 0.004979 0.005422 0.005353 0.004222 0.353099 0.354612 0.341510 0.208771
Lathe 0.005895 −0.000093 0.006135 0.000540 0.185121 0.124767 0.211764 0.094741
Freezers 0.006867 0.011703 0.004715 0.007772 0.077488 0.022107 0.089495 0.116123
Camcorders 0.007141 −0.001363 0.007285 0.004300 0.119277 −0.077661 0.154889 0.344024
Digital projection sets & monitors 0.007229 0.013551 0.007307 0.005148 0.687376 0.868696 0.662500 0.241678
Telephone answering devices 0.007422 0.006548 0.007815 0.003654 0.189819 0.246770 0.178793 0.206442
Home theater-in-a-box 0.007972 0.011119 0.008432 0.003404 0.332291 0.419107 0.345450 0.174776
Analog color TV with stereo 0.008169 0.009690 0.008003 0.004754 0.199138 0.188172 0.233555 0.251856
Gas range 0.008272 0.012255 0.007980 0.007311 0.237224 0.306208 0.266662 0.152235
Microwave oven 0.009138 0.009446 0.008936 0.006617 0.190294 0.135113 0.185489 0.189779
Aftermarket remote controls 0.009307 0.008507 0.008875 0.008278 0.224803 0.204132 0.226278 0.182739
DVD players/recorders 0.009722 0.006413 0.010305 0.007518 0.363046 0.255407 0.335606 0.224463
Analog handheld LCD color TV 0.009892 0.004231 0.011012 0.019576 0.173174 0.065502 0.185151 0.147190
Video camera 0.010421 0.008484 0.011464 0.003426 0.530827 0.745438 0.535237 0.234278
Fax machines 0.010969 0.009894 0.010448 0.003441 0.266902 0.217588 0.268064 0.201818
Blank Videocassettes 0.011130 0.009372 0.012286 0.021251 0.137842 0.030209 0.125984 0.106521
Monochrome TV 0.011322 0.015997 0.012090 0.004912 0.086825 0.005411 0.088787 0.276325
CDP 0.012830 0.015818 0.012043 0.006269 0.183301 0.079264 0.176501 0.294352
Rack audio systems 0.013253 0.017287 0.013161 0.001184 0.366526 0.304232 0.372504 0.137702
Corded telephones 0.013786 0.019458 0.013523 0.002727 0.109153 −0.181639 0.149943 0.176069

Please cite this article as: H. Lee, et al., Pre-launch new product demand forecasting using the Bass model: A statistical and machine
learning-based approach, Technol. Forecast. Soc. Change (2013), http://dx.doi.org/10.1016/j.techfore.2013.08.020

Appendix A (continued)

Product	p (NLS)	p (MLR)	p (Ensemble)	p (Analogy)	q (NLS)	q (MLR)	q (Ensemble)	q (Analogy)

Family radio devices 0.015335 0.022861 0.015506 0.001505 0.380731 0.332642 0.376496 0.068652
Personal computers 0.015950 0.022743 0.014494 0.011918 0.179772 0.078699 0.164870 0.127937
Portable tape and radio/tape players 0.017427 0.022506 0.017595 0.004357 0.232762 0.184680 0.258116 0.217054
LCD TV (Digital and analog) 0.018607 0.023682 0.019608 0.004492 0.245743 0.126912 0.269804 0.267157
Videocassette players 0.019732 0.025518 0.018246 0.004515 0.285609 0.402980 0.296578 0.247694
MP3 0.019738 0.028225 0.018932 0.005578 0.697928 0.925170 0.670885 0.242889
Personal word processors 0.020585 0.023438 0.021476 0.001072 0.215886 0.298904 0.205950 0.282171
LCD monitor 0.021403 0.032172 0.021779 0.004330 0.612865 1.015626 0.581345 0.245977
Analog handheld LCD monochrome TV 0.021752 0.026762 0.021560 0.008903 0.163545 0.046244 0.180689 0.155856
Electronic calculator 0.023836 0.035584 0.023161 0.007975 0.255873 0.378878 0.233736 0.089886
Facsimile 0.024187 0.034919 0.024426 0.002648 0.264186 0.212156 0.253331 0.201981
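The p and q columns above are the two shape parameters of the Bass model: given a market potential m, the cumulative adoption fraction follows the closed form F(t) = (1 − e^−(p+q)t) / (1 + (q/p)·e^−(p+q)t), and adoptions peak at t* = ln(q/p)/(p + q). A minimal Python sketch, using the NLS estimates for DVD players/recorders from the table; the market potential m = 1,000,000 is a hypothetical figure for illustration only, not a value from the paper:

```python
import math

def bass_cumulative(t, p, q):
    """Closed-form cumulative adoption fraction F(t) of the Bass model."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_sales(t, p, q, m):
    """Adoptions in period t (difference of cumulative adopters) for market potential m."""
    return m * (bass_cumulative(t, p, q) - bass_cumulative(t - 1, p, q))

# NLS estimates for DVD players/recorders from the appendix table
p, q = 0.009722, 0.363046
m = 1_000_000  # hypothetical market potential

# Time of peak adoptions: t* = ln(q/p) / (p + q)
t_peak = math.log(q / p) / (p + q)
print(f"peak adoption around year {t_peak:.1f}")

for t in range(1, 6):
    print(f"year {t}: sales = {bass_sales(t, p, q, m):,.0f}")
```

For these estimates the sales peak falls just under ten years after launch, consistent with the relatively small coefficient of innovation p and larger coefficient of imitation q typical of the table.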



Hakyeon Lee is an assistant professor in the Dept. of Industrial and Information Systems Engineering, Seoul National University of Science and Technology. He received the degrees of B.S. and Ph.D. from Seoul National University. His research interests are in technological forecasting and innovation diffusion. He has authored several published papers in leading journals of technology and innovation management, including Technological Forecasting and Social Change, Research Policy, Technology Analysis & Strategic Management, Journal of Engineering and Technology Management, and Technology in Society.

Sang Gook Kim is a senior researcher of the Industry Information Analysis Center, Korea Institute of Science and Technology Information. He received the degree of B.S. in Industrial Engineering at Seoul National University and the Ph.D. in Operations Research at Florida Institute of Technology. His recent research interests are in technology valuation, product lifecycle forecasting, and technology commercialization; his main study areas also include queuing models based on stochastic processes and optimization theories. He has authored a few papers on related topics in leading journals such as Nonlinear Analysis: Theory, Methods & Applications, Journal of Supply Chain and Operations Management, and Asian Journal of Innovation and Policy.

Hyun-woo Park is a technology economist at the Korea Institute of Science and Technology Information (KISTI). He worked with San Francisco State University as a Visiting Fellow (1996–1997) and the University of California, Santa Cruz as a Research Scholar (2008–2009). He received the B.S., M.S., and Ph.D. in International Business (1991) from Hong-Ik University, and the Ph.D. in Science and Technology Studies (2007) from Korea University. He has published many articles and books in research fields including technology valuation, innovation management, technology commercialization, and scientometrics. He has authored a number of papers on related topics in leading journals such as Scientometrics, Asian Journal of Technology Innovation, and Journal of Supply Chain and Operations Management.

Pilsung Kang is an assistant professor in the Dept. of Industrial and Information Systems Engineering, Seoul National University of Science and Technology. He received B.S. and Ph.D. degrees in Industrial Engineering at Seoul National University. His research interests include instance-based learning, learning kernel machines, novelty detection, learning algorithms in class imbalance, and network analysis. His research also includes application areas such as keystroke dynamics-based authentication, fault detection in manufacturing processes, and technological forecasting. He has published a number of papers on related topics in leading journals such as Pattern Recognition, Intelligent Data Analysis, and Neurocomputing.
