Bass Diffusion Model
Article info

Article history: Received 23 April 2013; received in revised form 12 August 2013; accepted 17 August 2013; available online xxxx

Keywords: Pre-launch forecasting; Bass model; Multivariate linear regression; Machine learning; Ensemble

Abstract

This study proposes a novel approach to the pre-launch forecasting of new product demand based on the Bass model and statistical and machine learning algorithms. The Bass model is used to explain the diffusion process of products while statistical and machine learning algorithms are employed to predict two Bass model parameters prior to launch. Initially, two types of databases (DBs) are constructed: a product attribute DB and a product diffusion DB. Taking the former as inputs and the latter as outputs, single prediction models are developed using six regression algorithms, on the basis of which an ensemble prediction model is constructed in order to enhance predictive power. The experimental validation shows that most single prediction models outperform the conventional analogical method and that the ensemble model improves prediction accuracy further. Based on the developed models, an illustrative example of 3D TV is provided.

© 2013 Elsevier Inc. All rights reserved.
0040-1625/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.techfore.2013.08.020
Please cite this article as: H. Lee, et al., Pre-launch new product demand forecasting using the Bass model: A statistical and machine
learning-based approach, Technol. Forecast. Soc. Change (2013), http://dx.doi.org/10.1016/j.techfore.2013.08.020
2 H. Lee et al. / Technological Forecasting & Social Change xxx (2013) xxx–xxx
have been developed, which can be categorized into three types: Bayesian approaches, subjective approaches, and analogical approaches. The Bayesian approach starts with pre-launch forecasts and updates them as additional data become available. Various methods for updating the parameter estimates or the forecasts have been proposed [7–10]. However, this approach centers on how to update forecasts after launch and still calls for initial pre-launch forecasts to be updated as data become available. The subjective approach produces parameter estimates through an algebraic estimation procedure on the basis of managerial judgments of tangible information such as the time and level of peak sales [11] and the sum of the coefficients of the external and internal influences [12]. The drawback of the subjective approach is that obtaining accurate judgments is as difficult as estimating accurate parameters [13]. Finally, the analogical approach, called “guessing by analogy,” has prevailed in the literature. It assumes that a new product will have a diffusion pattern similar to those of its analogous products over time [1]. Under this approach, the parameter estimates of a new product are obtained by taking a weighted sum of the parameters of analogous products, with the weights derived by establishing similarities between the new product and several analogous products [14].

Although the analogical approach has been applied widely [1,15–19], it has two main limitations. First, there are no clear guidelines for how to select benchmarks even though the estimated parameters are highly dependent on the analogous products under consideration. Second, the similarities are established by expert judgments that are inherently subjective. A promising solution to these problems is using the historical empirical relationship between the parameters and attributes of analogous products. Rogers [3] emphasizes that the attributes of an innovation are important variables in explaining its rate of adoption. Once this relationship is identified, the parameters of a new product can be estimated by knowing its characteristics [20]. Although identifying the relationship between diffusion parameters and product attributes can serve as a reliable basis for pre-launch forecasting of a new product, little research has been carried out thus far in this direction and no systematic approach has yet been developed.

The tenet of this study is that a statistical and machine learning-based approach can overcome the limitations of the conventional analogical approach to pre-launch forecasting. The goal of statistical and machine learning is to discover intrinsic, sometimes unanticipated, relationships between variables with the help of high computational power [21,22]. A typical procedure of an inductive statistical and machine learning approach is as follows. The first step is to set up the model structure by defining a learning task, configuring input–output variables, seeking appropriate algorithms, and selecting proper performance criteria. The second step is to collect sufficient real-world examples, which are then divided into training and test data sets. In the third step, the employed learning algorithms are optimized on the basis of the training data set. Finally, the best model is identified on the basis of the test data set, using the predetermined performance criteria. In this study, the learning task is defined as predicting the parameters of the Bass model prior to launch, while input and output variables are configured as product attributes and diffusion characteristics, respectively. Various regression algorithms, such as multivariate linear regression, support vector regression, and Gaussian process regression, as well as an ensemble model of these, are used in building the prediction models. Finally, their performances are evaluated using the mean absolute error (MAE) and the root mean squared error (RMSE).

Pre-launch product demand forecasting on the basis of statistical and machine learning algorithms has several advantages over conventional analogical methods. First, a reliable relationship between the attributes and diffusion characteristics of existing products can be found, which in turn enables new product demand forecasts to be based solely upon these attributes without any human manipulation. In other words, analogous products are automatically selected and their contributions to forecasting systematically determined by the prediction model; although such selection and determination processes are not easily understood by humans, they are mathematically sound and analytically tractable. Therefore, forecasting is no longer dependent on the subjective judgments of human experts, but becomes an objective outcome obtained by the combination of learning algorithms and product data. Second, since statistical and machine learning algorithms are designed for interpolation as well as extrapolation, forecasting accuracy can be improved. The parameter values predicted by conventional analogical methods are bounded by the current maximum and minimum estimates of reference products. If a new type of diffusion style occurred, whose parameter values were far from the current boundary, conventional analogical methods would not properly reflect this eventuality. Statistical and machine learning-based approaches, by contrast, would detect the change from the attributes of the product itself and reflect it in the prediction. In light of the foregoing, this study proposes a new approach to the pre-launch forecasting of new product demand, which utilizes the Bass diffusion model and statistical and machine learning-based regression algorithms. We also boost prediction accuracy by constructing an ensemble of individual prediction models.

The remainder of this paper is organized as follows. Section 2 reviews previous studies of pre-launch forecasting with the Bass model. Section 3 demonstrates the proposed framework including the product and diffusion database (DB) design, single prediction model development, and ensemble model construction. Section 4 validates the single and ensemble prediction models and provides an illustrative case study on the basis of the best single and ensemble models. Finally, the conclusion and limitations of this paper are presented alongside future research directions in Section 5.

2. Bass model and pre-launch forecasting

The Bass model assumes that a technological innovation is spread by two types of influences: externally by the mass media and internally by word-of-mouth. It can be derived from a hazard function that represents the probability that an adoption occurs at time t given that it has not yet occurred:

$$h(t) = \frac{f(t)}{1 - F(t)} = p + qF(t) \qquad (1)$$

where f(t) is the density function of time to adoption, F(t) is the cumulative proportion of adopters at time t, and p and q
are the coefficients of external and internal influences, respectively. Eq. (1) can thus be rewritten as

$$f(t) = \frac{dF(t)}{dt} = [p + qF(t)][1 - F(t)]. \qquad (2)$$

Because the number of adopters at time t, n(t), can be obtained by multiplying f(t) by the potential market size, m, rearranging Eq. (2) with N(t) = mF(t) yields

$$n(t) = \frac{dN(t)}{dt} = \left[p + \frac{q}{m}N(t)\right][m - N(t)]. \qquad (3)$$

The first term, p[m − N(t)], represents adoptions owing to external influences, while the second term, (q/m)N(t)[m − N(t)], stands for adoptions owing to internal influences. In a pure innovation scenario where only external influences exist (i.e., q = 0), the Bass model reduces to an exponential function. If p is zero, however, it is equivalent to a logistic model, assumed to be driven by only imitative processes, namely a pure imitation situation. Solving Eq. (2) produces the following closed form solution:

$$F(t) = \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}e^{-(p+q)t}}. \qquad (4)$$

The cumulative number of adopters at time t, N(t), can then be written as

$$N(t) = m\left[\frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}e^{-(p+q)t}}\right]. \qquad (5)$$

The three parameters of the Bass model (m, p, and q) can usually be estimated through conventional estimation procedures such as ordinary least squares (OLS) [5], maximum likelihood estimation (MLE) [23], and nonlinear least squares (NLS) [24]. However, these methods can be applied only when enough sales data are available. Previous studies demonstrate that stable and robust parameter estimates for the Bass model can be obtained only if the data under consideration include the peak of the noncumulative adoption curve [24,25]. When insufficient data are available, such as early in the product life cycle or prior to launch, these conventional estimation methods cannot produce reliable parameter estimates. Nevertheless, as discussed earlier, sales forecasting is even more crucial when little or no data are available. Estimates of the diffusion parameters early in the diffusion process or prior to launch are extremely valuable for managerial decision-making such as capital equipment purchases, production planning, and marketing strategy [26].

Time-varying estimation procedures that rely on Bayesian updating have been introduced by several researchers to cope with estimation when little or no data are available. These procedures start with forecasts prior to launch and update those forecasts whenever an additional record becomes available. Lilien et al. [7] propose a Bayesian approach whereby new product sales prior to market entry are predicted by considering the forecasts of a previously introduced similar product and then updated once data are available using Bayesian regression. Rao and Yamada [8] expand the work of Lilien et al. [7] to show that incorporating priors and the use of perceptual data can improve forecasting performance. Lenk and Rao [9] further suggest the Hierarchical Bayes procedure that explicitly considers between-product and within-product variations in establishing initial estimates for the new product. An adaptive procedure combined with Bayesian updating, called the augmented Kalman filter with continuous state and discrete observations, is also proposed by Xie et al. [10]. While the aforementioned approaches employ prior information derived from the diffusion of previously introduced products deemed most similar, Sultan et al. [27] develop initial parameter estimates by conducting a meta-analysis of 213 sets of parameters from 15 articles. Lee et al. [28] utilize consumer reservation price data in order to construct prior distributions. Regardless of the vehicle used for obtaining initial forecasts, however, time-varying estimation approaches mainly focus on updating after launch, which still requires initial pre-launch forecasts to be updated as data become available.

Several approaches to producing pre-launch forecasts have been proposed in previous studies and can be classified into two types: subjective approaches and analogical approaches. In terms of subjective approaches, Mahajan and Sharma [11] propose an algebraic estimation procedure that requires three pieces of information: 1) market potential (m), 2) time of peak sales (t*), and 3) level of peak sales (n*). Once these values have been obtained, p and q can be inferred. However, estimating both the time and level of peak sales is another difficult task. In addition, as Bass [29] notes, these are the key outputs intended to be forecast from the observed data using the diffusion model; therefore, if one could guess such items, there would be no need to estimate the diffusion curve. A similar procedure suggested by Lawrence and Lawton [12] also involves obtaining three pieces of managerial information: 1) market potential (m), 2) number of adoptions in the first period (n(1)), and 3) an estimate of the sum of the coefficients p and q (p + q). A similar problem occurs here; estimating p + q is also difficult. Although general guidelines suggest a value of 0.5 for consumer goods and 0.66 for commercial goods, such generalizations fail to mirror the idiosyncratic characteristics of particular products. In fact, Lawrence and Lawton [12] suggest that prior sales histories of similar products may produce better prior parameter estimates.

In this spirit, the analogical approach has been widely employed for pre-launch forecasting. This approach assumes that a new product will behave as analogous products do. As mentioned before, several previous studies using the Bayesian approach utilize the diffusion information of existing similar products as priors [7–10]. A more systematic approach on the basis of consumer choice theory is proposed by Thomas [14]. In this approach, the parameters of a new product are estimated by taking a weighted sum of the parameters of analogous products. These weights can be obtained by establishing the similarities between the new product and several analogous products in terms of five dimensions: environmental situation, market structure, buyer behavior, marketing mix strategy, and innovation characteristics. This “guessing by analogy” approach has often been adopted for the pre-launch forecasting of various types of products and services [1,16–19]. However, there are no clear standards for the selection of benchmark products. Analogous products are often chosen simply because
they are recent or easily recalled [30]. Although Ilonen et al. [31] employ the self-organizing map for automatically identifying analogies, their approach is centered on selecting analogous countries rather than analogous products. Bayus [15] suggests a grouping procedure based on hierarchical clustering with factor analysis to generate priors on the basis of various products, but the parameter estimates obtained in this way still depend heavily on which analogous products are selected. Moreover, similarities are usually identified by expert judgments or consumer opinions, which are inherently subjective.

Using a historical relationship between the parameters and attributes of analogous products can be a promising solution to this problem. Nonetheless, few studies have been conducted in this direction. Sultan et al. [27] propose a meta-analysis model utilizing ANOVAs with four types of attributes. However, these attributes are related to the characteristics of the research itself such as model specification, estimation method, and data reuse, while product-related attributes only include type of innovation and geographic effect. Similarly, Gatignon et al. [32] develop a cross-national econometric model of innovation diffusion, but its included variables are associated with country-level patterns of social communication such as cosmopolitanism, mobility, and sex roles; again, no product-related attributes are considered. Finally, Srivastava et al. [33] suggest a multi-attribute model for forecasting the adoption of investment alternatives that includes five attributes of investment products. Owing to the high correlation between these attributes, however, only two attributes, information costs and likelihood of loss of principal, are employed to forecast the acceptance of a potential investment alternative. Further, the attributes considered in this study are too industry-specific to be generalized to other products and services. In summary, although establishing the historical relationship between diffusion parameters and product attributes can be considered to be promising for pre-launch new product forecasting, no systematic approach has thus far been proposed in the literature.

3. Methodology

3.1. Overall procedure

The present study proposes a statistical and machine learning approach to the pre-launch forecasting of new product demand on the basis of the Bass model. Fig. 1 depicts the overall procedure of the proposed approach. Two types of DBs are required to develop the prediction models: a product diffusion DB and a product attribute DB. The product diffusion DB includes the Bass diffusion parameters of existing products obtained by applying the NLS technique to historical sales data on each product. To construct the product attribute DB, various attributes that can explain the diffusion characteristics of a product need to be selected and defined. The values of the attributes of each product are measured through expert judgments. By taking the product attribute DB as inputs and the product diffusion DB as targets, single prediction models are developed to identify the relationship between diffusion parameters and product attributes. Six statistical and machine learning-based regression algorithms are utilized. To improve prediction performance, ensemble prediction models combining these single models are also constructed. Comparing prediction performance among the developed models produces the best-fit models that will be utilized for the case study. The performance comparison with the results of the simple analogical approach is also provided to uphold the validity of the proposed approach. Finally, the working of the developed models is demonstrated with a simple illustrative example of 3D TV.

3.2. Data

As many products as possible for which previous sales data are available should be collated in the DBs in order to improve the reliability of the prediction models. One of the major data sources was the CE Historical Data provided by the Consumer Electronics Association. This data set contains US sales data on more than 60 categories of electronic products from their market introductions to the present. In total, 21 US products whose historical sales data were presented in recent papers and reports were included in the DBs. In addition, Korean sales data on over 40 products were gathered from relevant associations. Since small numbers of observations may produce unreliable estimates, all products with fewer than 12 sales records were then excluded from the data set, which resulted in 87 products remaining.

In the next step, the diffusion parameters of each of these 87 products were estimated by fitting their historical data to the Bass model in order to construct the product diffusion DB. Although the Bass model is specified by three parameters, the potential market size (m), the coefficient of external influence (p), and the coefficient of internal influence (q), the product diffusion DB only includes p and q. Most previous studies of pre-launch forecasting center on estimating the two parameters of communication effects, p and q, while separately estimating m from market research [1,9,10,14,15,18]. Similarly, Lawrence and Lawton [12] and Mahajan and Sharma [11] also utilize the potential market size as inputs for the subjective algebraic procedures rather than as outputs of the procedure, because the market size is likely to be affected by marketing efforts and various environmental factors rather than by product characteristics [34]. Consequently, our prediction model also only includes estimates of p and q. These two parameters of the 87 products were estimated using the NLS procedure, with average values of p and q found to be 0.0087 and 0.3273, respectively. The parameter estimates included in the product diffusion DB were then employed as the target variables in the prediction models.

The next step was to construct the product attribute DB for the 87 products. First, the product attribute variables affecting the diffusion patterns need to be identified. An extensive literature review of the drivers and determinants of innovation diffusion and a series of discussions with senior marketing managers in practice led us to derive 17 attribute variables for constructing the product attribute DB. Table 1 presents these 17 variables with their abbreviations and measurement scales. These variables can be grouped into four categories: industry, market, technology, and use. Four variables (IC, TCS, TIE, and DN) were valued on nominal scales, while the other variables were measured on a five-point Likert scale from very low (1) to very high (5). The values of the attribute variables for each of the 87 products were then measured by industry experts including marketing managers and engineers and used as input variables in the prediction models.
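To make the role of the stored parameters concrete, the following is a minimal sketch of how a (p, q) pair from the product diffusion DB could be turned into a diffusion curve using Eqs. (3) and (4). The function names and the market size `m_demo` are hypothetical and chosen only for illustration; the p and q values are the 87-product averages reported above. The peak-time expression ln(q/p)/(p + q) is a standard property of the Bass model for q > p and is not stated in this paper.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form cumulative adoption fraction F(t) of Eq. (4)."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_adoption_rate(t, p, q, m):
    """Adoption rate n(t) of Eq. (3), with N(t) = m * F(t)."""
    cumulative = m * bass_cumulative_fraction(t, p, q)
    return (p + (q / m) * cumulative) * (m - cumulative)


def bass_peak_time(p, q):
    """Time of peak adoptions; standard Bass-model result, valid for q > p."""
    return math.log(q / p) / (p + q)


# Average estimates over the 87 products, as reported in Section 3.2.
p_avg, q_avg = 0.0087, 0.3273
# Hypothetical market size; in the paper m is estimated separately.
m_demo = 1_000_000

t_star = bass_peak_time(p_avg, q_avg)
yearly_adoptions = [bass_adoption_rate(t, p_avg, q_avg, m_demo) for t in range(1, 21)]
```

Because m only scales the curve, the shape of the forecast (timing of the peak, steepness of take-off) is determined entirely by p and q, which is consistent with the decision to predict only these two parameters.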
3.3. Experimental design

3.3.1. Data preprocessing and variable selection

As explained earlier, some products were removed owing to the limited number of historical sales records. Another issue in the modeling process was the existence of outliers whose estimates were noticeably different. Fig. 2 shows the estimated Bass model parameters of the 87 products. This figure clearly shows that certain products have significantly large p or q values compared with others. Since such outliers would degrade prediction accuracy, seven products whose estimates were beyond two standard deviations from the mean were also removed. Consequently, 80 products were finally used to build the prediction models, with the average estimates of p and q decreasing to 0.0063 and 0.2783, respectively. The parameter estimates of each of these 80 products are presented in Appendix A.

Because most of the regression algorithms employed in this study can only handle numerical variables, the four nominal variables (IC, TCS, TIE, and DN) were transformed into binary variables using the 1-of-C coding method. In this method, C binary dummy variables are created for a nominal variable with C categories; for each dummy variable, 1 is assigned if the original value falls in the same category, with 0 otherwise (see Fig. 3). Once the variable transformation had been completed, the number of variables increased from 17 to 24.

Some of these 24 candidate input variables were dispensable because they had little effect on the prediction. Thus, using a stepwise linear regression, we selected only the crucial variables before training the prediction models. This stepwise selection process began with the single most relevant input variable, and the following two procedures were then conducted alternately until all significant variables had been identified: (1) among the candidate variables, the one that most improves accuracy is added to the selected variable set, from which (2) the one that contributes least to accuracy is removed.

3.3.2. Regression algorithms

Six statistical and machine learning-based regression algorithms were employed to build single prediction models, namely multivariate linear regression (MLR), k-nearest neighbor regression (k-NN), artificial neural network (ANN), support vector regression (SVR), classification and regression tree (CART), and Gaussian process regression (GPR).

MLR [35] fits the functional relationship between multiple input variables and the target variable of the given data in the form of a linear equation. Let y_i denote the target value (p or q) of the ith product, while x_ij denotes the jth input variable of the ith product. Then, the MLR equation of d predictors with n training instances can be written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_d x_{id}, \quad \text{for } i = 1, 2, \cdots, n. \qquad (6)$$
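As an illustration of how the coefficients of Eq. (6) might be obtained by ordinary least squares, the sketch below builds the design matrix with a leading column of ones and solves the normal equations with a small Gaussian-elimination routine. It uses only the standard library; the helper names and the toy attribute scores are hypothetical and are not taken from the paper's data.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_mlr(inputs, targets):
    """Return [beta_0, beta_1, ..., beta_d] minimising the squared residuals."""
    design = [[1.0] + list(row) for row in inputs]  # column of ones for beta_0
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Evaluate Eq. (6) for one product's attribute vector."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical Likert-scale scores for two attributes of five products.
toy_inputs = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 2)]
# Hypothetical targets generated from a known linear rule, for checking only.
toy_targets = [0.01 + 0.002 * a - 0.001 * b for a, b in toy_inputs]
beta_hat = fit_mlr(toy_inputs, toy_targets)
```

With one model fitted for p and another for q, a new product's attribute vector can be passed to `predict_mlr` to obtain its pre-launch parameter estimates.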
Table 1
Product attribute variables.

Fig. 2. The estimated Bass model parameters (x-axis: product index, y-axis: estimated value).

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nd} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_d \end{pmatrix}. \qquad (7)$$

The regression coefficients β can be obtained by minimizing the squared residual error between the target (y) and predictions (ŷ), as shown in Eq. (8), using the ordinary least squares method:

$$E = \frac{1}{2}\sum_{i=1}^{n} e_i^2 = \frac{1}{2}(\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) = \frac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}),$$
$$\frac{\partial E}{\partial \boldsymbol{\beta}} = -\mathbf{X}^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = 0, \quad \boldsymbol{\beta} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}. \qquad (8)$$

k-NN [36] is the most popular case-based reasoning algorithm. Since it does not require a separate training procedure, it has been employed in various domains where rapid and frequent model updates are required [37–39]. k-NN predicts a new instance on the basis of the similarity to its neighbors. Once a test instance x_t is given, k-NN first searches the k most similar instances in the reference data set using a certain distance metric and allocates weights to them under the principle that the greater the similarity, the greater is the weight. Then, the prediction is made according to the weighted average of the target values of the selected neighbors as follows:

$$\hat{y}_t = \sum_{j \in NN(\mathbf{x}_t)} w_j y_j, \qquad (9)$$
where NN(x_t) and w_j denote the index set of the k-nearest neighbors of x_t and the assigned weight to the jth nearest neighbor, respectively. In the k-NN regression, two user-specific parameters must be declared prior to the prediction: the number of nearest neighbors (k) and weight allocation method.

ANN [21] is one of the most widely used nonparametric regression algorithms in many application domains owing to its ability to capture nonlinear relationships between the input and output variables [40–42]. A three-layer feed-forward neural network is used in our experiments. In ANN, the target is expressed as a combination of input values, activation functions, and weights as follows:

$$y_i = \sum_{q=1}^{h} w_q^{(2)}\, g\!\left(\sum_{p=1}^{d} w_{qp}^{(1)} x_{ip}\right), \quad \text{for } i = 1, 2, \cdots, n, \qquad (10)$$

pth input node to the qth hidden node, and the activation function, respectively. Training ANN is equivalent to finding the

C in Eq. (12) controls the trade-off between the flatness and the error of the training samples outside the ε-tube. The Lagrangian formulation is derived by eliminating the constraints using slack variables as follows:

$$\min_{\mathbf{w}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}(\zeta_i + \zeta_i^*) - \sum_{i=1}^{n}\alpha_i\left(\varepsilon + \zeta_i - y_i + \mathbf{w}^T\mathbf{x}_i + b\right) - \sum_{i=1}^{n}\eta_i\zeta_i - \sum_{i=1}^{n}\alpha_i^*\left(\varepsilon + \zeta_i^* + y_i - \mathbf{w}^T\mathbf{x}_i - b\right) - \sum_{i=1}^{n}\eta_i^*\zeta_i^*,$$
$$\text{s.t.} \quad \alpha_i, \alpha_i^*, \zeta_i, \zeta_i^* \geq 0. \qquad (13)$$

By taking the derivatives of the primal variables, the optimal conditions for the above Lagrangian are obtained; in turn, Wolfe's dual problem is derived by replacing the conditions in the primal problem:

$$\text{s.t.} \quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0, \quad 0 \leq \alpha_i, \alpha_i^* \leq 1.$$
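Returning to the k-NN regression of Eq. (9), the weighted-average prediction can be sketched as follows. The paper does not state which distance metric or weight allocation method was used, so the Euclidean distance and normalised inverse-distance weights below are assumptions made only for illustration, as are the function name and the toy reference data.

```python
import math


def knn_predict(reference_inputs, reference_targets, query, k=3):
    """Weighted k-NN prediction in the spirit of Eq. (9).

    Assumptions (not from the paper): Euclidean distance and
    inverse-distance weights normalised to sum to one.
    """
    by_distance = sorted(
        (math.dist(row, query), target)
        for row, target in zip(reference_inputs, reference_targets)
    )
    neighbours = by_distance[:k]
    tiny = 1e-12  # avoids division by zero when the query matches a reference row
    raw_weights = [1.0 / (distance + tiny) for distance, _ in neighbours]
    total = sum(raw_weights)
    return sum(w / total * target for w, (_, target) in zip(raw_weights, neighbours))


# Hypothetical attribute vectors and q estimates for four reference products.
reference_inputs = [(1.0, 2.0), (2.0, 2.0), (4.0, 5.0), (5.0, 4.0)]
reference_targets = [0.20, 0.25, 0.40, 0.45]
q_hat = knn_predict(reference_inputs, reference_targets, (2.0, 3.0), k=3)
```

Closer reference products receive larger weights, so the prediction stays within the range of the selected neighbours' targets, which mirrors the bounded behaviour of analogy-based estimates.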
ability if the model memorized all the characteristics of data, where A = σ12 XXT þ Σ−1 p . The prediction distribution of ft
some of which are even unnecessary, e.g., noises or outliers. at a test instance xt is then obtained by averaging the output
The role of pruning is to merge adjacent leaf nodes if the of all possible linear models with regard to the Gaussian
prediction accuracy is improved or at least unchanged on the posterior
validation data set that is not used in the tree construction. Once pruning has been completed, CART predicts a new instance by averaging the target values in the leaf node to which the new instance belongs.

GPR [48] begins with the Bayesian approach to MLR and extends the expressiveness by adopting kernel tricks. In GPR, the target y is expressed as a linear combination of the inputs with a Gaussian noise as follows:

$$y = f(\mathbf{x}) + \varepsilon, \qquad f(\mathbf{x}) = \mathbf{x}^{T}\mathbf{w}, \tag{15}$$

assuming that the noise follows an independent, identically distributed (i.i.d.) Gaussian distribution with zero mean and variance $\sigma^{2}$,

$$\varepsilon \sim N\left(0, \sigma^{2}\right). \tag{16}$$

The likelihood, which is the probability density of the given data and parameters, can be directly obtained as

$$p(\mathbf{y} \mid X, \mathbf{w}) = \prod_{i=1}^{n} p(y_i \mid \mathbf{x}_i, \mathbf{w}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y_i - \mathbf{x}_i^{T}\mathbf{w}\right)^{2}}{2\sigma^{2}}\right) = \frac{1}{\left(2\pi\sigma^{2}\right)^{n/2}} \exp\left(-\frac{1}{2\sigma^{2}} \left|\mathbf{y} - X^{T}\mathbf{w}\right|^{2}\right) = N\left(X^{T}\mathbf{w}, \sigma^{2} I\right). \tag{17}$$

As a prior distribution over the weights, a zero mean Gaussian with covariance matrix $\Sigma_p$ is generally used

$$\mathbf{w} \sim N\left(0, \Sigma_p\right). \tag{18}$$

$$p(f_t \mid \mathbf{x}_t, X, \mathbf{y}) = \int f(\mathbf{x}_t \mid \mathbf{w})\, P(\mathbf{w} \mid X, \mathbf{y})\, d\mathbf{w} = N\left(\frac{1}{\sigma^{2}}\, \mathbf{x}_t^{T} A^{-1} X \mathbf{y},\; \mathbf{x}_t^{T} A^{-1} \mathbf{x}_t\right). \tag{20}$$

As in SVR, GPR can fit a nonlinear relationship by introducing a mapping function ϕ(x) to project the data from a low dimensional space to a higher dimensional feature space and using kernel tricks to compute inner products in the feature space without an explicit form of ϕ(x).

3.3.3. Ensemble model

After training the single prediction models, ensemble prediction models [21,49] are constructed in order to enhance the predictive power. Fig. 4 shows a general structure of an ensemble model. A number of variations are possible according to the diversity of algorithms or parameters; experts can consist of different learning algorithms, an identical learning algorithm with different parameters, or a combination of both. The prediction outcome of an ensemble is formulated by aggregating the output of every expert. In our experiments, the best ensemble model was identified among all possible combinations of regression algorithms and subsequently used in a comparative validation analysis as well as in the case study.

3.3.4. Validation method and performance measures

As a benchmark method to verify our proposed prediction models, a conventional analogical prediction model was also constructed. In the analogical model, the two parameters of the Bass model can be predicted as follows:

$$p = \sum_{j=1}^{m} \sum_{i=1}^{n} w_i\, x_{ij}\, p_j \tag{21}$$
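Eq. (21) is a plain double sum, so it can be sketched in a few lines. The symbol definitions that accompany the equation fall outside this excerpt, so the sketch below works under a labeled assumption: w_i is read as the weight of attribute i, x_ij as a similarity score between the new product and analogous product j on attribute i, and p_j as the known Bass parameter of analogous product j. The function name, variable names, and toy numbers are illustrative only and are not taken from the paper.

```python
def analogical_estimate(weights, similarity, analog_params):
    """Evaluate sum_j sum_i w_i * x_ij * p_j exactly as written in Eq. (21).

    Assumed reading (not confirmed by this excerpt):
      weights[i]        -> w_i, weight of attribute i (n attributes)
      similarity[i][j]  -> x_ij, similarity to analogous product j on attribute i
      analog_params[j]  -> p_j, known parameter of analogous product j (m products)
    """
    n = len(weights)
    m = len(analog_params)
    if len(similarity) != n or any(len(row) != m for row in similarity):
        raise ValueError("similarity must be an n-by-m table")
    total = 0.0
    for j in range(m):
        for i in range(n):
            total += weights[i] * similarity[i][j] * analog_params[j]
    return total


# Toy inputs (hypothetical): two attributes, two analogous products.
weights = [0.5, 0.5]
similarity = [[0.6, 0.4],
              [0.2, 0.8]]
analog_p = [0.01, 0.02]

p_hat = analogical_estimate(weights, similarity, analog_p)
print(f"analogical estimate of p: {p_hat:.4f}")
```

The same function would apply unchanged to q by passing the q values of the analogous products. Note that the sketch does not normalize the weights or similarity scores; whether the original method does so is not visible in this excerpt.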
Table 2
The selected input variables for p and q by stepwise linear regression (α = 0.05; the variables in bold face are selected for both p and q).
p q
Fig. 5. The distribution and the box plot of the prediction residuals for each single prediction model.

generalization is guaranteed only when a sufficient number of training instances are provided. This requirement is unlikely to be met with only 80 products, and therefore ANN and CART result in lower prediction accuracies.

Turning to the comparison of prediction performance between the statistical and machine learning-based algorithms and the conventional analogical method, the former are superior to the latter with only a few exceptions, namely CART for p and ANN for q. Setting these exceptions aside, the analogical method is at least 47% worse than the others at predicting p and at least 8.7% worse at predicting q. Compared with the best single algorithm, the MAE of the analogical method is almost twice that of MLR for both p and q. In other words, in the best-case scenario, the prediction methodology proposed in this study has roughly twice the predictive power of the conventional analogical method.

Fig. 5 shows the distribution and box plot of the residuals (y − ŷ) produced by each single prediction model. A good prediction model should meet the following two qualifications: (1) the average of the residuals should be as close to zero as possible and (2) the dispersion of the residuals should be as narrow as possible. Given these conditions, we find that the conventional analogical method is the poorest model because not only does it have the widest residual dispersion but also its average is below zero for both p and q. As previously noted, ANN and CART are inferior to the other statistical and machine learning-based models since their dispersions are almost as wide as that of the analogical method. This confirms that MLR produces the most desirable outcomes in that its prediction residuals are most narrowly distributed with an average value close to zero. However, the box plot implies that k-NN, SVR, and GPR seem better than MLR at predicting q, because their interquartile ranges are smaller than that of MLR. Their MAEs and RMSEs, though, are spoiled by a few products for which they fail to make proper predictions, in contrast to MLR.

4.3. Forecasting performances: ensemble model

Of the 57 possible combinations of regression algorithms for constructing ensemble models (all subsets of two or more of the six algorithms), the combination of MLR and GPR was found to be most accurate. Table 4 compares the prediction performances of the best ensemble model with those of the analogical method and MLR. Constructing an ensemble model greatly decreased the prediction errors for both p and q; almost 90% of the MAE and RMSE of the analogical method disappeared in both cases. Compared with the best single model (i.e., MLR), the ensemble prediction model still provided a surprising improvement; roughly three quarters of the prediction errors (74% to 77% according to Table 4) were eliminated for both p and q regardless of the performance measure.

Table 4
The prediction performance of the ensemble model compared to the analogical method and the MLR (the bold face numbers denote the lowest error among the algorithms).

Algorithm    p MAE     p RMSE    q MAE     q RMSE
MLR          0.0026    0.0031    0.0688    0.0898
Ensemble     0.0006    0.0007    0.0180    0.0223
Analogy      0.0053    0.0073    0.1337    0.1907

Fig. 6 depicts the estimated Bass model parameters (target, x-axis) and prediction outcomes (y-axis) derived by the three models: the analogical method, the best single model (i.e., MLR), and the best ensemble model (i.e., MLR and GPR). The straight line in the figures represents the ideal case in which predicted outcomes equal their actual targets. Thus, the closer the points lie to the line, the better the prediction model. In this respect, the analogical method is the
least accurate of the three models because the points in Fig. 6(a) and (b) seem to be almost randomly distributed. In addition, its prediction coverage is much narrower than that of the actual targets; its predicted p values generally lie between 0 and 0.01, although their target values range from 0 to 0.03. Moreover, its predicted q values mostly lie between 0 and 0.3 even though their target values range from 0 to 0.7. It seems as though the randomness and narrow coverage of these predicted outcomes can be resolved using MLR (Fig. 6(c) and (d)). However, two other issues arise from the MLR results: (1) MLR tends to over-predict the estimates beyond a certain extent (p > 0.02 and q > 0.05) and (2) the prediction outcomes of some products are negative. Fig. 6(e) and (f) show that the ensemble model can overcome the problems caused by MLR: there are no over-predicted values, and negative outcomes appear in only two products when predicting p.

Fig. 6. The predicted against target values of the analogical method, the best single model (MLR), and the ensemble model.

The experimental results in this study can be summarized as follows. First, the product attributes configured herein are shown to be valid, as they lead the regression algorithms to accurate prediction results. Second, statistical and machine learning-based regression algorithms result in a higher predictive power compared with the conventional analogical model. Among the regression algorithms, the best prediction model was MLR, not only because its prediction error was the lowest but also because its residuals were most compactly distributed. Lastly, the ensemble model significantly enhanced the prediction accuracy of the single regression algorithms. In
addition, it resolved the practical issues caused by the other approaches, such as low correlation between the actual and predicted estimates, over-prediction beyond a certain extent, and negative outcome generation.

5. Illustrative example: demand forecasting of 3D TV

For the purpose of illustration, demand for 3D TV in the North American market is forecast in this section. 3D TV is a next-generation display that conveys depth perception to the viewer by employing techniques such as stereoscopic display, multi-view display, 2D-plus-depth, or any other form of 3D display. As interest in 3D, which started with the movie Avatar, has exploded, major TV manufacturers such as Samsung, LG, Sony, and Panasonic have aimed to steal market share by launching various types of 3D TVs. Although Samsung launched the first commercial 3D TV in 2008, 2010 was considered to be the breakthrough year of the technology, as over six million 3D TVs were sold in 2010 [51]. Because only three years of data points are available at this stage, reliable parameter estimates of the Bass model cannot be produced using NLS estimation. We therefore apply the developed prediction models to forecast 3D TV demand.

First, the attribute values of 3D TV for the seven significant variables were measured by expert judgments. These experts included engineers and marketing managers of one of the leading display manufacturers as well as professors majoring in electronic engineering, particularly display devices. The measured values of 3D TV are presented in Table 5. The best single regression model (MLR) and best ensemble model (MLR and GPR) were then employed to predict the parameters of 3D TV.

Table 5
Attribute values of 3D TV.
Variable Value

Inputting the obtained attribute values of 3D TV into the two models produced two sets of Bass diffusion parameters, as shown in Table 6. The estimates obtained from the MLR model were found to be slightly higher than those from the ensemble model for both p and q. Once p and q had been obtained, important points in the product lifecycle, such as takeoff (T1) and peak (T*), could then also be straightforwardly predicted [52]. Times to takeoff and peak in the MLR model were found to slightly precede those in the ensemble model.

Next, potential market size (m) was estimated in order to forecast annual demand. This illustrative example limits its scope to the North American market. Although previous studies have surveyed consumers' purchase intentions to estimate the potential market size, this study instead uses the total number of households in North America as a proxy for the potential market size because this example case aims to illustrate how to apply the developed prediction models to estimate the parameters of the two types of communication effects.

Demand for 3D TV in North America for 15 years (from 2010 to 2024) was then forecast by combining the total number of North American households in 2010 (128 million) with the two predicted parameters. Fig. 7(a) and (b) depict the annual and cumulative demand patterns derived from the ensemble model. Comparing the forecasts with real sales data in the first years may uphold the validity of the prediction models. 3D TV shipments in the North American market totaled 8.4 million units in 2012, up from 4.5 million units in 2011. The forecast cumulative sales in the ensemble model are 4.2 million in 2011 and 8.5 million in 2012, which are very similar to real sales, although the annual sales in 2010 are somewhat overestimated.

6. Conclusions

This study proposed a statistical and machine learning algorithm-based approach to pre-launch product demand forecasting on the basis of the Bass model. Taking the product attribute DB as inputs and product diffusion DB as outputs, single prediction models were developed using the six regression algorithms, on the basis of which an ensemble prediction model was constructed to enhance predictive power. It was shown that most single prediction models outperformed the conventional analogical method and that the ensemble model improved prediction accuracy further. An illustrative example of 3D TV was also provided to demonstrate how the developed models could be used in practice.

This study contributes to the field of pre-launch forecasting by proposing a new approach that utilizes statistical and machine learning-based regression algorithms. Despite the importance of the pre-launch forecasting of new products, conventional approaches such as subjective and analogical methods fail to produce objective estimates of diffusion parameters. However, adopting statistical and machine learning-based regression algorithms can reliably portray the relationship between the attributes and diffusion characteristics of existing products, which, in turn, enables forecasting
new product demand solely on the basis of product attributes (i.e., without human manipulation) and fosters effective pre-launch decision-making.

Fig. 7. Demand forecasts of 3D TV in the North American market from the ensemble model.

However, the prerequisite for benefiting from the proposed approach is maintaining a relevant and sufficient DB of existing products. Although the primary purpose of this study was to propose a new approach, the data used for constructing the prediction models herein may not be enough to take advantage of the statistical and machine learning algorithms. In addition, as we accumulated as many products as possible in the product DB, the nations and industries into which the products were diffused varied, implying that the validity of the parameter estimates may be questionable.

Moreover, the performance of the proposed approach relies heavily on the product DB used to construct the prediction models. The more homogeneous the products included in this product DB, the higher the forecasting accuracy. The product DB should also be regularly updated. Also, the variables selected as product attributes are by no means exhaustive or fixed. Although this paper employed 17 variables, they may not be sufficient to explain the idiosyncratic characteristics of the diffusion of various products. The significant variables can also vary depending on the country or industry context. Future exploratory studies may help identify the context-specific factors that affect the diffusion of products.

Acknowledgments

This work was supported by the Korea Institute of Science and Technology Information (KISTI) and the National Research Foundation of Korea (NRF) grants funded by the Korea government (MEST) (No. 2011-0012759 and 2011-0021893).
Appendix A. The Bass model parameters estimated by the NLS and predicted by the MLR, ensemble, and analogy
Product p q
Appendix A (continued)

Product                               p NLS       p MLR       p Ensemble  p Analogy   q NLS       q MLR       q Ensemble  q Analogy
Family radio devices                  0.015335    0.022861    0.015506    0.001505    0.380731    0.332642    0.376496    0.068652
Personal computers                    0.015950    0.022743    0.014494    0.011918    0.179772    0.078699    0.164870    0.127937
Portable tape and radio/tape players  0.017427    0.022506    0.017595    0.004357    0.232762    0.184680    0.258116    0.217054
LCD TV (Digital and analog)           0.018607    0.023682    0.019608    0.004492    0.245743    0.126912    0.269804    0.267157
Videocassette players                 0.019732    0.025518    0.018246    0.004515    0.285609    0.402980    0.296578    0.247694
MP3                                   0.019738    0.028225    0.018932    0.005578    0.697928    0.925170    0.670885    0.242889
Personal word processors              0.020585    0.023438    0.021476    0.001072    0.215886    0.298904    0.205950    0.282171
LCD monitor                           0.021403    0.032172    0.021779    0.004330    0.612865    1.015626    0.581345    0.245977
Analog handheld LCD monochrome TV     0.021752    0.026762    0.021560    0.008903    0.163545    0.046244    0.180689    0.155856
Electronic calculator                 0.023836    0.035584    0.023161    0.007975    0.255873    0.378878    0.233736    0.089886
Facsimile                             0.024187    0.034919    0.024426    0.002648    0.264186    0.212156    0.253331    0.201981
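As a rough illustration of how one (p, q) pair from the table above translates into a diffusion curve, the sketch below uses the standard closed-form Bass model: the cumulative adoption fraction F(t) = (1 − e^{−(p+q)t}) / (1 + (q/p) e^{−(p+q)t}) and the peak time T* = ln(q/p) / (p + q). It assumes that the first p value and the first q value in each row are the NLS estimates, as the ordering in the appendix title suggests, and it takes the LCD TV row as input. The market size m is set to a hypothetical 1.0 so that demand reads as a fraction of the potential market. The function names are illustrative and are not taken from the paper.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form Bass cumulative adoption fraction F(t), with F(0) = 0."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks; meaningful only when q > p."""
    return math.log(q / p) / (p + q)


def bass_period_demand(m, p, q, periods):
    """Demand in each period 1..periods, computed as m * (F(t) - F(t - 1))."""
    return [
        m * (bass_cumulative_fraction(t, p, q) - bass_cumulative_fraction(t - 1, p, q))
        for t in range(1, periods + 1)
    ]


# LCD TV row of the appendix, assuming the first p and q columns are NLS estimates.
p_lcd, q_lcd = 0.018607, 0.245743
m = 1.0  # hypothetical market size; 1.0 makes demand a fraction of the market

peak = bass_peak_time(p_lcd, q_lcd)
demand = bass_period_demand(m, p_lcd, q_lcd, periods=15)
print(f"peak time: {peak:.2f} periods after launch")
print("per-period demand:", [round(d, 4) for d in demand])
```

Replacing m with an actual market size, such as a household count, would scale the same curve into unit demand per period.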
[51] Displaybank, 3D TV Industry Trend and Market Forecast, Special Report May 2010, Gyeonggi-do, Korea, 2010.
[52] V. Mahajan, E. Muller, R.K. Srivastava, Determination of adopter categories by using innovation diffusion models, J. Mark. Res. 27 (1990) 37–50.

Hakyeon Lee is an assistant professor in the Dept. of Industrial and Information Systems Engineering, Seoul National University of Science and Technology. He received the degrees of B.S. and Ph.D. from Seoul National University. His research interests are in technological forecasting and innovation diffusion. He has authored several published papers in leading journals of technology and innovation management, including Technological Forecasting and Social Change, Research Policy, Technology Analysis & Strategic Management, Journal of Engineering and Technology Management, and Technology in Society.

Sang Gook Kim is a senior researcher of the Industry Information Analysis Center, Korea Institute of Science and Technology Information. He received the degree of BS in Industrial Engineering at Seoul National University and the Ph.D. in Operations Research at Florida Institute of Technology. His recent research interests are in technology valuation, product lifecycle forecasting, and technology commercialization and main study areas also include queuing models based on stochastic process and optimization theories. He has authored a few papers on related topics in leading journals such as Nonlinear Analysis: Theory, Methods & Applications, Journal of Supply Chain and Operations Management, and Asian Journal of Innovation and Policy.

Hyun-woo Park is a technology economist at the Korea Institute of Science and Technology Information (KISTI). He worked with San Francisco State University as a Visiting Fellow (1996–1997) and University of California, Santa Cruz as a Research Scholar (2008–2009). He received the B.S., M.S., and Ph.D. in International Business (1991) from Hong-Ik University, and Ph.D. in Science and Technology Studies (2007) from Korea University. He published many articles and books in the research fields including technology valuation, innovation management, technology commercialization, and scientometrics. He has authored a number of papers on related topics in leading journals such as Scientometrics, Asian Journal of Technology Innovation, and Journal of Supply Chain and Operations Management.

Pilsung Kang is an assistant professor in the Dept. of Industrial and Information Systems Engineering, Seoul National University of Science and Technology. He received B.S. and Ph.D. in Industrial Engineering at Seoul National University. His research interests include instance-based learning, learning kernel machines, novelty detection, learning algorithms in class imbalance, and network analysis. His research also includes application areas such as keystroke dynamics-based authentication, fault detection in manufacturing process, and technological forecasting. He has published a number of papers on related topics in leading journals such as Pattern Recognition, Intelligent Data Analysis, and Neurocomputing.