Exponential Smoothing

International Journal of Forecasting 22 (2006) 637 – 666
www.elsevier.com/locate/ijforecast
Exponential smoothing: The state of the art—Part II

Everette S. Gardner Jr. *
Bauer College of Business, 334 Melcher Hall, University of Houston, Houston, TX 77204-6021, United States
Abstract
In Gardner [Gardner, E. S., Jr. (1985). Exponential smoothing: The state of the art. Journal of Forecasting, 4, 1–28], I
reviewed the research in exponential smoothing since the original work by Brown and Holt. This paper brings the state of the art
up to date. The most important theoretical advance is the invention of a complete statistical rationale for exponential smoothing
based on a new class of state-space models with a single source of error. The most important practical advance is the
development of a robust method for smoothing damped multiplicative trends. We also have a new adaptive method for simple
smoothing, the first such method to demonstrate credible improved forecast accuracy over fixed-parameter smoothing.
Longstanding confusion in the literature about whether and how to renormalize seasonal indices in the Holt–Winters methods
has finally been resolved. There has been significant work in forecasting for inventory control, including the development of
new predictive distributions for total lead-time demand and several improved versions of Croston’s method for forecasting
intermittent time series. Regrettably, there has been little progress in the identification and selection of exponential smoothing
methods. The research in this area is best described as inconclusive, and it is still difficult to beat the application of a damped
trend to every time series.
© 2006 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Keywords: Time series—ARIMA, exponential smoothing, state-space models, identification, stability, invertibility, model selection;
Comparative methods—evaluation; Intermittent demand; Inventory control; Prediction intervals; Regression—discount weighted, kernel
* Tel.: +1 713 743 4744; fax: +1 713 743 4940.
E-mail address: [email protected].
0169-2070/$ - see front matter © 2006 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.ijforecast.2006.03.005

1. Introduction

When Gardner (1985) appeared, many believed that exponential smoothing should be disregarded because it was
either a special case of ARIMA modeling or an ad hoc procedure with no statistical rationale. Since 1985, the
special case argument has been turned on its head, and today we know that exponential smoothing methods are
optimal for a very general class of state-space models that is in fact broader than the ARIMA class.
This paper brings the state of the art in exponential smoothing up to date with a critical review of the
research since 1985. Prior research findings are included where necessary to provide continuity and context.
The plan of the paper is as follows. Section 2 summarizes new information that has come to light on the early
history of exponential smoothing. Section 3
gives formulations for the standard Holt–Winters methods and a number of variations and extensions to create
equivalences to state-space models, normalize seasonals, and cope with problems such as series with a fixed
drift, missing observations, irregular updates, planned discontinuities, multiple seasonal cycles (in the same
series), and multivariate series. Equivalent regression, ARIMA, and state-space models are reviewed in
Section 4. This section also discusses variances, prediction intervals, and some possible explanations for the
robustness of exponential smoothing. Procedures for method and model selection are discussed in Section 5,
including the use of time series characteristics, expert systems, information criteria, and operational
measures. Section 6 reviews the details of model-fitting, including the selection of parameters, initial
values, and loss functions. In Section 6, we also discuss the use of adaptive parameters to avoid
model-fitting. Applications of exponential smoothing to inventory control are discussed in Section 7.
Section 8 summarizes the many empirical studies in which exponential smoothing has been used. Conclusions and
an assessment of the state of the art are offered in Section 9. This plan does not include coverage of
tracking signals, a subject that has disappeared from the literature since the earlier paper.

2. Early history of exponential smoothing

Exponential smoothing originated in Robert G. Brown’s work as an OR analyst for the US Navy during World
War II (Gass & Harris, 2000). In 1944, Brown was assigned to the antisubmarine effort and given the job of
developing a tracking model for fire-control information on the location of submarines. This information was
used in a mechanical computing device, a ball-disk integrator, to estimate target velocity and the lead angle
for firing depth charges from destroyers. Brown’s tracking model was essentially simple exponential smoothing
of continuous data, an idea still used in modern fire-control equipment.
During the early 1950s, Brown extended simple exponential smoothing to discrete data and developed methods for
trends and seasonality. One of his early applications was in forecasting the demand for spare parts in Navy
inventory systems. The savings in data storage over moving averages led to the adoption of exponential
smoothing throughout Navy inventory systems during the 1950s. In 1956, Brown presented his work on exponential
smoothing of inventory demands at a conference of the Operations Research Society of America. This
presentation formed the basis of Brown’s first book, Statistical Forecasting for Inventory Control (Brown,
1959). His second book, Smoothing, Forecasting, and Prediction of Discrete Time Series (Brown, 1963),
developed the general exponential smoothing methodology. In numerous later books, Brown integrated exponential
smoothing with inventory management and production planning and control.
During the 1950s, Charles C. Holt, with support from the Logistics Branch of the Office of Naval Research
(ONR), worked independently of Brown to develop a similar method for exponential smoothing of additive trends
and an entirely different method for smoothing seasonal data. Holt’s original work was documented in an ONR
memorandum (Holt, 1957) that went unpublished until recently (Holt, 2004a, 2004b). However, Holt’s ideas
gained wide publicity in 1960. In a landmark article, Winters (1960) tested Holt’s methods with empirical
data, and they became known as the Holt–Winters forecasting system. Another landmark article by Muth (1960)
was among the first to examine the optimal properties of exponential smoothing forecasts. Holt’s methods of
exponential smoothing were also featured in the classic text by Holt, Modigliani, Muth, and Simon (1960),
Planning Production, Inventories, and Work Force, a book that is still in use today in doctoral programs in
operations management.

3. Formulation of exponential smoothing methods

Section 3.1 classifies and gives formulations for the standard methods of exponential smoothing. These methods
can be modified to create state-space models as discussed in Section 3.2. Seasonal indices are not
automatically renormalized in either the standard or state-space versions of exponential smoothing, and
procedures for renormalization are
reviewed in Section 3.3. In Section 3.4, we collect a number of variations on the standard methods to cope
with special kinds of time series.

3.1. Standard methods

Table 1 contains equations for the standard methods of exponential smoothing, all of which are extensions of
the work of Brown (1959, 1963), Holt (1957), and Winters (1960). For each type of trend, there are two
sections of equations: the first gives recurrence forms and the second gives equivalent error-correction
forms. Recurrence forms were used in the original work by Brown and Holt and are still widely used in
practice, but error-correction forms are simpler. The notation is from Gardner (1985) and is defined in
Table 2.
The taxonomy of Hyndman, Koehler, Snyder, and Grose (2002), as extended by Taylor (2003a), is helpful in
describing the methods. Each method is denoted by one or two letters for the trend (row heading) and one
letter for seasonality (column heading). Method N-N denotes no trend with no seasonality, or simple
exponential smoothing (Brown, 1959). The other nonseasonal methods are Holt’s (1957) additive trend (A-N),
Gardner and McKenzie’s (1985) damped additive trend (DA-N), Pegels’ (1969) multiplicative trend (M-N), and
Taylor’s (2003a) damped multiplicative trend (DM-N). All seasonal methods are formulated by extending the
methods in Winters (1960). Note that the forecast equations for the seasonal methods are valid only for a
forecast horizon ($m$) less than or equal to the length of the seasonal cycle ($p$).
There are several differences between Table 1 and the tables of equations in Gardner (1985). First, the DA
methods are given in recurrence forms that were not included in the earlier paper. Second, the seasonal DA
methods were formulated with three parameters in the earlier paper, but the same methods in Table 1 contain
four parameters as developed in Gardner and McKenzie (1989). Finally, the DM methods are new.
The DA-N method can be used to forecast multiplicative trends with the autoregressive or damping parameter
$\phi$ restricted to the range $1 < \phi < 2$, a method sometimes called "generalized Holt." As Taylor (2003a)
observed, generalized Holt is a clumsy way to model a multiplicative trend because the local slope is
estimated by smoothing successive differences of the local level. In contrast, Pegels’ multiplicative trends
(M-N, M-A, and M-M) estimate the local growth rate by smoothing successive ratios of the local level. In hopes
of producing more robust forecasts, Taylor’s methods (DM-N, DM-A, and DM-M) add a damping parameter
$\phi < 1$ to Pegels’ multiplicative trends.
Although many new models underlying exponential smoothing have been proposed since 1985, the damped
multiplicative trends are the only new methods in the sense that they create new forecast profiles. Like the
damped additive trends, the forecast profiles for Taylor’s methods will eventually approach a horizontal
nonseasonal or seasonally adjusted asymptote. However, in the near term, different values of $\phi$ can be
used to produce forecast profiles that are convex, nearly linear, or even concave.

3.2. State-space equivalent methods

There are many equivalent state-space models for each of the methods in Table 1. Here, we review the
particular modeling framework of Hyndman et al. (2002) that includes all methods in Table 1 except the DM
methods. In this framework, each exponential smoothing method has two corresponding state-space models, each
with a single source of error (SSOE). One model has an additive error and the other has a multiplicative
error. As discussed in Section 4.3, if the parameters are the same, the two models give the same point
forecasts but different variances. The methods corresponding to the Hyndman et al. framework are the same as
those in Table 1 with two exceptions: we must modify all multiplicative seasonal methods and all damped
additive trend (DA) methods.
We proceed as follows to modify the multiplicative seasonal methods. In the N-M standard equations for
updating the multiplicative seasonal component $I_t$, replace the smoothed level $S_t$ with $S_{t-1}$. This
change is made in both recurrence and error-correction forms. In the A-M, DA-M, and M-M standard equations for
updating $I_t$, replace $S_t$ with $S_{t-1} + T_{t-1}$, where $T_{t-1}$ is the previous smoothed trend, again
in both recurrence and error-correction forms. One precedent for these modifications is found in Williams
(1987), who shows
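Table 1
Standard exponential smoothing methods
Rows: trend, N (None), A (Additive), DA (Damped additive), M (Multiplicative), DM (Damped multiplicative).
Columns: seasonality, N (None), A (Additive), M (Multiplicative). Each method is labeled trend-seasonality.

Trend N (None)
  Recurrence forms
    N-N: $S_t = \alpha X_t + (1-\alpha) S_{t-1}$
         $\hat{X}_t(m) = S_t$
    N-A: $S_t = \alpha (X_t - I_{t-p}) + (1-\alpha) S_{t-1}$
         $I_t = \delta (X_t - S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = S_t + I_{t-p+m}$
    N-M: $S_t = \alpha (X_t / I_{t-p}) + (1-\alpha) S_{t-1}$
         $I_t = \delta (X_t / S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = S_t I_{t-p+m}$
  Error-correction forms
    N-N: $S_t = S_{t-1} + \alpha e_t$
         $\hat{X}_t(m) = S_t$
    N-A: $S_t = S_{t-1} + \alpha e_t$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t$
         $\hat{X}_t(m) = S_t + I_{t-p+m}$
    N-M: $S_t = S_{t-1} + \alpha e_t / I_{t-p}$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t / S_t$
         $\hat{X}_t(m) = S_t I_{t-p+m}$

Trend A (Additive)
  Recurrence forms
    A-N: $S_t = \alpha X_t + (1-\alpha)(S_{t-1} + T_{t-1})$
         $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) T_{t-1}$
         $\hat{X}_t(m) = S_t + m T_t$
    A-A: $S_t = \alpha (X_t - I_{t-p}) + (1-\alpha)(S_{t-1} + T_{t-1})$
         $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) T_{t-1}$
         $I_t = \delta (X_t - S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = S_t + m T_t + I_{t-p+m}$
    A-M: $S_t = \alpha (X_t / I_{t-p}) + (1-\alpha)(S_{t-1} + T_{t-1})$
         $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) T_{t-1}$
         $I_t = \delta (X_t / S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = (S_t + m T_t) I_{t-p+m}$
  Error-correction forms
    A-N: $S_t = S_{t-1} + T_{t-1} + \alpha e_t$
         $T_t = T_{t-1} + \alpha\gamma e_t$
         $\hat{X}_t(m) = S_t + m T_t$
    A-A: $S_t = S_{t-1} + T_{t-1} + \alpha e_t$
         $T_t = T_{t-1} + \alpha\gamma e_t$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t$
         $\hat{X}_t(m) = S_t + m T_t + I_{t-p+m}$
    A-M: $S_t = S_{t-1} + T_{t-1} + \alpha e_t / I_{t-p}$
         $T_t = T_{t-1} + \alpha\gamma e_t / I_{t-p}$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t / S_t$
         $\hat{X}_t(m) = (S_t + m T_t) I_{t-p+m}$

Trend DA (Damped additive)
  Recurrence forms
    DA-N: $S_t = \alpha X_t + (1-\alpha)(S_{t-1} + \phi T_{t-1})$
          $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) \phi T_{t-1}$
          $\hat{X}_t(m) = S_t + \sum_{i=1}^{m} \phi^i T_t$
    DA-A: $S_t = \alpha (X_t - I_{t-p}) + (1-\alpha)(S_{t-1} + \phi T_{t-1})$
          $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) \phi T_{t-1}$
          $I_t = \delta (X_t - S_t) + (1-\delta) I_{t-p}$
          $\hat{X}_t(m) = S_t + \sum_{i=1}^{m} \phi^i T_t + I_{t-p+m}$
    DA-M: $S_t = \alpha (X_t / I_{t-p}) + (1-\alpha)(S_{t-1} + \phi T_{t-1})$
          $T_t = \gamma (S_t - S_{t-1}) + (1-\gamma) \phi T_{t-1}$
          $I_t = \delta (X_t / S_t) + (1-\delta) I_{t-p}$
          $\hat{X}_t(m) = \left( S_t + \sum_{i=1}^{m} \phi^i T_t \right) I_{t-p+m}$
  Error-correction forms
    DA-N: $S_t = S_{t-1} + \phi T_{t-1} + \alpha e_t$
          $T_t = \phi T_{t-1} + \alpha\gamma e_t$
          $\hat{X}_t(m) = S_t + \sum_{i=1}^{m} \phi^i T_t$
    DA-A: $S_t = S_{t-1} + \phi T_{t-1} + \alpha e_t$
          $T_t = \phi T_{t-1} + \alpha\gamma e_t$
          $I_t = I_{t-p} + \delta (1-\alpha) e_t$
          $\hat{X}_t(m) = S_t + \sum_{i=1}^{m} \phi^i T_t + I_{t-p+m}$
    DA-M: $S_t = S_{t-1} + \phi T_{t-1} + \alpha e_t / I_{t-p}$
          $T_t = \phi T_{t-1} + \alpha\gamma e_t / I_{t-p}$
          $I_t = I_{t-p} + \delta (1-\alpha) e_t / S_t$
          $\hat{X}_t(m) = \left( S_t + \sum_{i=1}^{m} \phi^i T_t \right) I_{t-p+m}$

Trend M (Multiplicative)
  Recurrence forms
    M-N: $S_t = \alpha X_t + (1-\alpha)(S_{t-1} R_{t-1})$
         $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}$
         $\hat{X}_t(m) = S_t R_t^m$
    M-A: $S_t = \alpha (X_t - I_{t-p}) + (1-\alpha) S_{t-1} R_{t-1}$
         $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}$
         $I_t = \delta (X_t - S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = S_t R_t^m + I_{t-p+m}$
    M-M: $S_t = \alpha (X_t / I_{t-p}) + (1-\alpha) S_{t-1} R_{t-1}$
         $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}$
         $I_t = \delta (X_t / S_t) + (1-\delta) I_{t-p}$
         $\hat{X}_t(m) = (S_t R_t^m) I_{t-p+m}$
  Error-correction forms
    M-N: $S_t = S_{t-1} R_{t-1} + \alpha e_t$
         $R_t = R_{t-1} + \alpha\gamma e_t / S_{t-1}$
         $\hat{X}_t(m) = S_t R_t^m$
    M-A: $S_t = S_{t-1} R_{t-1} + \alpha e_t$
         $R_t = R_{t-1} + \alpha\gamma e_t / S_{t-1}$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t$
         $\hat{X}_t(m) = S_t R_t^m + I_{t-p+m}$
    M-M: $S_t = S_{t-1} R_{t-1} + \alpha e_t / I_{t-p}$
         $R_t = R_{t-1} + (\alpha\gamma e_t / S_{t-1}) / I_{t-p}$
         $I_t = I_{t-p} + \delta (1-\alpha) e_t / S_t$
         $\hat{X}_t(m) = (S_t R_t^m) I_{t-p+m}$

Trend DM (Damped multiplicative)
  Recurrence forms
    DM-N: $S_t = \alpha X_t + (1-\alpha)(S_{t-1} R_{t-1}^{\phi})$
          $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}^{\phi}$
          $\hat{X}_t(m) = S_t R_t^{\sum_{i=1}^{m} \phi^i}$
    DM-A: $S_t = \alpha (X_t - I_{t-p}) + (1-\alpha) S_{t-1} R_{t-1}^{\phi}$
          $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}^{\phi}$
          $I_t = \delta (X_t - S_t) + (1-\delta) I_{t-p}$
          $\hat{X}_t(m) = S_t R_t^{\sum_{i=1}^{m} \phi^i} + I_{t-p+m}$
    DM-M: $S_t = \alpha (X_t / I_{t-p}) + (1-\alpha)(S_{t-1} R_{t-1}^{\phi})$
          $R_t = \gamma (S_t / S_{t-1}) + (1-\gamma) R_{t-1}^{\phi}$
          $I_t = \delta (X_t / S_t) + (1-\delta) I_{t-p}$
          $\hat{X}_t(m) = \left( S_t R_t^{\sum_{i=1}^{m} \phi^i} \right) I_{t-p+m}$
  Error-correction forms
    DM-N: $S_t = S_{t-1} R_{t-1}^{\phi} + \alpha e_t$
          $R_t = R_{t-1}^{\phi} + \alpha\gamma e_t / S_{t-1}$
          $\hat{X}_t(m) = S_t R_t^{\sum_{i=1}^{m} \phi^i}$
    DM-A: $S_t = S_{t-1} R_{t-1}^{\phi} + \alpha e_t$
          $R_t = R_{t-1}^{\phi} + \alpha\gamma e_t / S_{t-1}$
          $I_t = I_{t-p} + \delta (1-\alpha) e_t$
          $\hat{X}_t(m) = S_t R_t^{\sum_{i=1}^{m} \phi^i} + I_{t-p+m}$
    DM-M: $S_t = S_{t-1} R_{t-1}^{\phi} + \alpha e_t / I_{t-p}$
          $R_t = R_{t-1}^{\phi} + (\alpha\gamma e_t / S_{t-1}) / I_{t-p}$
          $I_t = I_{t-p} + \delta (1-\alpha) e_t / S_t$
          $\hat{X}_t(m) = \left( S_t R_t^{\sum_{i=1}^{m} \phi^i} \right) I_{t-p+m}$

For each type of trend, there are two sections of equations: the first gives recurrence forms and the second gives equivalent error-correction
forms.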
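
The claim that the recurrence and error-correction forms in Table 1 are equivalent can be checked numerically. The following is a minimal Python sketch for the nonseasonal damped additive trend (DA-N) only. The function names, the toy series, the parameter values, and the initial level and trend are illustrative assumptions, not taken from the paper, and no parameter fitting is attempted.

```python
def da_n_recurrence(x, alpha, gamma, phi, level0, trend0):
    """DA-N in recurrence form: smooth the level, then the damped trend."""
    level, trend = level0, trend0
    for obs in x:
        prev_level = level
        # S_t = alpha*X_t + (1 - alpha)*(S_{t-1} + phi*T_{t-1})
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        # T_t = gamma*(S_t - S_{t-1}) + (1 - gamma)*phi*T_{t-1}
        trend = gamma * (level - prev_level) + (1 - gamma) * phi * trend
    return level, trend


def da_n_error_correction(x, alpha, gamma, phi, level0, trend0):
    """DA-N in error-correction form, driven by the one-step-ahead error."""
    level, trend = level0, trend0
    for obs in x:
        # e_t = X_t - (S_{t-1} + phi*T_{t-1}), the one-step-ahead forecast error
        error = obs - (level + phi * trend)
        # S_t = S_{t-1} + phi*T_{t-1} + alpha*e_t  (uses the previous trend)
        level = level + phi * trend + alpha * error
        # T_t = phi*T_{t-1} + alpha*gamma*e_t
        trend = phi * trend + alpha * gamma * error
    return level, trend


def da_n_forecast(level, trend, phi, m):
    """m-step-ahead forecast: S_t + (phi + phi^2 + ... + phi^m) * T_t."""
    return level + sum(phi ** i for i in range(1, m + 1)) * trend


if __name__ == "__main__":
    series = [10.0, 12.0, 13.0, 15.0, 14.0, 16.0]  # toy data, assumed
    args = (series, 0.3, 0.2, 0.9, 10.0, 1.0)      # assumed parameters
    print(da_n_recurrence(*args))
    print(da_n_error_correction(*args))
    level, trend = da_n_recurrence(*args)
    print([da_n_forecast(level, trend, 0.9, m) for m in (1, 2, 3)])
```

Setting `phi = 1` in this sketch gives Holt's additive trend (A-N), and `phi = 0` leaves the level following simple smoothing (N-N), which mirrors how the DA-N row of Table 1 nests the simpler rows.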
Table 2
Notation for exponential smoothing
Symbol            Definition
$\alpha$          Smoothing parameter for the level of the series
$\gamma$          Smoothing parameter for the trend
$\delta$          Smoothing parameter for seasonal indices
$\phi$            Autoregressive or damping parameter
$\beta$           Discount factor, $0 \le \beta \le 1$
$S_t$             Smoothed level of the series, computed after $X_t$ is observed. Also the expected value of the
                  data at the end of period $t$ in some models
$T_t$             Smoothed additive trend at the end of period $t$
$R_t$             Smoothed multiplicative trend at the end of period $t$
$I_t$             Smoothed seasonal index at the end of period $t$. Can be additive or multiplicative
$X_t$             Observed value of the time series in period $t$
$m$               Number of periods in the forecast lead-time
$p$               Number of periods in the seasonal cycle
$\hat{X}_t(m)$    Forecast for $m$ periods ahead from origin $t$
$e_t$             One-step-ahead forecast error, $e_t = X_t - \hat{X}_{t-1}(1)$. Note that $e_t(m)$ should be used for other
                  forecast origins
$C_t$             Cumulative renormalization factor for seasonal indices. Can be additive or multiplicative
$V_t$             Transition variable in smooth transition exponential smoothing
$D_t$             Observed value of nonzero demand in the Croston method
$Q_t$             Observed inter-arrival time of transactions in the Croston method
$Z_t$             Smoothed nonzero demand in the Croston method
$P_t$             Smoothed inter-arrival time in the Croston method
$Y_t$             Estimated demand per unit time in the Croston method ($Z_t / P_t$)
that they allow us to update each component independently. Archibald (1990) made the same point without
reference to the work of Williams. Perhaps another reason to use the multiplicative seasonal modifications is
that, as Ord (2004) observed, this was done in Holt’s original work (1957). However, Holt et al. (1960) and
Winters (1960) discarded the modifications and used the standard equations in Table 1.
What are the practical consequences of adopting the state-space versions of the multiplicative seasonal
methods? The answer to this question awaits empirical study. In an analysis of the A-M method, Koehler,
Snyder, and Ord (2001) show that the difference between the two versions of the equation for updating the
seasonal component will be small, provided that all three smoothing parameters are less than about 0.3.
However, Koehler et al. warn that negative seasonal components can occur in the state-space version of A-M
unless the forecast errors are much less variable than the data.
To modify the standard DA methods (for any type of seasonality), we begin with the level equations. The
previous trend is not damped in the state-space level equations, so we delete $\phi$ (replace
$\phi T_{t-1}$ with $T_{t-1}$). Next, the forecast equations are changed to begin damping at two steps ahead,
rather than immediately as in Table 1. The forecast equation in the nonseasonal state-space equivalent method
(DA-N) is:

$$\hat{X}_t(m) = S_t + \left( \sum_{i=0}^{m-1} \phi^i \right) T_t \tag{1}$$

In Eq. (1), $T$ can be interpreted as a growth rate, something that is not possible in the standard method
unless $\phi = 1$. If a least-squares criterion is used to find both initial values and parameters, the
standard and state-space DA methods will produce the same forecasts, although the estimates of $T$ and the
smoothing parameter for the trend ($\gamma$) will differ by a factor of $\phi$.

3.3. Renormalization of seasonal indices

The standard seasonal methods are initialized so that the average seasonal index is 0 (additive) or 1
(multiplicative); thereafter, normalization goes astray because only one seasonal index is updated each
period. The problem of renormalization was overlooked in Gardner (1985), and there has been much confusion in
the literature about whether it is necessary to renormalize the seasonal indices, and, if so, when and how
this should be done.
To analyze renormalization in the A-A method, Lawton (1998) used an equivalent state-space model. When A-A
seasonal indices are not renormalized, Lawton found that estimates of trend are correct but level and
seasonals are biased. Fortunately, the errors in estimating level and seasonals are counter-balancing and do
not impact the forecasts. Alternative A-A renormalization equations are found in McKenzie (1986), Newbold
(1988), and Roberts (1982). These authors go about renormalization in different ways, but their point
forecasts are equivalent to each other and to point forecasts from the standard equations.
For the A-M method, renormalization equations were developed by McKenzie (1986) and Roberts (1982), but their
point forecasts differ from each other and from the standard equations. Therefore, Archibald and Koehler
(2003) developed new A-M renormalization equations that give the same point forecasts as the standard
equations. They also developed analogous A-A renormalization equations that are reformulations of those
originally developed by Roberts and McKenzie. Finally, Archibald and Koehler derived cumulative
renormalization correction factors for the A-A and A-M methods. These correction factors are easily extended
to other methods and should prove to be popular in practice because they allow the user to keep the standard
equations and renormalize the seasonal indices at any point in time.
The cumulative renormalization correction factor $C_t$ for the A-A method is computed iteratively using a
simple equation:

$$C_t = C_{t-1} + \delta e_t / p \tag{2}$$

To renormalize at any time, add $C_t$ to the level and subtract it from each seasonal index. Archibald and
Koehler derived the cumulative renormalization correction factor for the A-M method using the state-space
version. Here, we give the correction factor for the standard A-M version in Table 1:

$$C_t = C_{t-1} \left( 1 + \delta e_t / (p S_t) \right) \tag{3}$$

To renormalize at any time, multiply level and trend by $C_t$ and divide each seasonal index by $C_t$. If the
state-space A-M version is used, replace $S_t$ with $S_{t-1} + T_{t-1}$ in Eq. (3).

3.4. Other variations on the standard methods

When observations are missing, Wright’s (1986a, 1986b) solution is straightforward. Missing observations
receive zero weight, while the others are exponentially weighted according to the age of the observation.
Wright gives modified formulas for the N-N and A-N methods that automatically adjust the weighting pattern for
all observations following a gap. These formulas also work for the equivalent problem of observations that
naturally occur at irregular time intervals. Wright’s procedure was extended by Cipra, Trujillo, and Rubio
(1995) to seasonal methods. An alternative to Wright’s procedure is given by Aldrin and Damsleth (1989), who
compute optimal weights on past data using equivalent ARIMA models. It is not clear that the ARIMA procedure
is worth the trouble because the authors analyzed two time series and got about the same results as Wright. If
observations are missing because they have been combined with other observations, see Anderson (1994),
Johnston (1993), and Walton (1994) for adjustments to the N-N smoothing parameter.
There may be planned discontinuities in a time series. For example, we may expect a disruption in demand
following a price change or a new product introduction. There are three ways of dealing with planned
discontinuities in exponential smoothing. If discontinuities are recurring, Carreno and Madinaveitia (1990)
add an index similar to a seasonal index to the A-N method to model the effects. When the effects of
discontinuities cannot be estimated from history, judgmental adjustments to the forecasts are usually
necessary. Williams and Miller (1999) recommend making such adjustments within the exponential smoothing
method rather than as a second-stage correction outside the method. It may be possible to express planned
discontinuities as a set of linear
restrictions on the forecasts from a linear exponential smoothing method. If so, Rosas and Guerrero (1994)
show that one can compute weights that meet the restrictions in the moving-average representation of the
equivalent ARIMA model.
The N-N method can be enhanced by adding a drift (fixed trend) term, making the method equivalent to the
"Theta method of forecasting" (Assimakopoulos & Nikolopoulos, 2000) that performed well in the M3 competition
(Makridakis & Hibon, 2000). In a mathematical tour de force, Hyndman and Billah (2003) showed that the Theta
method is the same thing as simple smoothing with drift equal to half the slope of a linear trend fitted to
the data. Another way to match the Theta method is to use the same drift choice in the A-N method with the
trend parameter set to zero. We do not know why the particular drift choice in the Theta method or its
equivalents is better than any other, nor is it clear when one should prefer a fixed drift over a smoothed
trend.
For time series containing two seasonal cycles, Taylor (2003b) adds one more seasonal component to the A-M
method. The new method was applied to electricity demand recorded at half-hour intervals, with one seasonal
equation for a within-day seasonal cycle and another for a within-week cycle. As so often happens in complex
time series forecasted with exponential smoothing, Taylor found significant first-order autocorrelation in the
residuals. Thus, he fitted an AR(1) model to remove it, estimating the AR(1) parameter at the same time as the
smoothing parameters. The resulting forecasts outperformed those from the standard A-M method as well as a
double seasonal ARIMA model.
Rather than add a seasonal component, Snyder and Shami (2001) eliminate it from the A-A method. The seasonal
component is incorporated into the level, which depends on the level a year ago and is augmented by the total
growth in all seasons during the past year. Thus, their parsimonious method requires only two parameters.
Snyder and Shami found that the two-parameter version of A-A was less accurate than the standard
three-parameter version, although the differences were not statistically significant.
Some of the univariate methods in Table 1 have been generalized to the multivariate case by Enns, Machak,
Spivey, and Wrobleski (1982), Harvey (1986), Jones (1966), and Pfefferman and Allon (1989). For the N-N
method, Jones and Enns et al. simply replaced the scalars with matrices. In error-correction form, the
multivariate version of N-N is then:

$$S_t = S_{t-1} + \alpha e_t \tag{4}$$

With $k$ series, the dimensions of $S_t$, $S_{t-1}$, and $e_t$ are $k \times 1$, and the dimension of
$\alpha$ is $k \times k$. Enns et al. assume that the series are produced by a multivariate random walk and
estimate the parameters by a complex maximum likelihood procedure. Harvey achieved a profound simplification
by proving that one can forecast the individual series using univariate methods. The univariate parameters are
chosen by a grid search to minimize the sum of vector products of the one-step-ahead errors, a procedure that
approximates maximum-likelihood estimation. Harvey also developed multivariate models with trend and seasonal
components. Again, we replace the scalars with matrices in the error-correction forms of the univariate trend
and seasonal methods, with parameters chosen in the same way as for multivariate N-N. Pfefferman and Allon
analyzed the multivariate A-A method and derived several structural models that produce optimal forecasts.
Pfefferman and Allon also presented what appears to be the only empirical evidence on multivariate exponential
smoothing. In forecasting two bivariate time series of Israeli tourism data, multivariate A-A was
significantly more accurate than univariate A-A.

4. Properties

Each exponential smoothing method in Table 1 corresponds to one or more stochastic models. The possibilities
include regression, ARIMA, and state-space models, as discussed in Sections 4.1–4.3. The associated research
on variances and prediction intervals is discussed in Section 4.4. The most important property of exponential
smoothing is robustness, reviewed in Section 4.5. Discussion of the property of invertibility is deferred
until Section 6.1 on parameter selection. The theoretical relationships between judgmental forecasting and
exponential smoothing are beyond our scope, although we note in passing that several exponential smoothing
methods are treated as models of judgmental extrapolation (see for example Andreassen & Kraus, 1990).
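
Returning briefly to the multivariate N-N update of Eq. (4) in Section 3.4, the matrix form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data, and the entries of the smoothing matrix are hypothetical, and the sketch does not implement the estimation procedures of Enns et al. or Harvey.

```python
def multivariate_nn(observations, alpha, level0):
    """Multivariate N-N in error-correction form: S_t = S_{t-1} + alpha e_t.

    observations: list of length-k lists, one per time period
    alpha:        k x k smoothing matrix as a list of rows (assumed values)
    level0:       length-k list with the initial level vector
    Returns the final level vector, which is also the forecast for any horizon.
    """
    level = list(level0)
    for obs in observations:
        # e_t: vector of one-step-ahead errors, since the forecast is S_{t-1}
        errors = [x - s for x, s in zip(obs, level)]
        # Matrix-vector product alpha @ e_t, added to the previous level
        level = [
            s + sum(a * e for a, e in zip(row, errors))
            for s, row in zip(level, alpha)
        ]
    return level


if __name__ == "__main__":
    data = [[10.0, 5.0], [12.0, 4.0], [11.0, 6.0]]  # two toy series, assumed
    diagonal = [[0.3, 0.0], [0.0, 0.6]]             # no cross-series effects
    coupled = [[0.3, 0.1], [0.05, 0.6]]             # errors spill across series
    print(multivariate_nn(data, diagonal, [10.0, 5.0]))
    print(multivariate_nn(data, coupled, [10.0, 5.0]))
```

With a diagonal matrix the update reduces to separate univariate N-N smoothing of each series; the off-diagonal entries are what let the error in one series adjust the level of another.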
4.1. Equivalent regression models

In large samples, several versions of exponential smoothing are equivalent to exponentially weighted or
discounted least squares (DLS) regression models (Brown, 1963; Gardner & McKenzie, 1985, 1988). General
exponential smoothing (GES) (Brown, 1963) also relies on DLS regression with either one or two discount
factors to fit a variety of functions of time to the data, including polynomials, exponentials, sinusoids, and
their sums and products. A detailed review of GES is available in Gardner (1985), and since that time only a
few papers on the subject have appeared.
Gijbels, Pope, and Wand (1999) and Taylor (2004c) showed that GES can be viewed in a kernel regression
framework. Gijbels et al. found that simple smoothing (N-N) is actually a zero-degree local polynomial kernel
model, an idea that can be extended to trends and seasonality although the details are unpleasant. Taylor
(2004c) proposed another type of kernel regression, an exponentially weighted quantile regression (EWQR). The
rationale for EWQR is that it is robust to distributional assumptions. EWQR turns out to be equivalent to
simple exponential smoothing of the cumulative density function (the inverse of the quantile function). We can
also think of EWQR as an extension of GES to quantiles. Just as DLS delivers exponential smoothing for the
mean, EWQR delivers the analogy for quantiles. A special case of EWQR was developed by Cipra (1992), who
extended GES to the median by replacing the DLS criterion with discounted least absolute deviations.
The only other GES research since 1985 is by Bartolomei and Sweet (1989), who compared GES to the A-A and A-M
methods using 47 time series from the M1 competition (Makridakis et al., 1982). The authors found little
difference in forecast accuracy, although they speculated that one of the damped-trend methods might have done
better.

4.2. Equivalent ARIMA models

All linear exponential smoothing methods have equivalent ARIMA models. The easiest way to see the nonseasonal
models is through the DA-N method, which contains at least six ARIMA models as special cases (Gardner &
McKenzie, 1988). If $0 < \phi < 1$, the DA-N method is equivalent to the ARIMA (1, 1, 2) model, which can be
written as

$$(1 - B)(1 - \phi B) X_t = \left[ 1 - (1 + \phi - \alpha - \phi\alpha\gamma) B - \phi (\alpha - 1) B^2 \right] e_t \tag{5}$$

We obtain an ARIMA (1, 1, 1) model by setting $\alpha = 1$. With $\alpha = \gamma = 1$, the model is ARIMA
(1, 1, 0). When $\phi = 1$, we have a linear trend (A-N) and the model is ARIMA (0, 2, 2):

$$(1 - B)^2 X_t = \left[ 1 - (2 - \alpha - \alpha\gamma) B - (\alpha - 1) B^2 \right] e_t \tag{6}$$

When $\phi = 0$, we have simple smoothing (N-N) and the equivalent ARIMA (0, 1, 1) model:

$$(1 - B) X_t = \left[ 1 - (1 - \alpha) B \right] e_t \tag{7}$$

The ARIMA (0, 1, 0) random walk model can be obtained from (7) by choosing $\alpha = 1$. ARIMA-equivalent
seasonal models for the linear exponential smoothing methods exist, although most are so complex that it is
unlikely they would ever be identified through Box–Jenkins procedures.

4.3. Equivalent state-space models

The equivalent ARIMA models do not extend to the nonlinear exponential smoothing methods. The only statistical
rationale for exponential smoothing that includes nonlinear methods is due to Ord, Koehler, and Snyder (1997).
Prior to this work, state-space models for exponential smoothing were formulated using multiple sources of
error (MSOE). For example, simple exponential smoothing (N-N) is optimal for a model with two sources of error
(Muth, 1960). The observation and state equations are written:

$$X_t = \ell_t + \nu_t \tag{8}$$

$$\ell_t = \ell_{t-1} + \eta_t \tag{9}$$

The unobserved state variable $\ell_t$ denotes the local level at time $t$, and the error terms $\nu_t$ and
$\eta_t$ are generated by independent white noise processes. Using different methods, various authors
(Chatfield, 1996; Harrison, 1967; Nerlove & Wage, 1964; Theil & Wage, 1964) showed that simple smoothing is
optimal with $\alpha$ determined by the ratio of the variances of the noise
processes. Harvey (1984) also showed that the Kalman filter for (8) and (9) reduces to simple smoothing in the steady state.

For the trend and seasonal versions of exponential smoothing, the MSOE models are complex, as demonstrated in Proietti (1998, 2000), who gives examples of models that are equivalent to linear versions of exponential smoothing. Another limitation of the MSOE approach is that researchers have been unable to find such models that correspond to multiplicative-seasonal versions of exponential smoothing. In response to these problems, Ord et al. (1997) built on the work of Snyder (1985) to create a general, yet remarkably simple class of state-space models with a single source of error (SSOE). For example, the SSOE model with additive errors for the N-N method is written as follows:

$$X_t = \ell_{t-1} + \varepsilon_t \tag{10}$$

$$\ell_t = \ell_{t-1} + \alpha \varepsilon_t \tag{11}$$

Note that the observation Eq. (10) includes $\ell_{t-1}$ rather than $\ell_t$ as in Eq. (8) of the MSOE model. The error term $\varepsilon_t$ in the observation equation is then the one-step-ahead forecast error assuming knowledge of the level at time $t-1$. The correspondence to simple smoothing is seen in the state Eq. (11), which is the error-correction form of simple smoothing in Table 1, except that the level $\ell$ is substituted for the smoothed level $S$.

For the multiplicative-error N-N model, we alter the additive-error SSOE model as follows:

$$X_t = \ell_{t-1} + \ell_{t-1} \varepsilon_t \tag{12}$$

$$\ell_t = \ell_{t-1} (1 + \alpha \varepsilon_t) = \ell_{t-1} + \alpha \ell_{t-1} \varepsilon_t \tag{13}$$

In this case, the one-step-ahead forecast error is still $X_t - \ell_{t-1}$, but it is no longer the same as $\varepsilon_t$. The state Eq. (13) becomes

$$\ell_t = \ell_{t-1} + \alpha \ell_{t-1} \left( \frac{X_t - \ell_{t-1}}{\ell_{t-1}} \right) = \ell_{t-1} + \alpha (X_t - \ell_{t-1}) \tag{14}$$

Thus, we have shown that the multiplicative-error state equation can be written in the error-correction form of simple smoothing. It follows that the state equations are the same in the additive- and multiplicative-error cases, and this is true for all SSOE models.

Following similar logic, Hyndman et al. (2002) extended Ord et al.’s class of SSOE models to include all the methods of exponential smoothing in Table 1 except the DM methods. Because the state equations for all models are the same as the error-correction forms of exponential smoothing (with modifications as discussed in Section 3.2), the observation equations are obvious. In the Hyndman et al. framework, there are 12 basic models, each with additive or multiplicative errors, in effect giving 24 models in total. Hyndman et al. (2002) remark that the additive- and multiplicative-error models give the same point forecasts, but this is true only if the same parameters are found during model-fitting, an improbable occurrence. The additive-error models are usually fitted to minimize the sum of squared errors, but the multiplicative-error models are fitted to minimize the sum of squared relative errors, where the errors are relative to the one-step-ahead forecasts rather than the data.

The theoretical advantage of the SSOE approach to exponential smoothing is that the forecast errors can depend on the other components of the time series. As an illustration, consider the N-N method/model. For the additive-error version, the variance of the one-step-ahead forecast errors is $\mathrm{Var}(\varepsilon_t) = \sigma^2$, while the variance for the multiplicative-error model changes with the level component, that is $\mathrm{Var}(\ell_{t-1} \varepsilon_t) = \ell_{t-1}^2 \sigma^2$. In the more complex models, multiplicative-error effects can be profound because the variance changes with every component of the time series (level, trend, and seasonality).

To put the theoretical advantage of the SSOE approach another way, each of the linear exponential smoothing models with additive errors has an ARIMA equivalent. However, the linear models with multiplicative errors and the nonlinear models are beyond the scope of the ARIMA class. As Koehler et al. (2001) and Hyndman et al. (2002) observed, their state-space models are not unique and many other such models could be formulated. Some additional possibilities are discussed in Chatfield, Koehler, Ord, and Snyder (2001), the most readable reference on the state-space foundation for exponential smoothing.

The only theoretical criticism of the SSOE approach appears to be an OR Viewpoint by Johnston (2000) on a paper by Snyder, Koehler, and Ord (1999)
discussed in Section 7 on inventory control. Johnston argued that the SSOE model for simple smoothing is not really a model at all and should be viewed as an estimation procedure. However, Snyder, Koehler, and Ord (2000) pointed out that both the SSOE and MSOE models for simple smoothing are special cases of a more general state-space model. For additional discussion of the theoretical relationships amongst these models, see Harvey and Koopman (2000) and Ord, Snyder, Koehler, Hyndman, and Leeds (2005).

4.4. Variances and prediction intervals

Variances and prediction intervals for point forecasts from exponential smoothing can be computed using either empirical or analytical procedures.

Empirical procedures are available in Gardner (1988) and Taylor and Bunn (1999). Because post-sample forecast errors are usually much larger than fitted errors, I used the Chebyshev distribution to compute probability limits from DA-N fitted errors at different forecast horizons. For data from the M1 competition, coverage percentages were very close to targets. Nevertheless, Chatfield and Yar (1991) complained that this procedure often results in constant variance as the lead time increases, while Chatfield (1993) observed that the intervals are sometimes too wide to be of practical use. These criticisms do not apply to the work of Taylor and Bunn, who proposed another way to avoid a normality assumption. They used quantile regression on the fitted errors to obtain prediction intervals that are functions of forecast lead time as suggested by theoretical variance expressions. For the N-N, A-N, and DA-N methods, Taylor and Bunn obtained excellent results in both simulated and M1 data.

Analytical prediction intervals can be computed in several different ways. The wrong way to do so is to use $s\sqrt{m}$ as the standard deviation of $m$-step-ahead forecast errors, where $s$ is the standard deviation of the one-step-ahead errors. This expression has been used in the literature for various exponential smoothing methods but is correct only when the optimal model is a random walk. For other models, the expression can be seriously misleading, as discussed in Chatfield and Koehler (1991), Koehler (1990), and Yar and Chatfield (1990). The expression $s\sqrt{m}$ has also been used for the standard deviation of cumulative lead time demand $m$ steps ahead, but this is also wrong as discussed in Section 7.1.

The simplest analytical approach to variance estimation is based on the assumption that the series is generated by deterministic functions of time (plus white noise) that are assumed to hold in a local segment of the series. See Brown (1963), McKenzie (1986), and Sweet (1985) for results using this approach. However, Newbold and Bos (1989) called the use of deterministic functions of time grossly inaccurate in criticizing the work of Brown, Sweet, McKenzie, Gardner (1983, 1985), and many other authors. Newbold and Bos state that any amount of empirical evidence supports their criticism, although it is curious that they give no references. There is no such empirical evidence in the references listed below, or in the references to Gardner (1985).

For the A-A method, an analytical variance expression was derived by Yar and Chatfield (1990), who assumed only that one-step-ahead errors are uncorrelated. But for this to be true, the equivalent ARIMA model must be optimal. Thus, Yar and Chatfield’s variance expression turns out to be the same as that of the equivalent ARIMA model. In a follow-on study, Chatfield and Yar (1991) found an approximate formula for the A-M method, again by assuming that the one-step-ahead errors are uncorrelated. In contrast to the additive case, they showed that the width of the multiplicative prediction intervals depends on the time origin and can change with seasonal peaks and troughs.

For the SSOE state-space models, there are numerous recent papers containing variance results that can be sorted out as follows. Empirical procedures for variance estimation, including bootstrapping and simulation from an assumed model, in both cases with either additive or multiplicative errors, are found in Ord et al. (1997), Snyder (2002), Snyder et al. (1999), Snyder, Koehler, and Ord (2002), Snyder, Koehler, Hyndman, and Ord (2004), and Hyndman et al. (2002). Analytical variance expressions for various models, with prediction intervals computed from the normal distribution, are found in Ord et al. (1997), Koehler et al. (2001), Snyder, Ord, and Koehler (2001), Snyder et al. (1999, 2002, 2004), and Hyndman, Koehler, Ord, and Snyder (2005). We can also classify the papers according to whether they deal with the variance around cumulative or point forecasts. Variances for
cumulative forecasts are found in Snyder (2002) and Snyder et al. (2001, 2002, 2004), and are most used in inventory control, as discussed in Section 7.1, while the other papers deal with point forecasts. Hyndman, Koehler, et al. (2005) is an extremely valuable reference because it contains all known results for variances and prediction intervals around point forecasts. The models are divided into three classes. The first class includes linear models with additive errors and ARIMA equivalents, corresponding to the N-N, A-N, DA-N, N-A, A-A, and DA-A methods. The second class includes the same models, but now the errors are assumed to be multiplicative to enable the variance to change with the level and trend of the time series. In the third class, including the N-M, A-M, and DA-M methods, the variance changes with level, trend, and the multiplicative seasonal pattern. Equations for some of the exact prediction intervals are tedious, so handy approximations are given. Note that a few state-space models are not included in the Hyndman, Koehler, et al. (2005) classification and may prove to be intractable.

Thus, for most state-space models, we have four options for prediction intervals. They can be empirical or analytical, and each type can have additive or multiplicative errors. There is no guidance on how one should choose from these options. Hyndman, Koehler, et al. (2005) do not test their analytical prediction intervals with real data, so there is no way to compare performance to the empirical results in earlier papers. Because of the normality assumption, the analytical prediction intervals will almost certainly prove to be too narrow. This was also the case with their empirical prediction intervals in the M1 and M3 data (Hyndman et al., 2002).

4.5. Robustness

The equivalent models help explain the general robustness of exponential smoothing, although there are other possible explanations for the performance of several methods. For the DA-N method, the process of computing minimum-MSE parameters is an indirect way to identify a more specific model from the special cases it contains. For the A-A method, a simulation study by Chen (1997) showed that forecast accuracy was not sensitive to the assumed data-generating process. It seems reasonable to assume that Chen’s conclusion applies to the other additive seasonal methods.

Simple smoothing (N-N) is certainly the most robust forecasting method and has performed well in many types of series not generated by the equivalent ARIMA (0, 1, 1) process. Such series include the very common first-order autoregressive processes and a number of lower-order ARIMA processes (Cogger, 1973; Cohen, 1963; Cox, 1961; Pandit & Wu, 1974; Tiao & Xu, 1993). Bossons (1966) showed that simple smoothing is generally insensitive to specification error, especially when the misspecification arises from an incorrect belief in the stationarity of the generating process. Related work by Hyndman (2001) shows that ARIMA model selection errors can inflate MSEs compared to simple smoothing. Hyndman simulated time series from an ARIMA (0, 1, 1) process and fitted a restricted set of ARIMA models of order (0, 1, 1), (1, 1, 0), and (1, 1, 1), each with and without a constant term. The best model was selected using Akaike’s Information Criterion (AIC) (Akaike, 1970). The ARIMA forecast MSEs were significantly larger than those of simple smoothing due to incorrect model selections, a problem that became worse when the errors were non-normal.

Simple smoothing has done especially well in forecasting aggregated economic series with relatively low sampling frequencies. Rosanna and Seater (1995) show that such series can often be approximated by an ARIMA (0, 1, 1) process. This finding has been misinterpreted by some researchers. The series examined by Rosanna and Seater were not generated by an ARIMA (0, 1, 1) process. The series were sums of averages over time of data generated more frequently than the reporting interval. The effects of averaging and temporal aggregation were to destroy information about the generating process, producing series for which the ARIMA (0, 1, 1) process was merely an artifact. Much the same problem can occur in company-level data. For example, simple exponential smoothing was a very competitive method in Schnaars’ (1986) study of annual unit sales series for a variety of products.

Satchell and Timmermann (1995) give a different explanation for the performance of simple smoothing in economic time series. In Muth (1960), simple
smoothing was shown to be equivalent to a random walk with noise model, assuming that the process began an infinite number of periods ago. Satchell and Timmermann re-examined this model and derived an explicit formula for weights when the time series has a finite history. They found that exponentially declining weights are surprisingly robust as long as the ratio of the variance of the random walk process to the variance of the noise component is not exceptionally small.

5. Method selection

The definitions of aggregate and individual method selection in the work of Fildes (1992) are useful in exponential smoothing. Aggregate selection is the choice of a single method for all time series in a population, while individual selection is the choice of a method for each series. In commentary on the M3 competition, Fildes (2001) summed up the state of the art in time series method selection: In aggregate selection, it is difficult to beat the damped-trend version of exponential smoothing. In individual selection, it may be possible to beat the damped trend, but it is not clear how one should proceed. The evidence reviewed below supports this judgment, and the research on individual selection of exponential smoothing methods is best described as inconclusive. Individual method selection can be done in a variety of ways, as discussed in Sections 5.1–5.4. In Section 5.5, we briefly consider the problems in identification as opposed to selection. The question of whether out-of-sample criteria should be used for method selection is beyond our scope; see Tashman (2000) for a review.

5.1. Time series characteristics

Method-selection procedures using time series characteristics have been proposed by Gardner and McKenzie (1988), Meade (2000), and Shah (1997). In the Gardner–McKenzie procedure, method selection is done using the variances of differences of the data. The N-N method is selected when differencing serves only to increase variance. If a nonseasonal difference of order 1 minimizes variance, the DA-N method is selected because that is the order of differencing in the equivalent ARIMA process. If a nonseasonal difference of order 2 minimizes variance, the equivalent ARIMA process suggests the A-N method. Finally, a seasonal method is used when a seasonal difference reduces variance.

Using M1-competition data, the Gardner–McKenzie procedure was slightly better than the DA-N method applied to all nonseasonal series, with the DA-M method applied to all seasonal series. The Gardner–McKenzie procedure was also tested by Tashman and Kruk (1996), who made comparisons to two alternatives, a condensed version of rule-based forecasting (see Section 5.2) and selection using the Bayesian Information Criterion (BIC) (Schwarz, 1978). Using data from the M2 competition (Makridakis et al., 1993) as well as Schnaars’ (1986) collection of annual time series, Tashman and Kruk found little agreement among the selection procedures about the best method for many time series. Gardner–McKenzie and rule-based forecasting gave similar accuracy that was better than the BIC, but all three procedures had trouble differentiating between appropriate and inappropriate applications of both the damped trend and simple smoothing. Taylor (2003a) also obtained somewhat disconcerting results with the Gardner–McKenzie procedure. In tests using monthly series from the M3 competition, some series that were clearly trending were classified as stationary due to high levels of variance.

Shah (1997) proposed method selection based on discriminant analysis of descriptive statistics for individual series. His procedure identified methods significantly more accurate than use of the same method for all time series, a conclusion that is difficult to generalize because of the limited range of methods and data considered. Shah used only three candidate methods (N-N, A-M, and Harvey’s basic structural model) and applied them only to the quarterly time series in the M1 collection. It would be helpful to have discriminant analysis results when selection is made from a larger group of candidate methods such as that in Table 1.

The most exhaustive study of method selection, a paradigm of research design in comparative methods, is found in Meade (2000), whose candidates included two naïve methods, a deterministic trend, the robust trend of Fildes (1992), methods selected automatically from the ARIMA and ARARMA classes, and three
exponential smoothing methods (N-N, A-N, and DA-N) applied to seasonally adjusted data when appropriate. Meade simulated time series from a wide range of ARIMA and ARARMA processes, fitted all alternative methods, and computed descriptive statistics for data used in model-fitting. These statistics were used as explanatory variables in a regression-based performance index for each method. Meade tested his procedure with additional simulated series as well as the 1001 series from the M1 competition and Fildes’ collection of 261 telecommunications series. In the simulated series, Meade’s procedure consistently selected the best method from all candidates. This was expected because the series were generated from one of the candidate methods. In the M1 series, the results were less encouraging, with selected methods ranking fifth in median performance and second in mean performance. In the Fildes series, the selected methods ranked fourth for both median and mean performance, although it is by now well established that Fildes’ robust trend is the only reasonable method for these series. Meade’s procedure, like that of Shah, may have merit in selection from the exponential smoothing class, but it is difficult to tell.

5.2. Expert systems

Expert systems for individual selection have been proposed by Collopy and Armstrong (C&A) (1992), Vokurka, Flores, and Pearce (1996), Adya, Collopy, Armstrong, and Kennedy (2001), Arinze (1994), and Flores and Pearce (2000). C&A’s rule-based forecasting system includes 99 rules constructed from time series characteristics and domain knowledge. These rules combine the forecasts from four methods: a random walk, time series regression, Brown’s double exponential smoothing, and the A-N method. This is an odd set of candidate methods because Brown’s method is a special case of the A-N method. Because the C&A approach requires considerable human intervention in identifying features of time series, Vokurka et al. (1996) developed a completely automatic expert system that selects from a different set of candidate methods: the N-N and DA-N methods, classical decomposition, and a combination of all candidates. C&A and Vokurka et al. tested their systems using 126 annual time series from the M1 competition and concluded that they were more accurate than various alternatives. However, they did not compare their results to aggregate selection of the DA-N method. Gardner (1999) made this comparison and found that aggregate selection of the DA-N method was more accurate at all forecast horizons than either version of rule-based forecasting.

Another version of rule-based forecasting by Adya et al. (2001) reduced C&A’s rule base from 99 to 64 rules for data with no domain knowledge. They also deleted Brown’s double exponential smoothing from the list of candidate methods. Adya et al. tested their system in the M3 competition and obtained better results. Rule-based forecasting was slightly more accurate than aggregate selection of DA-N in annual data and performed about the same as DA-N in seasonally adjusted monthly and quarterly data.

Arinze (1994) developed a rule-induction type of expert system to select from the N-N, A-N, and A-M methods, adaptive filtering, moving averages, and time series decomposition. Arinze tested his system using 85 aggregate economic series and found that it picked the best method about half the time. Another rule-induction system was developed by Flores and Pearce (2000) and tested with M3 competition data. Flores and Pearce were pessimistic about their results, which at best were mixed.

5.3. Information criteria

Numerous information criteria are available for selection of an exponential smoothing method. Information criteria have an advantage over the procedures discussed in Sections 5.1–5.2 in that they can distinguish between additive and multiplicative seasonality. The disadvantage of information criteria is that the computational burden can be significant. For example, Hyndman et al. (2002) recommend fitting all models (from their set of 24 alternatives) that might conceivably be appropriate for a time series, then selecting the one that minimizes the AIC. In the M1 and M3 data, the Hyndman et al. procedure gave accuracy results that compared favorably to commercial software and rule-based forecasting, although, like most of the selection procedures discussed above, they did not compare their results to aggregate selection of the DA-N method. A
comparison of the M1 results in Hyndman et al. (2002) with those in Gardner and McKenzie (1985) shows that the DA-N method was significantly more accurate than the state-space models, both overall and at most individual forecast horizons. This conclusion holds for both the 1,001 series and the subset of 111 series. For the M3 series, we can compare DA-N results from Makridakis and Hibon (2000) to Hyndman et al. (2002). In the annual M3 series, overall and at every horizon, DA-N was more accurate than the state-space models; in the quarterly series, the state-space models have a small advantage at horizons 1 and 2, but overall DA-N was more accurate. For the monthly M3 series, we have additional results for Taylor’s (2003a) DM-N method. The state-space models have a small advantage in the short term over DA-N and DM-N, but overall there is little to choose among the three alternatives.

Later work by Billah, Hyndman, and Koehler (2005) compared eight information criteria used to select from four exponential smoothing methods. The criteria included the AIC, BIC, and other standards, as well as two new Empirical Information Criteria (EIC) that penalize the likelihood of the data by a function of the number of parameters in the model. One of the EIC penalty functions is linear, while the other is nonlinear, and neither depends on the length of the time series (they are intended for use in groups of series with similar lengths). Billah et al.’s candidate exponential smoothing methods included N-N, N-N with drift (see Section 3.4), A-N, and the state-space version of DA-N. Billah et al. tested the criteria with simulated time series and seasonally adjusted M3 data. Although the EIC criteria performed better than the others, this study is not benchmarked, and we do not know whether the EIC criteria picked methods better than aggregate selection of the DA-N method.

5.4. Projected operational or economic benefits

In production and inventory control, forecasting is a major determinant of inventory costs, service levels, scheduling and staffing efficiency, and many other measures of operational performance (Adshead & Price, 1987; Fildes & Beard, 1992; Lee, Feller, & Adam, 1992). In the broader context of supply chains, forecasting determines the value of information sharing, a function that reduces costs and improves delivery performance (Zhao, Xie, & Leung, 2002). Forecast errors also contribute to the bullwhip effect, the tendency of orders to increase in variability as one moves up a supply chain (Chandra & Grabis, 2005; Dejonckheere, Disney, Lambrecht, & Towill, 2003, 2004; Zhang, 2004). It follows that forecasting methods in operating systems should be selected on the basis of benefits, although this has been done in only a few studies.

The only study of method selection for a manufacturing process is by Adshead and Price (1987), who developed a cost function to select a method for a producer of industrial fasteners with annual sales of £4 million. Total costs affected by forecasting included inventory carrying costs, stock-out costs, and overtime. Using real data, the authors developed a detailed simulation model of the plant, including six manufacturing operations carried out on 33 machines. They computed costs for a range of parameters in the N-N method, the double smoothing version of the A-N method, and Brown’s (1963) quadratic exponential smoothing, a method that performed very poorly in empirical studies and thus disappeared from the literature. Stock-out costs proved difficult to measure, and the authors were forced to test several assumptions in the cost function. Regardless of the assumption, the N-N method was the clear winner.

In a US Navy distribution system with more than 50,000 inventory items, Gardner (1990) compared the effects of a random walk and the N-N, A-N, and DA-N methods on the average delay time to fill backorders. Delay time was estimated in a simulation model using 9 years of real daily demand and lead time history, and the DA-N method proved superior for any level of inventory investment.

For a distributor of electronics components, Flores, Olson, and Pearce (1993) compared methods on the basis of costs due to forecast errors, defined as the sum of excess inventory costs (above targets) and the margin on lost sales. The authors used a sample of 967 demand series to compute costs for the N-N method with fixed and adaptive parameters, the double smoothing version of the A-N method, and the median value of historical demand. For items with margins greater than 10%, the N-N method with a fixed parameter was best, while the median was best for items with lower margins. The relative performance of the median was surprising, and Flores et al.
remarked that a broader study might change the conclusions. Essentially the same cost function as that of Flores et al. was used in a study by Mahmoud and Pegels (1990), although this paper is impossible to evaluate because several smoothing methods were not defined.

The only other study of method selection using operational or economic benefits is by Eaves and Kingsman (2004), discussed in Section 7.2.

5.5. Identification vs. selection

Although state-space models for exponential smoothing dominate the recent literature, very little has been done on the identification of such models from the data as opposed to selection of the best-fitting model. By identification, we mean data analysis to detect the appropriate form of seasonality and trend. The only possibly relevant papers here are by Koehler and Murphree (1988) and Andrews (1994). Koehler and Murphree identified and fitted MSOE state-space models to 60 time series (all those with a minimum length of 40 observations) from the 111 series in the M1 competition. Their identification and fitting routine is best described as semi-automatic, with some human intervention required. Koehler and Murphree did not attempt to match their model selections to equivalent exponential smoothing methods. They compared forecast accuracy (mean and median APEs) to simple exponential smoothing and ARIMA models identified by an expert. In general, the identification process was disappointing; although there were some differences in subsets of the data, simple exponential smoothing ranked first in overall accuracy by a significant margin. For the complete set of 111 series, Andrews identified and fitted MSOE models (all with exponential smoothing equivalents), again using a semi-automatic procedure. His results appear to be better than the Box–Jenkins results, although he did not give enough details to be sure, and he did not make comparisons to the exponential smoothing results reported for the M1 competition.

Rather than attempt to identify a model, we could attempt to identify the best exponential smoothing method directly. Chatfield and Yar (1988) call this a “thoughtful” use of exponential smoothing methods that are usually regarded as automatic. For the Holt–Winters class, Chatfield and Yar give a common-sense strategy for identifying the most appropriate method. This strategy is expanded in Chatfield (1988, 1995, 1997, 2002, 2004), and here we give the strategy in a nutshell. First, we plot the series and look for trend, seasonal variation, outliers, and changes in structure that may be slow or sudden and may indicate that exponential smoothing is not appropriate in the first place. We should examine any outliers, consider making adjustments, and then decide on the form of the trend and seasonal variation. At this point, we should also consider the possibility of transforming the data, either to stabilize the variance or to make the seasonal effect additive. Next, we fit an appropriate method, produce forecasts, and check the adequacy of the method by examining the one-step-ahead forecast errors, particularly their autocorrelation function. The findings may lead to a different method or a modification of the selected method. For a sample of reasonable size, it would be useful to have results for this strategy as a validation of the automatic selection procedures discussed above. It does not appear that any of the automatic procedures have been validated in such a manner.

6. Model-fitting

In order to implement an exponential smoothing method, the user must choose parameters, either fixed or adaptive, as well as initial values and loss functions. The user must also decide whether to normalize the seasonals, a problem considered earlier in Section 3.3. The research in choosing fixed parameters, discussed in Section 6.1, is not particularly helpful, and there are several open research questions. To avoid model-fitting for the N-N method, we can use adaptive parameters, reviewed in Section 6.2. Parameter selection is not independent of initial values and loss functions, as discussed in Section 6.3.

6.1. Fixed parameters

There is no longer any excuse for using arbitrary parameters in exponential smoothing given the availability of good search algorithms, such as the Excel Solver. For examples of using the Solver in
parameter searches, see Bowerman, O’Connell, and Koehler (2005) and Rasmussen (2004). One cautionary note is that the response surface is not necessarily convex for any exponential smoothing method, as discussed in Farnum (1992). Thus, it may be advisable to start any search routine from several different points to evaluate local minima.

We hope that our search routine comes to rest at a set of invertible parameters, but this may not happen, as discussed below. Invertible parameters create a model that allows each forecast to be written as a linear combination of all past observations, with the absolute value of the weight on each observation less than one, and with recent observations weighted more heavily than older ones. This definition is generally accepted, but the words stability and invertibility are often used interchangeably in the literature, which can be confusing. One definition of stability comes from control theory. If we view an exponential smoothing method as a system of linear difference equations, a stable system has an impulse response that decays to zero over time. The stability region for parameters in control theory is the same as the invertibility region in time series analysis (McClain & Thomas, 1973). But from the time series perspective, stability has another definition related to stationarity and is not relevant here. For a detailed comparison of the properties of stability, stationarity, and invertibility, see Pandit and Wu (1983). Examples of authors that use stability in the

For all seasonal exponential smoothing methods, we can test parameters for invertibility using an algorithm by Gardner and McKenzie (1989), assuming that additive and multiplicative invertible regions are identical. However, this test may fail to eliminate some troublesome parameters. An astonishing finding in Archibald’s study is that some combinations of [0, 1] parameters near boundaries fall within the ARIMA invertible region, but the weights on past data diverge. The result is that some older data are weighted more heavily than recent data. Archibald found that diverging weights occur in both standard and state-space versions of the A-M method. Through trial and error, Archibald found a more restrictive parameter region for state-space A-M that seemed to prevent diverging weights. The lesson from Archibald’s study is that one should be skeptical of parameters near boundaries in all seasonal models.

Archibald’s work was extended by Hyndman, Akram, and Archibald (2005), who give equations that define an “admissible” parameter space for all additive seasonal methods except the DM methods. Combinations of parameters that fall within the admissible space produce truly invertible models. Although the admissible space is complex for all methods considered, it is a simple matter to program the equations as a final check on fitted parameters. Hyndman, Akram, et al. also make a case similar to that of Archibald and Koehler (2003) (see Section 3.3) for renormalization of seasonals in state-space models.
control theory sense are Chatfield and Yar (1991), For the N-N method, Johnston and Boylan (1994)
Gardner and McKenzie (1985, 1988, 1989), Lawton stand alone in recommending that a be constrained to
(1998), McClain (1974), McClain and Thomas values of 0.50 or less. Their analysis is complex and
(1973), and Sweet (1985). they do not reconcile this constraint with the many
In the linear non-seasonal methods, the parameters examples of time series in which a N 0.50 is optimal.
are always invertible if they are chosen from the usual Once the parameters have been selected, another
[0, 1] interval. The same conclusion holds for problem is deciding how frequently they should be
quarterly seasonal methods, but not for monthly updated. When forecasting from multiple time origins,
seasonal methods (Sweet, 1985), whose invertibility Fildes, Hibon, Makridakis, and Meade (1998) com-
regions are complex. For the monthly A-A and A-M pared three options for choosing parameters in the N-
methods, Archibald (1990) and Sweet (1985) give N, A-N, and DA-N methods: (1) arbitrarily, (2)
examples of some apparently reasonable combina- optimize once at the first time origin, and (3) optimize
tions of [0, 1] parameters that are not invertible. Both each time forecasts are made. These options were
authors test A-M parameters using the A-A inverti- tested in the Fildes collection of 261 telecommunica-
bility region. Non-invertibility usually occurs when tions series, and the best option was to optimize each
one or more parameters fall near boundaries, or when time forecasts were made. It remains to be seen
trend and/or seasonal parameters are greater than the whether this conclusion applies to series that are not
level parameter. so well behaved.
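The advice in Section 6.1 to start a parameter search from several different points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from any of the papers cited: the error-correction form of the A-N method, the crude initial values, the compass-style local search, the function names (`holt_sse`, `compass_search`, `multi_start_fit`), and the made-up series are all hypothetical choices.

```python
import itertools


def holt_sse(series, alpha, gamma):
    """Sum of squared one-step-ahead errors for the additive-trend (A-N) method."""
    # Simple initial values (an assumption): first observation and first difference.
    level, trend = series[0], series[1] - series[0]
    sse = 0.0
    for x in series[2:]:
        error = x - (level + trend)            # one-step-ahead forecast error
        level = level + trend + alpha * error  # uses the previous trend
        trend = trend + alpha * gamma * error
        sse += error * error
    return sse


def compass_search(loss, start, step=0.25, tol=1e-4):
    """Derivative-free local search restricted to the unit square [0, 1] x [0, 1]."""
    best, best_loss = start, loss(*start)
    while step > tol:
        improved = False
        for da, dg in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (min(1.0, max(0.0, best[0] + da)),
                    min(1.0, max(0.0, best[1] + dg)))
            cand_loss = loss(*cand)
            if cand_loss < best_loss:          # accept strict improvements only
                best, best_loss, improved = cand, cand_loss, True
        if not improved:
            step /= 2                          # refine the mesh around the current point
    return best, best_loss


def multi_start_fit(series, starts=(0.1, 0.5, 0.9)):
    """Run the local search from a grid of starting points and keep every result."""
    loss = lambda a, g: holt_sse(series, a, g)
    results = [compass_search(loss, s) for s in itertools.product(starts, starts)]
    best_params, best_loss = min(results, key=lambda r: r[1])
    return best_params, best_loss, results


# Made-up illustrative data, not taken from any study.
series = [10, 12, 13, 15, 14, 17, 19, 18, 21, 23]
best_params, best_loss, results = multi_start_fit(series)
for params, loss_value in results:
    print(params, round(loss_value, 3))
```

If the nine searches end at clearly different parameter pairs with different losses, that is a sign of the non-convex response surface described by Farnum (1992), and the lowest-loss result is the one to carry forward to an invertibility check.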
6.2. Adaptive smoothing

The term adaptive smoothing is used to mean many different things in the literature. Here, we mean only that the parameters are allowed to change automatically in a controlled manner as the characteristics of the time series change. In Gardner (1985), I concluded that there was no credible evidence in favor of any of the numerous forms of adaptive smoothing. See also Armstrong (1984) for a similar conclusion. Since then, a number of new ideas for adaptive smoothing have appeared.

The Kalman filter can be used to compute smoothing parameters. Snyder (1988) developed such an algorithm for the N-N method, assuming a random walk with a single source of error. The method is similar to Gilchrist’s (1976) exact DLS version of the N-N method in that no initial values or model-fitting are necessary. Snyder’s method contains a short-run smoothing parameter that eventually converges to a long-run parameter. Using the 111 series from the M1 competition, Snyder’s MAPE results were about the same as the standard single-parameter N-N method in monthly data, but slightly better in annual and quarterly data. In Snyder (1993), his filter was implemented in a system for forecasting auto parts sales, although problems within the company made it difficult to assess forecasting performance.

A more elaborate Kalman filtering idea, by Kirkendall (1992), uses adaptive parameters in four MSOE state-space models designated as steady, outlier, level shift, and a mixed model with mean and variance based on a weighted average of the first three models. The steady model is the N-N method and the others are variations. Separate model estimates and separate posterior probabilities are maintained for each of the models, and the state transitions from model to model according to the probabilities. Kirkendall gives limited empirical results that are not benchmarked. Similar proposals for adapting to changes in structural models corresponding to exponential smoothing are available in Jun (1989) and Jun and Oliver (1985), but again the empirical results are limited and not benchmarked.

An unpromising scheme for adapting the N-N method was suggested by Pantazopoulos and Pappis (1996), who set the parameter equal to the absolute value of the two-step-ahead forecast error divided by the one-step-ahead error. The practical consequence is that the smoothing parameter very frequently exceeds 1.0; when this happens, the authors reset the parameter to 1.0, thus producing a random-walk forecast.

The only adaptive method that has demonstrated significant improvement in forecast accuracy compared to the fixed-parameter N-N method is Taylor’s (2004a, 2004b) smooth transition exponential smoothing (STES). Smooth transition models are differentiated by at least one parameter that is a continuous function of a transition variable, $V_t$. The formula for the adaptive parameter $\alpha_t$ is actually a logistic function:

$$\alpha_t = \frac{1}{1 + \exp(a + bV_t)} \tag{15}$$

There are several possibilities for $V_t$, including $e_t$, $|e_t|$, and $e_t^2$. Whatever the transition variable, the logistic function restricts $\alpha_t$ to [0, 1]. The drawback to STES is that model-fitting is required to estimate $a$ and $b$; thereafter, the method adapts to the data through $V_t$. In Taylor (2004b), $V_t = e_t^2$ was the best choice for simulated time series with level shifts and outliers as well as the 1428 M3 monthly series. As benchmarks for STES, Taylor computed results for numerous other exponential smoothing methods, with both fixed and adaptive parameters. STES performed well in the simulated series, as expected. In the many re-examinations of the M3 series, Taylor is the only researcher who followed the advice of Fildes (1992) and Fildes et al. (1998) and evaluated forecast performance across time. Using the last 18 observations of each series, Taylor computed successive one-step-ahead monthly forecasts, for a total of 25,704 forecasts. Judged by MAPE and median APE, STES was the most accurate method tested, significantly so for the MAPE. Additional empirical evidence is given in Taylor (2004a), a study in which STES was arguably the best method overall in volatility forecasting of stock index data compared to the fixed-parameter version of N-N and a range of GARCH and autoregressive models.

Only a few authors have proposed adapting the parameters in the trend methods. In the A-A method, Williams (1987) contends that only the level parameter should be adapted. Mentzer (1988) and Mentzer and Gomes (1994) agree with Williams and recommend setting the level parameter in the A-A method
equal to the absolute percentage error in the current period (if the error exceeds 100%, the level parameter is set equal to 1.0). Mentzer and Gomes present results for the M1 data that are the best of all methods reported to date. But in the M3 data, Taylor (2004b) found that the Mentzer and Gomes version of the A-A method was certainly the worst exponential smoothing method tested, regardless of the error measure or whether the parameters were fixed or adaptive. There seems to be no explanation for this contradiction in performance.

6.3. Initial values and loss functions

Standard exponential smoothing methods are usually fitted in two steps, by choosing fixed initial values (see Gardner, 1985, for a review of the alternatives), followed by an independent search for parameters. In contrast, the new state-space methods are usually fitted using maximum likelihood, a procedure that makes the choice of initial values less of a concern because they are refined simultaneously with the smoothing parameters during the optimization process. Unfortunately, maximum likelihood may require significant computation times, as discussed in Hyndman et al. (2002). For example, in monthly seasonal models with a damped trend, there are 13 initial values and 4 parameters, so the optimization is done in 17-dimensional space.

Another maximum likelihood procedure differing in many details from Hyndman et al. is found in Broze and Mélard (1990), who give meticulous instructions for fitting all of the linear exponential smoothing methods in Table 1. The Broze and Mélard procedure is difficult to evaluate because they give no empirical results or computation times. An alternative to maximum likelihood is Segura and Vercher’s (2001) nonlinear programming model that optimizes initial values and parameters simultaneously, but again the authors are silent about empirical results and computation times.

In an exhaustive re-examination of the M1 series, Makridakis and Hibon (1991) measured the effect of different initial values and loss functions in fitting the N-N, A-N, and DA-N methods, using seasonally adjusted data where appropriate. Initial values were computed by least squares, backcasting, and several simple methods such as setting all initial values to zero. Loss functions included the MAD, MAPE, median APE, MSE, the sum of the cubed errors, and a variety of non-symmetric functions. There was little difference in average post-sample accuracy regardless of initial values or loss function. Furthermore, sample size or type of data (annual, quarterly, or monthly) did not make any consistent difference in the best choice of initial values or loss function. The authors repeated the study in the Fildes telecommunications data with much the same findings.

The major conclusion from the Makridakis and Hibon study is that the common practice of initializing by least squares, choosing parameters from the [0, 1] interval, and fitting methods to minimize the MSE provides satisfactory results. The authors caution that this conclusion applies to automatic forecasting of large numbers of time series and may not hold for individual series, especially those containing significant outliers. To cope with outliers, I argue for a MAD loss function (Gardner, 1999). However, I point out that there are exceptions, making it advisable to evaluate both MSE and MAD loss functions in many series.

7. Forecasting for inventory control

In inventory control with non-intermittent demand, exponential smoothing methods are the same as in other applications, but variance estimates are considerably different. Variances of cumulative demand over the complete reorder lead time are required, as discussed in Section 7.1. If demand is intermittent, we need both specialized smoothing methods and variance estimates, as discussed in Section 7.2. Our discussion is concerned only with these topics, and the vast literature on inventory decision rules constructed from forecasting systems is beyond our scope.

7.1. Non-intermittent demand

For the N-N method, what might be called the traditional estimate of the standard deviation of total lead time demand is $s\sqrt{m}$, where $s$ is the one-step-ahead standard deviation and $m$ is the lead time (Brown, 1959, 1967). This estimate has been persistent in the literature, but it is biased. The correct multiplier for the standard deviation was derived using an MSOE
state-space model by Johnston and Harrison (1986) and an SSOE model by Snyder et al. (1999):

$$f(\alpha, m) = \sqrt{m + \alpha(m - 1)m\left(1 + \alpha(2m - 1)/6\right)} \tag{16}$$

The effect of this multiplier is significant for any value of $\alpha$ at any lead time greater than one period. For example, with $\alpha = 0.2$ and a lead time of two periods, the correct standard deviation is more than twice the size of the traditional estimate.

For the linear methods A-N, DA-N, A-A, and DA-A, Snyder et al. (2004) used SSOE models to develop variance expressions for cumulative lead-time demand, assuming both additive and multiplicative errors. Snyder et al. (2004) can be viewed as a companion paper to Hyndman, Koehler, et al. (2005), which contains prediction intervals around point forecasts for the same methods and several others. For those who prefer MSOE models, some limited and far more complex variance results are available in Harvey and Snyder (1990). It is important to understand how the error assumptions in the SSOE models affect the distribution of cumulative lead-time demand. If the errors are additive and normal, cumulative lead-time demand will of course be normal. If the errors are normal and multiplicative, cumulative lead-time demand will not be normal, although Hyndman, Koehler, et al. (2005) suggest that the normal distribution is a safe approximation. For the smoothing methods without analytical variance expressions, there are many bootstrapping procedures in the literature that can be used to develop empirical variance estimates. The parametric bootstrapping procedure of Snyder (2002) and Snyder et al. (2002) should appeal to the practical forecaster because it is tailored to lead-time demand and can be used when the distribution of demand is non-normal, when the lead time is stochastic, and when demand is intermittent.

Snyder et al. (2002) used the parametric bootstrap to study an important practical question about state-space modeling: Do the assumptions of additive and multiplicative errors make any difference in estimating variances? The authors used multiplicative errors in generating data with no trend or seasonal pattern, and then fitted both additive- and multiplicative-error versions of the N-N method. This research design should have produced results substantially in favor of the multiplicative-error version, but it did not. Differences in simulated fill rates and order-up-to levels between the two N-N versions were very small except when a major step change in the series occurred. When data with trends and seasonality were simulated, the multiplicative-error N-N method did better, but this finding is misleading because the method was not appropriate for the data.

Stockouts are not treated in the research discussed above, although the parametric bootstrap could easily be adapted to do so. Stockouts truncate the distribution of demand, causing systematic bias in estimates of the mean and variance. To correct for such bias in the N-N method, Bell (1978) replaced $X_t$ with the conditional mean of the demand distribution for periods that include stockouts. The conditional mean is defined as the expected value of demand, given that observed demand is greater than or equal to the quantity actually available for sale. Demand is assumed normal, with variance estimated by the smoothed MAD. The normality assumption may seem doubtful, but Bell (1978, 2000) and Artto and Plykkänen (1999) argue that product stocking methods based on the normal distribution work well in practice. Through simulation, Bell (1981, 2000) found that his procedure works well so long as the number of stockouts does not exceed 50%. For larger numbers of stockouts, Bell (2000) gives adjustments to his procedure.

7.2. Intermittent demand

If time series of inventory demands are observed intermittently, we cannot recommend the N-N method because the forecasts are biased low just before a demand occurs and biased high just afterward, resulting in excessive stock levels. The standard method of forecasting intermittent series was developed by Croston (1972) and works as follows. Using the N-N method, we smooth two components of the time series separately, the observed value of nonzero demand ($D_t$) and the inter-arrival time of transactions ($Q_t$). The smoothed estimates are denoted $Z_t$ and $P_t$, respectively, and their recurrence equations are

$$Z_t = \alpha D_t + (1 - \alpha) Z_{t-1} \tag{17}$$

$$P_t = \alpha Q_t + (1 - \alpha) P_{t-1} \tag{18}$$
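The two Croston recurrences, Eqs. (17) and (18), can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than Croston's own specification: the function name `croston`, the choice to initialise both estimates at the first nonzero demand, and the example series are hypothetical. The sketch also records the ratio of smoothed size to smoothed interval, which the text goes on to define as the expected demand per unit time.

```python
def croston(demand, alpha=0.1):
    """Smooth nonzero demand sizes (Z) and inter-arrival times (P) separately.

    Returns the final Z, the final P, and the ratio Z / P after each period
    (None until the first nonzero demand has been seen).
    """
    z = p = None
    periods_since_demand = 1          # inter-arrival time counter, in periods
    ratios = []
    for d in demand:
        if d > 0:
            if z is None:
                # Initialise at the first nonzero demand (an assumption).
                z, p = float(d), float(periods_since_demand)
            else:
                z = alpha * d + (1 - alpha) * z                      # Eq. (17)
                p = alpha * periods_since_demand + (1 - alpha) * p   # Eq. (18)
            periods_since_demand = 1
        else:
            # No demand this period: Z and P are left unchanged.
            periods_since_demand += 1
        ratios.append(None if z is None else z / p)
    return z, p, ratios


# Made-up intermittent series: demands of 5 and 3 separated by three empty periods.
z, p, ratios = croston([0, 0, 5, 0, 0, 0, 3], alpha=0.1)
print(z, p, ratios[-1])
```

Using one `alpha` for both recurrences follows the text; keeping `z` and `p` untouched in zero-demand periods is what separates this scheme from applying the N-N method directly to the raw series.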
The value of $\alpha$ is the same in both equations. The expected value of demand per unit time ($Y_t$) is then

$$E(Y_t) = Z_t / P_t \tag{19}$$

If there is no demand in a period, $Z_t$ and $P_t$ are unchanged. When demand occurs every period, the Croston method gives the same forecasts as the conventional N-N method.

Syntetos and Boylan (2001) showed that $E(Y_t)$ is biased high and derived a corrected version of Eq. (19), although this version is not mentioned in later research by Syntetos and Boylan (2005) and Syntetos, Boylan, and Croston (2005). Instead, the later papers give a different corrected version of Eq. (19):

$$E(Y_t) = (1 - \alpha/2)(Z_t / P_t) \tag{20}$$

The modified Croston forecasting system defined by Eqs. (17), (18), and (20) was used in Eaves and Kingsman (2004), who tested the system using a sample of 11,203 repair parts from Royal Air Force inventories. The results varied somewhat depending on the degree of aggregation of the data (weekly, monthly, quarterly) and the type of demand pattern (ranging from smooth to highly intermittent). However, in general, the modified Croston method was more accurate than the original, and both methods performed significantly better than the N-N method. To compute safety stocks, Eaves and Kingsman relied on a variance expression developed by Sani and Kingsman (1997) (discussed below). The authors extrapolated the sample savings to the entire inventory, with convincing results. The conventional N-N method produced an additional 13.6% in inventory investment (£285 million) over the modified Croston method.

Another idea to correct for bias in the Croston method is given in Levén and Segerstedt (2004). Rather than smooth size and inter-arrival time separately as in Eqs. (17) and (18), the authors proposed a method that can be shown to be equivalent to smoothing both components in the same equation. The authors give no explanation of how this idea corrects for bias.

Snyder (2002) took a state-space approach to the study of Croston’s method. The underlying model assumes that nonzero demands are generated by an ARIMA (0, 1, 1) process, while the inter-arrival times follow the Geometric distribution. The latter assumption means that the probability of demand occurrence is constant (one of Croston’s stated assumptions), or equivalently that the mean inter-arrival time of the demand series is constant. Therefore, Eq. (18) is not used, and $P$ in Eq. (19) has no time subscript. This model can generate negative values, so an alternative model using the logarithms of nonzero demands was specified. Variance estimates for the models were developed using a parametric bootstrap from the normal distribution. Snyder gives encouraging results for his models for a few time series, but more evidence is needed to support a constant mean inter-arrival time. This idea is contrary to the philosophy of exponential smoothing, a problem acknowledged by Snyder.

Further analysis of Snyder’s models is given in Shenstone and Hyndman (2005), who developed analytical prediction intervals for them. Shenstone and Hyndman also found that there is no underlying stochastic model for Croston’s method or the two variants proposed by Syntetos and Boylan (2001, 2005). Any models that might be considered as candidates simply do not match the properties of intermittent data. Thus, if we wish to have analytical prediction intervals for intermittent data, the only option is to adopt one of Snyder’s models.

Moreover, Shenstone and Hyndman’s work creates doubts about the assumptions behind the variance expressions for Croston’s method found in the literature, including Croston (1972) as corrected by Rao (1973), Johnston and Boylan (1996a), Sani and Kingsman (1997), and Schultz (1987). All of these variance expressions must be regarded as approximations, although they have generally worked well in empirical studies. The best approximation for the variance of mean demand may be that of Sani and Kingsman:

$$\mathrm{Var}(Y_t) = \max\left[\mathrm{Var}(Z_t)/P_t,\; 1.1\,Z_t/P_t\right] \tag{21}$$

The second term on the right-hand side looks peculiar, but the purpose is to make certain that the variance is larger than the mean, a relationship required by the assumption that demands are generated by the negative binomial distribution. The variance of $Z_t$ is estimated by the smoothed MAD, assuming normality and using the same $\alpha$ as in Eqs. (17) and (18). In an empirical study of forecasting the demand for repair parts, Sani and Kingsman showed that Eq. (21) gave
much better service level performance than Croston’s original variance expression.

When should we use Croston’s method or one of its variants? Johnston and Boylan (1996a, 1996b) found that Croston’s method is superior to the N-N method when the average inter-arrival time is greater than 1.25 times the interval between updates of the N-N method. This finding was thoroughly substantiated by simulating different inter-demand intervals and patterns, different distributions of order size, different forecast horizons, and different parameters in the smoothing methods.

Syntetos et al. (2005) extended Johnston and Boylan’s work by developing rules based on the variability of order size and inter-arrival times for selecting from three methods: N-N, original Croston, and modified Croston in Eqs. (17), (18), and (20). Problems arise in any attempt to generalize from Syntetos et al. because method selection based on variability of order size contradicts Johnston and Boylan (1996a, 1996b), who found that this statistic had almost no effect on the relative performance of methods. Syntetos et al. (2005) also appears to contradict Syntetos and Boylan (2005); in Syntetos et al., the original Croston method was consistently better than the N-N method, but this was not true in Syntetos and Boylan.

8. Empirical studies

Table 3 is a guide to all papers published since 1985 that present empirical results for exponential smoothing, excluding the M-competitions, the many re-examinations of the M-competitions, papers based entirely on simulated time series, and several papers that are impossible to evaluate. This last category includes Shoesmith and Pinder (2001), who did not disclose the particular smoothing methods used. Mahmoud and Pegels (1990) and Snyder (1993) were also omitted for reasons explained in Sections 5.4 and 6.2, respectively.

Several generalizations can be made about the 65 papers listed in Table 3. Seasonal methods were rarely used, even though most studies were based on seasonal data. It may be surprising that there have been no reported applications of the N-A or N-M methods, and only three applications of the A-A method, the subject of a large body of theoretical research. It may also be surprising that there have been few applications of the damped-trend methods. In most cases, little attention was given to method selection, a generalization substantiated by the large number of studies with only one method listed.

How often was exponential smoothing successful in these studies? Forecast performance was sometimes difficult to evaluate because many of the studies were not designed to be comparative in nature. However, my interpretation is that there are only seven studies that did not report reasonable forecast accuracy with exponential smoothing, and all of these can be explained. In Holmes’ (1986) analysis of leading indicator series characterized by dramatic turning points, it is unsurprising that transfer function models performed better than the A-N method. In forecasting IBM product sales (Wu, Ravishanker, & Hosking, 1991), several Box–Jenkins models defeated the A-M method. The data suggest that the damped trend methods would have performed better, but the authors did not consider them.

In forecasting point-of-sale scanner data (Curry, Divakar, Mathur, & Whiteman, 1995), the univariate N-N method was applied to a multivariate problem with predictably poor results. Fildes, Randall, and Stubbs (1997) developed models for short-term forecasting of water and gas demand and found that complex multivariate methods (beyond the capability of the exponential smoothing methodology) were necessary to capture all the influences on the data. In Fildes et al. (1998), the telecommunications data contained little structure except consistent negative trends, making the robust trend method the best choice. In Bianchi, Jarrett, and Hanumara’s (1998) study of incoming calls to telemarketing centers, the A-A and A-M methods did not perform as well as ARIMA modeling with interventions that were essential in the data.

The last study in which exponential smoothing did not perform well is by Willemain, Smart, and Schwarz (2004), who claimed that their patented bootstrap method made significant improvements in forecast accuracy over the N-N and Croston methods. However, as discussed in Gardner and Koehler (2005), Willemain et al. was published with mistakes and omissions that bias the results in favor of the patented method.
Table 3
Empirical studies
Data Methods Reference
Airline passengers DA-A Grubb and Mason (2001)
Ambulance demand calls A-M Baker and Fitzpatrick (1986)
Australian football margins of victory N-N Clarke (1993)
Auto parts N-N Gardner and Diaz-Saiz (2002)
Auto parts N-N, Croston Snyder (2002)
Auto parts N-N, Croston Syntetos and Boylan (2005)
Auto parts N-N, Croston Syntetos et al. (2005)
Call volumes to telemarketing centers A-A, A-M Bianchi et al. (1998)
Chemical products N-N, Croston García-Flores et al. (2003)
Computer network services N-N Masuda and Whang (1999)
Computer parts DA-N Gardner (1993)
Confectionery equipment repair parts N-N, Croston Strijbosch et al. (2000)
Consumer product sales (annual) N-N, A-N, DA-N Schnaars (1986)
Consumer food products N-N Koehler (1985)
Cookware sales DA-N Gardner and Anderson (1997)
Cookware sales DA-N Gardner, Anderson-Fletcher, and Wicks (2001)
Crime rates N-N, A-N Gorr, Olligschlaeger, and Thompson (2003)
Currency exchange rates N-N, A-N, A-M Dheeriya and Raj (2000)
Department store sales N-N, A-N Geurts and Kelly (1986)
Economic data (various) N-N, A-N Geriner and Ord (1991)
Economic, environmental data (various) A-N Wright (1986b)
Electric utility loads A-N Huss (1985a)
Electric utility sales A-N Huss (1985b)
Electricity demand N-N, A-N Price and Sharp (1986)
Electricity demand A-M Taylor (2003b)
Electricity demand forecast errors N-N Ramanathan, Engle, Granger, Vahid-Araghi, and Brace (1997)
Electricity supply A-N Sharp and Price (1990)
Electrical service requests A-M Weintraub, Aboud, Fernandez, Laporte, and Ramirez (1999)
Electronics components N-N, A-N Flores et al. (1993)
Exports N-N Mahmoud, Motwani, and Rice (1990)
Financial futures prices N-N Sharda and Musser (1986)
Financial returns N-N Taylor (2004a)
Food product demand N-N Fairfield and Kingsman (1993)
Food product demand N-N Mercer and Tao (1996)
Hospital patient movements A-M Lin (1989)
Hotel revenue data N-N, A-N Weatherford and Kimes (2003)
IBM product sales A-M Wu et al. (1991)
Industrial data (various) N-N, Croston Willemain, Smart, Shockor, and DeSautels (1994) and Willemain et al. (2004)
Industrial fasteners N-N, A-N Adshead and Price (1987)
Industrial production differences N-N Öller (1986)
Industrial production index A-A Bodo and Signorini (1987)
Leading indicators A-N Holmes (1986)
Macroeconomic variables A-M Thury (1985)
Mail order sales N-N Chambers and Eglese (1988)
Mail volumes A-M Thomas (1993)
Manpower retention rates A-N Chu and Lin (1994)
Medicaid expenses A-N Williams and Miller (1999)
Medical supplies A-N Mathews and Diamantopoulos (1994)
Natural gas demand N-N, A-N, A-M Lee et al. (1993)
Stock index direction N-N Leung, Daouk, and Chen (2000)
Supermarket product sales Many Taylor (2004c)
Point-of-sale scanner data N-N Curry et al. (1995)
Printed banking forms N-N, A-N Chan, Kingsman, and Wong (1999)
Process industry sales DA-M Miller and Liberatore (1993)
Royal Air Force spare parts N-N, Croston Eaves and Kingsman (2004)
Telephone service times N-N Samuelson (1999)
Telecommunications demand N-N, A-N, DA-N Fildes et al. (1998)
Tourism A-A Pfeffermann and Allon (1989)
Tourism A-N Martin and Witt (1989)
Travel speeds in road networks N-N Hill and Benton (1992)
Truck sales A-M Heuts and Bronckers (1988)
US Navy inventory demands N-N, A-N, DA-N Gardner (1990)
Utility demand (water and gas) A-M, N-A, N-M Fildes et al. (1997)
Vehicle/agricultural machinery parts N-N, Croston Sani and Kingsman (1997)
Water quality, divorce rates A-N Wright (1986b)
How often was Croston’s method successful? There are five pertinent studies in Table 3 (Eaves & Kingsman, 2004; García-Flores, Wang, & Burgess, 2003; Sani & Kingsman, 1997; Snyder, 2002; Strijbosch, Heuts, & van der Schoot, 2000). In all of these, Croston’s method or one of its variants gave reasonable performance, although it is difficult to be more specific because the degree of success depended on the type of data and the error measure used.

9. The state of the art

Exponential smoothing methods can be justified in part through equivalent kernel regression and ARIMA models, and in their entirety through the new class of SSOE state-space models, which have many theoretical advantages, most notably the ability to make the forecast errors dependent on the other components of the time series. This kind of multiplicative error structure is not possible with the ARIMA class, making exponential smoothing a much broader class of models, and neatly reversing the “special case” argument discussed in Gardner (1985).

The problem now is to determine whether the SSOE modeling framework has practical as well as theoretical advantages. This has yet to be demonstrated. In the M1 data, aggregate selection of the damped additive trend was a better choice than individual selection of SSOE models through information criteria. The same conclusion holds for annual and quarterly M3 data. For monthly M3 data, individual selection of SSOE models was superior only at short horizons. At longer horizons and overall, there was little to choose between damped additive and multiplicative trends and the SSOE models. I cannot explain these results, and they cannot be ignored if there is to be any hope of practical implementation.

A number of possibilities might be explored to improve the performance of the SSOE models. First, damping should be considered with both additive and multiplicative trends. Not all of the 24 models in the SSOE framework can be expected to be robust, so the range of candidates might be reduced. The use of information criteria for model selection should be re-examined. In Hyndman (2001), the AIC often failed to select an ARIMA (0, 1, 1) model even when the data were generated by an ARIMA (0, 1, 1) process. It seems unreasonable to believe that the AIC should do any better in selection from the SSOE framework, and other selection procedures should be considered. We note that other information criteria were used in Billah et al. (2005), but this study is unhelpful because the results are not benchmarked.

Most researchers have avoided the problem of method selection in exponential smoothing, and there is as yet no evidence that individual selection can improve forecast accuracy over aggregate selection of one of the damped trend methods. However, Shah’s (1997) discriminant analysis procedure and Meade’s (2000) regression-based performance index are promising alternatives for individual method selection and deserve empirical research using an exponential smoothing framework.

From the practitioner’s viewpoint, the aim in method selection must be robustness, especially in
large forecasting systems. Several new methods have demonstrated robustness: Taylor’s (2003a) damped multiplicative trends, Taylor’s (2004a, 2004b) adaptive version of simple smoothing, and the Theta method of Assimakopoulos and Nikolopoulos (2000), shown to be equivalent to the N-N method with drift by Hyndman and Billah (2003). All of these methods deserve more research to determine when they should be preferred over competing methods.

There are a number of other opportunities for empirical research in exponential smoothing. The SSOE models yield analytical variance expressions for point forecasts that have eluded researchers for many years. Surely these expressions are better than the variance expressions used in the past, but they have not been evaluated with real data. Perhaps this could be done in something like an M-competition for prediction intervals. Such a competition should also test the SSOE variance expressions for cumulative forecasts at different lead times. In forecasting intermittent demand, we have several new versions of Croston’s method that require further testing with real data. Another idea that merits empirical research is Fildes et al.’s (1998) recommendation that parameters be re-optimized each time forecasts are made.

Since Winters (1960) appeared, there has been confusion in the literature about whether and how seasonals should be renormalized in the Holt–Winters methods. Today, it seems foolish not to renormalize using the efficient Archibald and Koehler (2003) system, a major practical advance that resolves conflicting results and puts the renormalization equations in a common form. In the additive seasonal methods, it is not necessary to renormalize the seasonal indices if forecast accuracy is the only concern, but this is rarely the case in practice when repetitive forecasts are made over time. Forecasting methods require regular maintenance, a job that is easier to accomplish when the method components can be interpreted without bias. With multiplicative seasonality, we do not know if renormalization can be safely ignored, so certainly we should use the Archibald-Koehler system.

In fitting additive seasonal models, it is alarming that some combinations of [0, 1] parameters fall within the ARIMA invertible region, yet the weights on past data diverge. This problem can be avoided by checking the weights using the Hyndman, Akram et al. (2005) boundary equations. In fitting multiplicative seasonal models, there is little guidance on parameter choice. Research is also needed on parameter choice for the new damped multiplicative trend methods.

My experience is that practitioners happily ignore most of the problems discussed in this paper. In the future, we must validate the substantial body of theory in exponential smoothing and communicate it to practitioners. Writing of exponential smoothing vs. the Box–Jenkins methodology, I concluded Gardner (1985) with the following opinion: “The challenge for future research is to establish some basis for choosing among these and other approaches to time series forecasting.” This conclusion still holds, although we have many more alternatives today.

Acknowledgments

I am grateful to Chris Chatfield, Robert Fildes, Anne Koehler, and James Taylor for many helpful comments and suggestions. None of these people necessarily agrees with the opinions expressed in the paper, and any errors that remain are my own. I also wish to acknowledge my assistant, Thomas Bayless, of the Honors College, University of Houston, who prepared the bibliography and was a great help with the manuscript.

References

Adshead, N. S., & Price, D. H. R. (1987). Demand forecasting and cost performance in a model of a real manufacturing unit. International Journal of Production Research, 25, 1251–1265.
Adya, M., Collopy, F., Armstrong, J. S., & Kennedy, M. (2001). Automatic identification of time series features for rule-based forecasting. International Journal of Forecasting, 17, 143–157.
Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22, 203–217.
Aldrin, M., & Damsleth, E. (1989). Forecasting non-seasonal time series with missing observations. Journal of Forecasting, 8, 97–116.
Anderson, J. R. (1994). Simpler exponentially weighted moving averages with irregular updating periods. Journal of the Operational Research Society, 45, 486.
Andreassen, P. B., & Kraus, S. J. (1990). Judgmental extrapolation and the salience of change. Journal of Forecasting, 9, 347–372.
Andrews, R. L. (1994). Forecasting performance of structural time series models. Journal of Business and Economic Statistics, 12, 129–133.
Archibald, B. C. (1990). Parameter space of the Holt–Winters’ model. International Journal of Forecasting, 6, 199–209.
Archibald, B. C., & Koehler, A. B. (2003). Normalization of seasonal factors in Winters’ methods. International Journal of Forecasting, 19, 143–148.
Arinze, B. (1994). Selecting appropriate forecasting models using rule induction. Omega, 22, 647–658.
Armstrong, J. S. (1984). Forecasting by extrapolation: Conclusions from 25 years of research. Interfaces, 14, 52–66.
Artto, K. A., & Pylkkänen, E. (1999). An effective procedure for the distribution of magazines. International Transactions in Operational Research, 6, 289–310.
Assimakopoulos, V., & Nikolopoulos, K. (2000). The theta model: A decomposition approach to forecasting. International Journal of Forecasting, 16, 521–530.
Baker, J. R., & Fitzpatrick, K. E. (1986). Determination of an optimal forecast model for ambulance demand using goal programming. Journal of the Operational Research Society, 37, 1047–1059.
Bartolomei, S. M., & Sweet, A. L. (1989). A note on a comparison of exponential smoothing methods for forecasting seasonal series. International Journal of Forecasting, 5, 111–116.
Bell, P. C. (1978). A new procedure for the distribution of periodicals. Journal of the Operational Research Society, 29, 427–434.
Bell, P. C. (1981). Adaptive sales forecasting with many stockouts. Journal of the Operational Research Society, 32, 865–873.
Bell, P. C. (2000). Forecasting demand variation when there are stockouts. Journal of the Operational Research Society, 51, 358–363.
Bianchi, L., Jarrett, J., & Hanumara, R. C. (1998). Improving forecasting for telemarketing centers by ARIMA modeling with intervention. International Journal of Forecasting, 14, 497–504.
Billah, B., Hyndman, R. J., & Koehler, A. B. (2005). Empirical information criteria for time series forecasting model selection. Journal of Statistical Computation and Simulation, 75, 831–840.
Bodo, G., & Signorini, L. F. (1987). Short-term forecasting of the industrial production index. International Journal of Forecasting, 3, 245–259.
Bossons, J. (1966). The effects of parameter misspecification and non-stationary on the applicability of adaptive forecasts. Management Science, 12, 659–669.
Bowerman, B. L., O’Connell, R., & Koehler, A. B. (2005). Forecasting, time series, and regression (4th edition). Pacific Grove, CA: Duxbury Press.
Brown, R. G. (1959). Statistical forecasting for inventory control. New York: McGraw-Hill.
Brown, R. G. (1963). Smoothing, forecasting and prediction of discrete time series. Englewood Cliffs, NJ: Prentice-Hall.
Brown, R. G. (1967). Decision rules for inventory management. New York: Holt, Rinehart, and Winston.
Broze, L., & Mélard, G. (1990). Exponential smoothing: Estimation by maximum likelihood. Journal of Forecasting, 9, 445–455.
Carreno, J. J., & Madinaveitia, J. (1990). A modification of time series forecasting methods for handling announced price increases. International Journal of Forecasting, 6, 479–484.
Chambers, M. L., & Eglese, R. W. (1988). Forecasting demand for mail order catalogue lines during the season. European Journal of Operational Research, 34, 131–138.
Chan, C. K., Kingsman, B. G., & Wong, H. (1999). The value of combining forecasts in inventory management – A case study in banking. European Journal of Operational Research, 117, 199–210.
Chandra, C., & Grabis, J. (2005). Application of multiple-steps forecasting for restraining the bullwhip effect and improving inventory performance under autoregressive demand. European Journal of Operational Research, 166, 337–350.
Chatfield, C. (1988). What is the ‘best method’ of forecasting? Journal of Applied Statistics, 15, 19–38.
Chatfield, C. (1993). Calculating interval forecasts. Journal of Business and Economic Statistics, 11, 121–135.
Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A, 158, 419–466.
Chatfield, C. (1996). Model uncertainty and forecast accuracy. Journal of Forecasting, 15, 495–508.
Chatfield, C. (1997). Forecasting in the 1990s. Journal of the Royal Statistical Society. Series D, 46, 461–473.
Chatfield, C. (2002). Confessions of a pragmatic statistician. Journal of the Royal Statistical Society. Series D, 51, 1–20.
Chatfield, C. (2004). The analysis of time series: An introduction (6th edition). Boca Raton: Chapman & Hall/CRC Press.
Chatfield, C., & Koehler, A. B. (1991). On confusing lead time demand with h-period-ahead forecasts. International Journal of Forecasting, 7, 239–240.
Chatfield, C., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2001). A new look at models for exponential smoothing. Journal of the Royal Statistical Society. Series D, 50, 147–159.
Chatfield, C., & Yar, M. (1988). Holt–Winters forecasting: Some practical issues. Journal of the Royal Statistical Society. Series D, 37, 129–140.
Chatfield, C., & Yar, M. (1991). Prediction intervals for multiplicative Holt–Winters. International Journal of Forecasting, 7, 31–37.
Chen, C. (1997). Robustness properties of some forecasting methods for seasonal time series: A Monte Carlo study. International Journal of Forecasting, 13, 269–280.
Chu, S. C. K., & Lin, C. K. Y. (1994). Cohort analysis technique for long-term manpower planning: The case of a Hong Kong tertiary institution. Journal of the Operational Research Society, 45, 696–709.
Cipra, T. (1992). Robust exponential smoothing. Journal of Forecasting, 11, 57–69.
Cipra, T., Trujillo, J., & Rubio, A. (1995). Holt–Winters method with missing observations. Management Science, 41, 174–178.
Clarke, S. R. (1993). Computer forecasting of Australian rules football for a daily newspaper. Journal of the Operational Research Society, 44, 753–759.
Cogger, K. O. (1973). Specification analysis. Journal of the American Statistical Association, 68, 899–905.
Cohen, G. D. (1963). A note on exponential smoothing and autocorrelated inputs. Operations Research, 11, 361–366.
Collopy, F., & Armstrong, J. S. (1992). Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations. Management Science, 38, 1394–1414.
Cox, D. R. (1961). Prediction by exponentially weighted moving averages and related methods. Journal of the Royal Statistical Society. Series B, 23, 414–422.
Croston, J. D. (1972). Forecasting and stock control for intermittent demands. Operational Research Quarterly, 23, 289–303.
Curry, D. J., Divakar, S., Mathur, S. K., & Whiteman, C. H. (1995). BVAR as a category management tool: An illustration and comparison with alternative techniques. Journal of Forecasting, 14, 181–199.
Dejonckheere, J., Disney, S. M., Lambrecht, M. R., & Towill, D. R. (2003). Measuring and avoiding the bullwhip effect: A control theoretic approach. European Journal of Operational Research, 147, 567–590.
Dejonckheere, J., Disney, S. M., Lambrecht, M. R., & Towill, D. R. (2004). The impact of information enrichment on the Bullwhip effect in supply chains: A control engineering perspective. European Journal of Operational Research, 153, 727–750.
Dheeriya, P. L., & Raj, M. (2000). An investigation in exchange rate behavior of emerging countries. International Journal of Public Administration, 23, 1089–1112.
Eaves, A. H. C., & Kingsman, B. G. (2004). Forecasting for the ordering and stock-holding of spare parts. Journal of the Operational Research Society, 55, 431–437.
Enns, P. G., Machak, J. A., Spivey, W. A., & Wrobleski, W. J. (1982). Forecasting applications of an adaptive multiple exponential smoothing model. Management Science, 28, 1035–1044.
Fairfield, R. P., & Kingsman, B. G. (1993). Control theory in production/inventory systems: A case study in a food processing organization. Journal of the Operational Research Society, 44, 1173–1182.
Farnum, N. R. (1992). Exponential smoothing: Behavior of the ex-post sum of squares near 0 and 1. Journal of Forecasting, 11, 47–56.
Fildes, R. (1992). The evaluation of extrapolative forecasting methods. International Journal of Forecasting, 8, 81–98.
Fildes, R. (2001). Beyond forecasting competitions. International Journal of Forecasting, 17, 556–560.
Fildes, R., & Beard, C. (1992). Forecasting systems for production and inventory control. International Journal of Operations and Production Management, 12, 4–27.
Fildes, R., Hibon, M., Makridakis, S., & Meade, N. (1998). Generalising about univariate forecasting methods: Further empirical evidence. International Journal of Forecasting, 14, 339–358.
Fildes, R., Randall, A., & Stubbs, P. (1997). One day ahead demand forecasting in the utility industries: Two case studies. Journal of the Operational Research Society, 48, 15–24.
Flores, B. E., Olson, D. L., & Pearce, S. L. (1993). Use of cost and accuracy measures in forecasting method selection: A physical distribution example. International Journal of Production Research, 31, 139–160.
Flores, B. E., & Pearce, S. L. (2000). The use of an expert system in the M3 competition. International Journal of Forecasting, 16, 485–496.
García-Flores, R., Wang, X. Z., & Burgess, T. F. (2003). Tuning inventory policy parameters in a small chemical company. Journal of the Operational Research Society, 54, 350–361.
Gardner Jr., E. S. (1983). Automatic monitoring of forecast errors. Journal of Forecasting, 2, 1–21.
Gardner Jr., E. S. (1985). Exponential smoothing: The state of the art. Journal of Forecasting, 4, 1–28.
Gardner Jr., E. S. (1988). A simple method of computing prediction intervals for time series forecasts. Management Science, 34, 541–546.
Gardner Jr., E. S. (1990). Evaluating forecast performance in an inventory control system. Management Science, 36, 490–499.
Gardner Jr., E. S. (1993). Forecasting the failure of component parts in computer systems: A case study. International Journal of Forecasting, 9, 245–253.
Gardner Jr., E. S. (1999). Note: Rule-based forecasting vs. damped-trend exponential smoothing. Management Science, 45, 1169–1176.
Gardner Jr., E. S., & Anderson, E. A. (1997). Focus forecasting reconsidered. International Journal of Forecasting, 13, 501–508.
Gardner Jr., E. S., Anderson-Fletcher, E. A., & Wicks, A. M. (2001). Further results on focus forecasting vs. exponential smoothing. International Journal of Forecasting, 17, 287–293.
Gardner Jr., E. S., & Diaz-Saiz, J. (2002). Seasonal adjustment of inventory demand series: A case study. International Journal of Forecasting, 18, 117–123.
Gardner Jr., E. S., & Koehler, A. B. (2005). Comments on a patented forecasting method for intermittent demand. International Journal of Forecasting, 21, 617–618.
Gardner Jr., E. S., & McKenzie, E. (1985). Forecasting trends in time series. Management Science, 31, 1237–1246.
Gardner Jr., E. S., & McKenzie, E. (1988). Model identification in exponential smoothing. Journal of the Operational Research Society, 39, 863–867.
Gardner Jr., E. S., & McKenzie, E. (1989). Seasonal exponential smoothing with damped trends. Management Science, 35, 372–376.
Gass, S. I., & Harris, C. M. (Eds.). (2000). Encyclopedia of operations research and management science (Centennial edition). Dordrecht, The Netherlands: Kluwer.
Geriner, P. T., & Ord, J. K. (1991). Automatic forecasting using explanatory variables: A comparative study. International Journal of Forecasting, 7, 127–140.
Geurts, M. D., & Kelly, J. P. (1986). Forecasting retail sales using alternative models. International Journal of Forecasting, 2, 261–272.
Gijbels, I., Pope, A., & Wand, M. P. (1999). Understanding exponential smoothing via kernel regression. Journal of the Royal Statistical Society. Series B, 61, 39–50.
Gilchrist, W. W. (1976). Statistical forecasting. London: Wiley.
Gorr, W., Olligschlaeger, A., & Thompson, Y. (2003). Short-term forecasting of crime. International Journal of Forecasting, 19, 579–594.
Grubb, H., & Mason, A. (2001). Long lead-time forecasting of UK air passengers by Holt–Winters methods with damped trend. International Journal of Forecasting, 17, 71–82.
Harrison, P. J. (1967). Exponential smoothing and short-term sales forecasting. Management Science, 13, 821–842.
Harvey, A. C. (1984). A unified view of statistical forecasting procedures. Journal of Forecasting, 3, 245–275.
Harvey, A. C. (1986). Analysis and generalisation of a multivariate exponential smoothing model. Management Science, 32, 374–380.
Harvey, A. C., & Koopman, S. J. (2000). Signal extraction and the formulation of unobserved components models. Econometrics Journal, 3, 84–107.
Harvey, A. C., & Snyder, R. D. (1990). Structural time series models in inventory control. International Journal of Forecasting, 6, 187–198.
Heuts, R. M. J., & Bronckers, J. H. J. M. (1988). Forecasting the Dutch heavy truck market: A multivariate approach. International Journal of Forecasting, 4, 57–79.
Hill, A. V., & Benton, W. C. (1992). Modelling intra-city time-dependent travel speeds for vehicle scheduling problems. Journal of the Operational Research Society, 43, 343–351.
Holmes, R. A. (1986). Leading indicators of industrial employment in British Columbia. International Journal of Forecasting, 2, 87–100.
Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages. ONR Memorandum, vol. 52. Pittsburgh, PA: Carnegie Institute of Technology. Available from the Engineering Library, University of Texas at Austin.
Holt, C. C. (2004a). Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting, 20, 5–10.
Holt, C. C. (2004b). Author’s retrospective on ‘Forecasting seasonals and trends by exponentially weighted moving averages’. International Journal of Forecasting, 20, 11–13.
Holt, C. C., Modigliani, F., Muth, J. F., & Simon, H. A. (1960). Planning production, inventories, and work force. Englewood Cliffs, NJ: Prentice-Hall.
Huss, W. R. (1985a). Comparative analysis of company forecasts and advanced time series techniques using annual electric utility energy sales data. International Journal of Forecasting, 1, 217–239.
Huss, W. R. (1985b). Comparative analysis of load forecasting techniques at a southern utility. Journal of Forecasting, 4, 99–107.
Hyndman, R. J. (2001). It’s time to move from ‘what’ to ‘why’. International Journal of Forecasting, 17, 567–570.
Hyndman, R. J., & Billah, B. (2003). Unmasking the Theta method. International Journal of Forecasting, 19, 287–290.
Hyndman, R. J., Koehler, A. B., Snyder, R. D., & Grose, S. (2002). A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18, 439–454.
Hyndman, R. J., Akram, M., & Archibald, B. (2005). The admissible parameter space for exponential smoothing models. Working paper, Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
Hyndman, R. J., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2005). Prediction intervals for exponential smoothing using two new classes of state space models. Journal of Forecasting, 24, 17–37.
Johnston, F. R. (1993). Exponentially weighted moving average (EWMA) with irregular updating periods. Journal of the Operational Research Society, 44, 711–716.
Johnston, F. R. (2000). Viewpoint: Lead time demand adjustments or when a model is not a model. Journal of the Operational Research Society, 51, 1107–1110.
Johnston, F. R., & Boylan, J. E. (1994). How far ahead can an EWMA model be extrapolated? Journal of the Operational Research Society, 45, 710–713.
Johnston, F. R., & Boylan, J. E. (1996a). Forecasting for items with intermittent demand. Journal of the Operational Research Society, 47, 113–121.
Johnston, F. R., & Boylan, J. E. (1996b). Forecasting intermittent demand: A comparative evaluation of Croston’s method. Comment. International Journal of Forecasting, 12, 297–298.
Johnston, F. R., & Harrison, P. J. (1986). The variance of lead-time demand. Journal of the Operational Research Society, 37, 303–309.
Jones, R. H. (1966). Exponential smoothing for multivariate time series. Journal of the Royal Statistical Society. Series B, 28, 241–251.
Jun, D. B. (1989). On detecting and estimating a major level or slope change in general exponential smoothing. Journal of Forecasting, 8, 55–64.
Jun, D. B., & Oliver, R. M. (1985). Bayesian forecasts following a major level change in exponential smoothing. Journal of Forecasting, 4, 293–302.
Kirkendall, N. J. (1992). Monitoring for outliers and level shifts in Kalman Filter implementations of exponential smoothing. Journal of Forecasting, 11, 543–560.
Koehler, A. B. (1985). Simple vs. complex extrapolation models. International Journal of Forecasting, 1, 63–68.
Koehler, A. B. (1990). An inappropriate prediction interval. International Journal of Forecasting, 6, 557–558.
Koehler, A. B., & Murphree, E. S. (1988). A comparison of results from state space forecasting with forecasts from the Makridakis competition. International Journal of Forecasting, 4, 45–55.
Koehler, A. B., Snyder, R. D., & Ord, J. K. (2001). Forecasting models and prediction intervals for the multiplicative Holt–Winters method. International Journal of Forecasting, 17, 269–286.
Lawton, R. (1998). How should additive Holt–Winters estimates be corrected? International Journal of Forecasting, 14, 393–403.
Lee, T. S., Cooper, F. W., & Adam Jr., E. E. (1993). The effects of forecasting errors on the total cost of operations. Omega, 21, 541–550.
Lee, T. S., Feller, S. J., & Adam Jr., E. E. (1992). Applying contemporary forecasting and computer technology for competitive advantage in service operations. International Journal of Operations and Production Management, 12, 28–42.
Leung, M. T., Daouk, H., & Chen, A.-S. (2000). Forecasting stock indices: A comparison of classification and level estimation models. International Journal of Forecasting, 16, 173–190.
Levén, E., & Segerstedt, A. (2004). Inventory control with a modified Croston procedure and Erlang distribution. International Journal of Production Economics, 90, 361–367.
Lin, W. T. (1989). Modeling and forecasting hospital patient movements: Univariate and multiple time series approaches. International Journal of Forecasting, 5, 195–208.
Mahmoud, E., Motwani, J., & Rice, G. (1990). Forecasting US exports: An illustration using time series and econometric models. Omega, 18, 375–382.
Mahmoud, E., & Pegels, C. C. (1990). An approach for selecting times series forecasting models. International Journal of Operations and Production Management, 10, 50–60.
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., & Lewandowski, R. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting, 1, 111–153.
Makridakis, S., Chatfield, C., Hibon, M., Lawrence, M., Mills, T., & Ord, J. K. (1993). The M2-competition: A real-time judgmentally based forecasting study. International Journal of Forecasting, 9, 5–22.
Makridakis, S., & Hibon, M. (1991). Exponential smoothing: The effect of initial values and loss functions on post-sample forecasting accuracy. International Journal of Forecasting, 7, 317–330.
Makridakis, S., & Hibon, M. (2000). The M3-competition: Results, conclusions and implications. International Journal of Forecasting, 16, 451–476.
Martin, C. A., & Witt, S. F. (1989). Forecasting tourism demand: A comparison of the accuracy of several quantitative methods. International Journal of Forecasting, 5, 7–19.
Masuda, Y., & Whang, S. (1999). Dynamic pricing for network service: Equilibrium and stability. Management Science, 45, 857–869.
Mathews, B. P., & Diamantopoulos, A. (1994). Towards a taxonomy of forecast error measures: A factor-comparative investigation of forecast error dimensions. Journal of Forecasting, 13, 409–416.
McClain, J. O. (1974). Dynamics of exponential smoothing with trend and seasonal terms. Management Science, 20, 1300–1304.
McClain, J. O., & Thomas, L. J. (1973). Response-variance tradeoffs in adaptive forecasting. Operations Research, 21, 554–568.
McKenzie, E. (1986). Error analysis for Winters’ additive seasonal forecasting system. International Journal of Forecasting, 2, 373–382.
Meade, N. (2000). Evidence for the selection of forecasting methods. Journal of Forecasting, 19, 515–535.
Mentzer, J. T. (1988). Forecasting with adaptive extended exponential smoothing. Journal of the Academy of Marketing Science, 16, 62–70.
Mentzer, J. T., & Gomes, R. (1994). Further extensions of adaptive extended exponential smoothing and comparison with the M-competition. Journal of the Academy of Marketing Science, 22, 372–382.
Mercer, A., & Tao, X. (1996). Alternative inventory and distribution policies of a food manufacturer. Journal of the Operational Research Society, 47, 755–765.
Miller, T., & Liberatore, M. (1993). Seasonal exponential smoothing with damped trends: An application for production planning. International Journal of Forecasting, 9, 509–515.
Muth, J. F. (1960). Optimal properties of exponentially weighted forecasts. Journal of the American Statistical Association, 55, 299–306.
Nerlove, M., & Wage, S. (1964). On the optimality of adaptive forecasting. Management Science, 10, 207–224.
Newbold, P. (1988). Predictors projecting linear trend plus seasonal dummies. Journal of the Royal Statistical Society. Series D, 37, 111–127.
Newbold, P., & Bos, T. (1989). On exponential smoothing and the assumption of deterministic trend plus white noise data-generating models. International Journal of Forecasting, 5, 523–527.
Öller, L.-E. (1986). A note on exponentially smoothed seasonal differences. Journal of Business and Economic Statistics, 4, 485–489.
Ord, J. K. (2004). Charles Holt’s report on exponentially weighted moving averages: An introduction and appreciation. International Journal of Forecasting, 20, 1–3.
Ord, J. K., Koehler, A. B., & Snyder, R. D. (1997). Estimation and prediction for a class of dynamic nonlinear statistical models. Journal of the American Statistical Association, 92, 1621–1629.
Ord, J. K., Snyder, R. D., Koehler, A. B., Hyndman, R. J., & Leeds, M. (2005). Time series forecasting: The case for the single source of error state space approach. Working paper, Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
Pandit, S. M., & Wu, S. M. (1974). Exponential smoothing as a special case of a linear stochastic system. Operations Research, 22, 868–879.
Pandit, S. M., & Wu, S. M. (1983). Time series and systems analysis with applications. New York: Wiley.
Pantazopoulos, S. N., & Pappis, C. P. (1996). A new adaptive method for extrapolative forecasting algorithms. European Journal of Operational Research, 94, 106–111.
Pegels, C. (1969). Exponential forecasting: Some new variations. Management Science, 15, 311–315.
Pfeffermann, D., & Allon, J. (1989). Multivariate exponential smoothing: Method and practice. International Journal of Forecasting, 5, 83–98.
Price, D. H. R., & Sharp, J. A. (1986). A comparison of the performance of different univariate forecasting methods in a model of capacity acquisition in UK electricity supply. International Journal of Forecasting, 2, 333–348.
Proietti, T. (1998). Seasonal heteroscedasticity and trends. Journal of Forecasting, 17, 1–17.
Proietti, T. (2000). Comparing seasonal components for structural time series models. International Journal of Forecasting, 16, 247–260.
Ramanathan, R., Engle, R., Granger, C. W. J., Vahid-Araghi, F., & Brace, C. (1997). Short-run forecasts of electricity loads and peaks. International Journal of Forecasting, 13, 161–174.
Rao, A. (1973). A comment on Forecasting and stock control for intermittent demands. Operational Research Quarterly, 24, 639–640.
Rasmussen, R. (2004). On time series data and optimal parameters. Omega, 32, 111–120.
Roberts, S. A. (1982). A general class of Holt–Winters type forecasting models. Management Science, 28, 808–820.
Rosanna, R. J., & Seater, J. J. (1995). Temporal aggregation and economic time series. Journal of Business and Economic Statistics, 13, 441–451.
Rosas, A. L., & Guerrero, V. M. (1994). Restricted forecasts using exponential smoothing techniques. International Journal of Forecasting, 10, 515–527.
Samuelson, D. A. (1999). Predictive dialing for outbound telephone call centers. Interfaces, 29, 66–81.
Sani, B., & Kingsman, B. G. (1997). Selecting the best periodic inventory control and demand forecasting methods for low demand items. Journal of the Operational Research Society, 48, 700–713.
Satchell, S., & Timmermann, A. (1995). On the optimality of adaptive expectations: Muth revisited. International Journal of Forecasting, 11, 407–416.
Schnaars, S. P. (1986). A comparison of extrapolation models on yearly sales forecasts. International Journal of Forecasting, 2, 71–85.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Segura, J. V., & Vercher, E. (2001). A spreadsheet modeling approach to the Holt–Winters optimal forecasting. European Journal of Operational Research, 131, 375–388.
Shah, C. (1997). Model selection in univariate time series forecasting using discriminant analysis. International Journal of Forecasting, 13, 489–500.
Sharda, R., & Musser, K. D. (1986). Financial futures hedging via goal programming. Management Science, 32, 933–947.
Snyder, R. D., Koehler, A. B., Hyndman, R. J., & Ord, J. K. (2004). Exponential smoothing models: Means and variances for lead-time demand. European Journal of Operational Research, 158, 444–455.
Snyder, R. D., Koehler, A. B., & Ord, J. K. (1999). Lead time demand for simple exponential smoothing: An adjustment factor for the standard deviation. Journal of the Operational Research Society, 50, 1079–1082.
Snyder, R. D., Koehler, A. B., & Ord, J. K. (2000). Viewpoint: A reply to Johnston. Journal of the Operational Research Society, 51, 1108–1110.
Snyder, R. D., Koehler, A. B., & Ord, J. K. (2002). Forecasting for inventory control with exponential smoothing. International Journal of Forecasting, 18, 5–18.
Snyder, R. D., Ord, J. K., & Koehler, A. B. (2001). Prediction intervals for ARIMA models. Journal of Business and Economic Statistics, 19, 217–225.
Snyder, R. D., & Shami, R. G. (2001). Exponential smoothing of seasonal data: A comparison. Journal of Forecasting, 20, 197–202.
Strijbosch, L. W. G., Heuts, R. M. J., & van der Schoot, E. H. M. (2000). A combined forecast-inventory control procedure for spare parts. Journal of the Operational Research Society, 51, 1184–1192.
Sweet, A. L. (1985). Computing the variance of the forecast error for the Holt–Winters seasonal models. Journal of Forecasting, 4, 235–243.
Syntetos, A. A., & Boylan, J. E. (2001). On the bias of intermittent demand estimates. International Journal of Production Economics, 71, 457–466.
Syntetos, A. A., & Boylan, J. E. (2005). The accuracy of intermittent demand estimates. International Journal of Forecasting, 21, 303–314.
Syntetos, A. A., Boylan, J. E., & Croston, J. D. (2005). On the categorization of demand patterns. Journal of the Operational Research Society, 56, 495–503.
Sharp, J. A., & Price, D. H. R. (1990). Experience curve models in Tashman, L. J., & Kruk, J. M. (1996). The use of protocols to select
the electricity supply industry. International Journal of Fore- exponential smoothing procedures: A reconsideration of fore-
casting, 6, 531 – 540. casting competitions. International Journal of Forecasting, 12,
Shenstone, L., & Hyndman, R. J. (2005). Stochastic models 235 – 253.
underlying Croston’s method for intermittent demand forecast- Tashman, L. J. (2000). Out-of-sample tests of forecasting accuracy:
ing. Journal of Forecasting, 24, 389 – 402. An analysis and review. International Journal of Forecasting,
Shoesmith, G. L., & Pinder, J. P. (2001). Potential inventory 16, 437 – 450.
cost reductions using advanced time series forecasting Taylor, J. W. (2003a). Exponential smoothing with a damped
techniques. Journal of the Operational Research Society, 52, multiplicative trend. International Journal of Forecasting, 19,
1267 – 1275. 715 – 725.
Snyder, R. D. (1985). Recursive estimation of dynamic linear Taylor, J. W. (2003b). Short-term electricity demand forecasting
models. Journal of the Royal Statistical Society. Series B, 47, using double seasonal exponential smoothing. Journal of the
272 – 276. Operational Research Society, 54, 799 – 805.
Snyder, R. D. (1988). Progressive tuning of simple exponential Taylor, J. W. (2004a). Volatility forecasting with smooth transition
smoothing forecasts. Journal of the Operational Research exponential smoothing. International Journal of Forecasting,
Society, 39, 393 – 399. 20, 273 – 286.
Snyder, R. D. (1993). A computerised system for forecasting spare Taylor, J. W. (2004b). Smooth transition exponential smoothing.
parts sales: A case study. International Journal of Operations Journal of Forecasting, 23, 385 – 404.
and Production Management, 13, 83 – 92. Taylor, J. W. (2004c). Forecasting supermarket sales with
Snyder, R. D. (2002). Forecasting sales of slow and fast moving exponentially weighted quantile regression, Working paper.
inventories. European Journal of Operational Research, 140, Said Business School, University of Oxford, Park End St.,
684 – 699. Oxford OX1 1HP, UK.
Taylor, J. W., & Bunn, D. W. (1999). A quantile regression approach to generating prediction intervals. Management Science, 45, 225–237.
Theil, H., & Wage, S. (1964). Some observations on adaptive forecasting. Management Science, 10, 198–206.
Thomas, R. J. (1993). Method and situational factors in sales forecast accuracy. Journal of Forecasting, 12, 69–77.
Thury, G. (1985). Macroeconomic forecasting in Austria: An analysis of accuracy. International Journal of Forecasting, 1, 111–121.
Tiao, G. C., & Xu, D. (1993). Robustness of maximum likelihood estimates for multi-step predictions: The exponential smoothing case. Biometrika, 80, 623–641.
Vokurka, R. J., Flores, B. E., & Pearce, S. L. (1996). Automatic feature identification and graphical support in rule-based forecasting: A comparison. International Journal of Forecasting, 12, 495–512.
Walton, J. H. D. (1994). Inventory control—A soft approach? Journal of the Operational Research Society, 45, 485–486.
Weatherford, L. R., & Kimes, S. E. (2003). A comparison of forecasting methods for hotel revenue management. International Journal of Forecasting, 19, 401–415.
Weintraub, A., Aboud, J., Fernandez, C., Laporte, G., & Ramirez, E. (1999). An emergency vehicle dispatching system for an electric utility in Chile. Journal of the Operational Research Society, 50, 690–696.
Willemain, T. R., Smart, C. N., & Schwarz, H. F. (2004). A new approach to forecasting intermittent demand for service parts inventories. International Journal of Forecasting, 20, 375–387.
Willemain, T. R., Smart, C. N., Shockor, J. H., & DeSautels, P. A. (1994). Forecasting intermittent demand in manufacturing: A comparative evaluation of Croston’s method. International Journal of Forecasting, 10, 529–538.
Williams, D. W., & Miller, D. (1999). Level-adjusted exponential smoothing for modeling planned discontinuities. International Journal of Forecasting, 15, 273–289.
Williams, T. M. (1987). Adaptive Holt–Winters forecasting. Journal of the Operational Research Society, 38, 553–560.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6, 324–342.
Wright, D. J. (1986a). Forecasting data published at irregular time intervals using an extension of Holt’s method. Management Science, 32, 499–510.
Wright, D. J. (1986b). Forecasting irregularly spaced data: An extension of double exponential smoothing. Computers & Industrial Engineering, 10, 135–147.
Wu, L. S.-Y., Ravishanker, N., & Hosking, J. R. M. (1991). Forecasting for business planning: A case study of IBM product sales. Journal of Forecasting, 10, 579–595.
Yar, M., & Chatfield, C. (1990). Prediction intervals for the Holt–Winters forecasting procedure. International Journal of Forecasting, 6, 127–137.
Zhang, X. (2004). The impact of forecasting methods on the bullwhip effect. International Journal of Production Economics, 88, 15–27.
Zhao, X., Xie, J., & Leung, J. (2002). The impact of forecasting model selection on the value of information sharing in a supply chain. European Journal of Operational Research, 142, 321–344.
