Chapter 5: Exponential Smoothing Methods
5.1 Introduction
In the first four chapters we used simple and multiple regression models to explain and forecast the future movements of one or more variables.
In this chapter we are again interested in constructing models and using them for forecasting, but these models no longer predict future movements in a variable by relating it to a set of other variables; instead, we base our prediction solely on the past behavior of that variable.
Cross-Section Data versus Time-Series Data
Based on the time over which the data are collected, data can be classified as either cross-section data
or time-series data.
Cross-section data
Data collected on different elements at the same point in time or for the same period of time.
Example: Total population of each state of Malaysia in year 2000.
Time-series data
Data collected on the same element for the same variable at different points in time or for different
periods of time.
Example: New life insurance policies purchased between 1995 and 2000.
A time series usually consists of four components:
1. Trend (or Secular Trend), T
In time-series analysis, the data may be taken every hour, day, week, month, or year, or at any other regular interval. Although time-series data generally fluctuate randomly, they may still show gradual shifts or movements, and this is referred to as the trend. The trend of a time series is the underlying long-term movement or tendency of the data. The duration of a trend is more than one year, and its fluctuation is due to factors which change slowly over a long stretch of time. The trend does not always show a linear pattern; however, the trend of a time series is, in general, represented by a smooth graph.
2. Cyclical Variations, C
Cyclical variations are long-term cyclic movements of the data. The patterns of change occur repetitively over a duration of more than one year. The long-term cyclic movement is due to the effect or influence of business or economic conditions, which are irregular in length and amplitude.
3. Seasonal Variations, S
Seasonal variation is the term used to describe patterns of change that recur over a short period of time. It is a short-term cyclic movement of the data, with a duration usually of less than one year. A season in this case may mean a quarter, a month, or even a day, as in the case of foreign exchange rates.
4. Irregular Variations, I
Irregular variations are random variations other than those that can be accounted for by the trend, seasonal, or cyclical variations. The changes occur in an unpredictable manner. Bad weather, illness, strikes, and riots are examples of random factors that may occur at any time of the day.
[Figure: time plot of a time series value together with its trend value against time.]
[Figure: the time scale, with past observations ..., Y_{t-2}, Y_{t-1}, Y_t behind the current time t, and forecasts F_{t+1}, F_{t+2}, ..., F_{t+m} ahead of it.]

Fitting errors:
(Y_{t-n+1} - F_{t-n+1}), ..., (Y_{t-1} - F_{t-1}), (Y_t - F_t)      (past data - fitted value in the past)

Forecasting errors (when Y_{t+1}, Y_{t+2}, etc. become available):
(Y_{t+1} - F_{t+1}), (Y_{t+2} - F_{t+2}), ...      (future value - future forecast)
On the time scale we are standing at a point t, and we look backward over past observations and forward into the future. Once the forecasting model has been selected, we fit the model to the known data and obtain the fitted values. For the known observations this allows the calculation of the fitting errors, a measure of the goodness of fit of the model, and as new observations become available we can examine the forecasting errors.
Figure 5.2: Pegels' Classification

                              No Seasonal Effect (1)   Additive Seasonal (2)   Multiplicative Seasonal (3)
No Trend Effect (A)                   A-1                      A-2                       A-3
Additive Trend (B)                    B-1                      B-2                       B-3
Multiplicative Trend (C)              C-1                      C-2                       C-3

5.2 Averaging Methods
If a time series is generated by a constant process subject to random error, then the mean is a useful statistic and can be used as a forecast for future periods.
5.2.1 The Mean
The method of simple averages is simply to take the average of all observed data as the forecast:

F_{t+1} = (1/t) Σ_{i=1}^{t} Y_i
When a new observation Y_{t+1} becomes available, the forecast for time t + 2 is the new mean, computed from the previously observed data plus this new observation:

F_{t+2} = (1/(t + 1)) Σ_{i=1}^{t+1} Y_i = (t F_{t+1} + Y_{t+1}) / (t + 1)      (recursive form)
In this recursive form, only two items (the most recent forecast, F_{t+1}, and the most recent observation, Y_{t+1}) need to be stored as time moves on. When forecasting a large number of series simultaneously, this saving becomes important.
This method is appropriate when the time series (refer to cell A-1 in the Pegels table):
(i) has no noticeable trend; and
(ii) has no noticeable seasonality.
As the calculation of the mean is based on a larger and larger set of past data, it becomes more and more stable (from elementary statistical theory), assuming the underlying process is stationary.
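To illustrate the recursive form, here is a minimal R sketch (the object names y, F_next and F_next2 are purely illustrative) that computes the simple-average forecast directly and then updates it recursively when a new observation arrives:

y <- c(200, 135, 195, 197.5, 310)           # any observed series
t <- length(y)
F_next <- mean(y)                           # F_{t+1} = (1/t)(Y_1 + ... + Y_t)
y_new <- 175                                # suppose Y_{t+1} becomes available
F_next2 <- (t * F_next + y_new) / (t + 1)   # recursive update for F_{t+2}
F_next2
mean(c(y, y_new))                           # same value, recomputed from scratch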
5.2.2 Moving Averages

F_{t+1} = (1/k) Σ_{i=t-k+1}^{t} Y_i
The term moving average is used because, as each new observation becomes available, a new average can be computed by dropping the oldest observation and including the newest one.
This moving average is then the forecast for the next period.
Compared with the simple mean, the moving average of order k has the following characteristics:
1. It deals only with the latest k periods of known data (including the most recent observation).
2. The number of data points in each average does not change as time goes on.
But it also has the following disadvantages:
1. It requires more storage, because all of the k latest observations must be stored, not just the average.
2. It cannot handle trend and seasonality very well, although it can do better than the total mean.
Month   Time, t   Observed Values   MA(3)     MA(5)
Jan        1          200
Feb        2          135
Mar        3          195
Apr        4          197.5          176.67
May        5          310            175.83
Jun        6          175            234.17    207.5
Jul        7          155            227.50    202.5
Aug        8          130            213.33    206.5
Sep        9          220            153.33    193.5
Oct       10          277.5          168.33    198
Nov       11          235            209.17    191.5
Dec       12                         244.17    203.5

For example: MA(3) for Apr = (200 + 135 + 195)/3 = 176.67; MA(3) for May = (135 + 195 + 197.5)/3 = 175.83; MA(5) for Jun = (200 + 135 + 195 + 197.5 + 310)/5 = 207.5; MA(5) for Jul = (135 + 195 + 197.5 + 310 + 175)/5 = 202.5.
The MA(3) values in column 4 are based on the values for the previous three months. For example, the forecast for April (the fourth month) is taken to be the average of Jan, Feb, and Mar. December's MA(3) forecast of 244.17 is the average of the Sep, Oct, and Nov values.
[Figure: time plot of the observed values together with the MA(3) and MA(5) forecasts.]
Note:
1. The use of a small value for k will allow the moving average to follow the pattern, but these MA forecasts will nevertheless trail the pattern, lagging behind by one or more periods.
2. The more observations included in the moving average, the greater the smoothing effect.
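The MA(3) and MA(5) forecasts in the table above can be reproduced in R. The sketch below is illustrative only (the helper name ma_forecast is made up); it uses stats::filter to average the k most recent observations and then shifts the result forward one period, so that each average becomes the forecast for the following month:

s <- c(200, 135, 195, 197.5, 310, 175, 155, 130, 220, 277.5, 235)
ma_forecast <- function(y, k) {
  avg <- stats::filter(y, rep(1/k, k), sides = 1)  # average of the k latest values at each t
  c(NA, avg)                                       # that average is the forecast for period t + 1
}
round(ma_forecast(s, 3), 2)   # 176.67 for April, ..., 244.17 for December
round(ma_forecast(s, 5), 2)   # 207.50 for June, ..., 203.50 for December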
5.3 Simple Exponential Smoothing (Weighted Moving Average)
F_{t+1} = F_t + α(Y_t - F_t) = αY_t + (1 - α)F_t

where α is a smoothing constant between 0 and 1, and F_t is the forecast of the time series in time period t. (Y_t - F_t) is the forecast error found when the observation Y_t becomes available.
Note:
1. It can be seen that the new forecast is simply the old forecast plus an adjustment for the error that occurred in the last forecast.
2. When α has a value close to 1, the new forecast will include a substantial adjustment for the error in the previous forecast. Conversely, when α is close to 0, the new forecast will include very little adjustment.
Note that F_{t+1} can be rewritten as

F_{t+1} = αY_t + (1 - α)F_t
        = αY_t + (1 - α)[αY_{t-1} + (1 - α)F_{t-1}]
        = αY_t + α(1 - α)Y_{t-1} + (1 - α)^2 F_{t-1}
        = αY_t + α(1 - α)Y_{t-1} + α(1 - α)^2 Y_{t-2} + α(1 - α)^3 Y_{t-3} + α(1 - α)^4 Y_{t-4} + ... + α(1 - α)^{t-1} Y_1 + (1 - α)^t F_1
1. The forecast (F_{t+1}) is based on weighting the most recent observation (Y_t) with a weight value α and the most recent forecast (F_t) with a weight of (1 - α).
2.
3.
F_{t+h} = F_{t+1}      (h = 2, 3, ...)

If h = 1, then a (1 - α)100% prediction interval computed in time period t for F_{t+1} is computed by
F_{t+1} ± z_{α/2} √(MS_E)

If h = 2, then a (1 - α)100% prediction interval computed in time period t for F_{t+2} is computed by
F_{t+1} ± z_{α/2} √(MS_E (1 + α^2))

In general, for any h, a (1 - α)100% prediction interval computed in time period t for F_{t+h} is computed by
F_{t+1} ± z_{α/2} √(MS_E (1 + (h - 1)α^2))

where MS_E = SS_E/(n - 1) = Σ_{t=1}^{n} (y_t - F_t)^2 / (n - 1).
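As a sketch of how these interval formulas can be used in practice, the ad hoc helper below (ses_pi is not a built-in R function) returns the h-step prediction interval from the point forecast, the MS_E and the smoothing constant:

ses_pi <- function(f1, mse, alpha, h, level = 0.95) {
  # f1: point forecast F_{t+1}; mse: SS_E/(n - 1); alpha: smoothing constant
  z <- qnorm(1 - (1 - level) / 2)
  half <- z * sqrt(mse * (1 + (h - 1) * alpha^2))
  c(lower = f1 - half, upper = f1 + half)
}
ses_pi(f1 = 205.55, mse = 3438.332, alpha = 0.1, h = 1)   # values from the shipment example below

With the MS_E from the shipment example later in this section (3438.332), this returns approximately (90.6, 320.5).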
Example 5.1:
Simple exponential smoothing is applied to the series

t   :    1      2      3      4      5
y_t :  1000    900    990    909    982
(b) F1 =
Month   Time period, t   Actual shipments y_t   Smoothed estimate, F_t   Forecast Error   Squared Forecast Error
Jan            1               200                    200.00
Feb            2               135                    200.00                  -65.00             4225.00
Mar            3               195                    193.50                    1.50                2.25
Apr            4               197.5                  193.65                    3.85               14.82
May            5               310                    194.04                  115.97            13447.88
Jun            6               175                    205.63                  -30.63              938.29
Jul            7               155                    202.57                  -47.57             2262.75
Aug            8               130                    197.81                  -67.81             4598.40
Sep            9               220                    191.03                   28.97              839.24
Oct           10               277.5                  193.93                   83.57             6984.39
Nov           11               235                    202.28                   32.72             1070.30
Dec           12                                      205.55

Here α = 0.1 and F_1 = y_1 = 200. For example:
F_2 = αY_1 + (1 - α)F_1 = 0.1(200) + 0.9(200) = 200
F_3 = αY_2 + (1 - α)F_2 = 0.1(135) + 0.9(200) = 193.5
F_4 = αY_3 + (1 - α)F_3 = 0.1(195) + 0.9(193.5) = 193.65
F_12 = 0.1(235) + 0.9(202.28) = 205.55
Forecast errors: Y_2 - F_2 = -65 and (-65)^2 = 4225; Y_3 - F_3 = 1.5 and 1.5^2 = 2.25; Y_4 - F_4 = 197.5 - 193.65 = 3.85 and 3.85^2 = 14.82.

Find
a) The forecast for December (time period 12), F_12, made in November.
b) A 95% prediction interval computed in month 11 for F_12.
c)
d)
Solution:
(a) F_12 = αY_11 + (1 - α)F_11 = 0.1(235) + 0.9(202.28) = 205.55

(b) A 95% prediction interval computed in month 11 for F_12 is F_12 ± z_{0.025} √(MS_E), where

MS_E = SS_E/(n - 1) = 34383.32/10 = 3438.332

so the interval is 205.55 ± 1.96 √3438.332 = 205.55 ± 114.93, i.e. (90.62, 320.48).
R-Codes:
# To enter the data
s <- c(200,135,195,197.5,310,175,155,130,220,277.5,235)
# Estimating the level of the time series using simple exponential smoothing
# (beta and gamma are switched off here; in recent versions of R use beta = FALSE, gamma = FALSE)
es1 <- HoltWinters(s, alpha=.1, beta=0, gamma=0)
es1
Output:
Holt-Winters exponential smoothing without trend and without seasonal
component.
Call:
HoltWinters(x = s, alpha = 0.1, beta = 0, gamma = 0)
Smoothing parameters:
alpha: 0.1
beta : 0
gamma: 0
Coefficients:
[,1]
a 205.5561
es1$"fitted"
es1$"SSE"
[1] 34383.32
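Forecasts can also be taken directly from the fitted object with predict(); the short sketch below is illustrative. Note that R computes its prediction intervals from its own error-variance formula, so the limits need not match the hand calculation with MS_E = SS_E/(n - 1) exactly.

# One- and two-step-ahead forecasts with approximate 95% prediction intervals
predict(es1, n.ahead = 2, prediction.interval = TRUE, level = 0.95)

# MS_E as defined in the notes
es1$SSE / (length(s) - 1)     # 34383.32 / 10 = 3438.332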
F_{t+1} = αY_t + α(1 - α)Y_{t-1} + α(1 - α)^2 Y_{t-2} + ... + α(1 - α)^{t-1} Y_1 + (1 - α)^t F_1

1. The coefficients measuring the contributions of the observations Y_t, Y_{t-1}, ..., Y_1 are α, α(1 - α), ..., α(1 - α)^{t-1}, respectively, and they decrease exponentially with age. For this reason we refer to this procedure as simple exponential smoothing.
2. One point of concern relates to the initialization phase of exponential smoothing. For example, to get the forecasting system started we need F_1, because F_2 = αy_1 + (1 - α)F_1. Since the value of F_1 is not known, we can use the first observed value (y_1) as the first forecast (F_1 = y_1) and then proceed using the smoothing equation. This is one method of initialization and is the one used in Example 5.1. Another possibility would be to average the first four or five values in the data set and use this as the initial forecast.
3. The weight α can be chosen by minimizing the value of MS_E (through trial and error) or by some other criterion.
4. Note that the last term in the expansion is (1 - α)^t F_1, so the initial forecast F_1 plays a role in all subsequent forecasts. But the weight attached to F_1 is (1 - α)^t, which is usually small. When a small value of α is chosen, the initial forecast plays a more prominent role than when a larger α is used. Also, when more data are available, t is larger and so the weight attached to F_1 is smaller.
5. If the smoothing parameter α is not close to zero, the influence of the initialization process rapidly becomes less significant as time goes by. However, if α is close to zero, the initialization process can play a significant role for many time periods ahead.
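The exponentially decreasing weights can be checked numerically. The following illustrative R lines rebuild the smoothed forecast for the shipment series s used in the R code above from the expanded form, with weights α(1 - α)^j on the observations and (1 - α)^t on F_1:

s <- c(200, 135, 195, 197.5, 310, 175, 155, 130, 220, 277.5, 235)
alpha <- 0.1
t <- length(s)
F1 <- s[1]                                  # initialization F_1 = y_1, as in the worked example

w <- alpha * (1 - alpha)^(0:(t - 1))        # weights on Y_t, Y_{t-1}, ..., Y_1
round(w, 4)                                 # the weights decrease exponentially with age
sum(w * rev(s)) + (1 - alpha)^t * F1        # reproduces F_{t+1} = 205.5561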
5.4 Holt's Linear Exponential Smoothing

Holt extended single exponential smoothing to linear exponential smoothing to allow forecasting of data with trends. The forecast for Holt's linear exponential smoothing is found using two smoothing constants, α and β (with values between 0 and 1), and three equations:

L_t = αy_t + (1 - α)(L_{t-1} + b_{t-1})
b_t = β(L_t - L_{t-1}) + (1 - β)b_{t-1}
F_{t+m} = L_t + b_t m

This is suitable when the time series Y_t = {y_1, y_2, ..., y_n} exhibits a linear trend for which the level and growth rate/slope (trend) may be changing, with no seasonal pattern.
L_t denotes an estimate of the level of the time series at time t and b_t denotes an estimate of the slope (growth rate) of the time series at time t, where L_{t-1} and b_{t-1} are the estimates at time t - 1 for the level and growth rate, respectively.
1. The first equation adjusts L_t directly for the trend of the previous period, b_{t-1}, by adding it to the last smoothed value, L_{t-1}. This helps to eliminate the lag and brings L_t to the approximate level of the current data value.
2. The second equation updates the trend, which is expressed as the difference between the last two smoothed values. This is appropriate because, if there is a trend in the data, new values should be higher or lower than the previous ones. Since there may be some randomness remaining, the trend is modified by smoothing the trend in the last period (L_t - L_{t-1}) with β and adding that to the previous estimate of the trend multiplied by (1 - β).
F_{t+m} = L_t + b_t m      (m = 1, 2, 3, ...)

If m = 1, then a (1 - α)100% prediction interval computed in time period t for F_{t+1} is computed by
(L_t + b_t) ± z_{α/2} √(MS_E)

If m = 2, then a (1 - α)100% prediction interval computed in time period t for F_{t+2} is computed by
(L_t + 2b_t) ± z_{α/2} √(MS_E (1 + α^2(1 + β)^2))

In general, for any m ≥ 2, a (1 - α)100% prediction interval computed in time period t for F_{t+m} is
(L_t + b_t m) ± z_{α/2} √(MS_E (1 + Σ_{j=1}^{m-1} α^2(1 + jβ)^2))

where MS_E = SS_E/(n - 1) = Σ_t (y_t - (L_{t-1} + b_{t-1}))^2 / (n - 1).
Notes:
1. The initialization process for Holt's linear exponential smoothing requires two estimates: one to get the first smoothed value, L_1, and the other to get the trend, b_1. One alternative is to set L_1 = y_1 and b_1 = y_2 - y_1 or b_1 = (y_4 - y_1)/3.
2. Another alternative is to use least squares regression on the first few values of the series to find L_1 and b_1.
3. The weights α and β can be chosen by minimizing the value of MS_E or by some other criterion.
4. Holt's method is sometimes called double exponential smoothing.
Example: Holt's linear exponential smoothing with α = 0.9 and β = 0.4 is applied to the series y = 1000, 1200, 1500, 1600, 2000, initialized with L_1 = y_1 = 1000 and b_1 = y_2 - y_1.

b_1 = y_2 - y_1 = 1200 - 1000 = 200

L_2 = αy_2 + (1 - α)(L_1 + b_1) = 0.9(1200) + 0.1(1200) = 1200
b_2 = β(L_2 - L_1) + (1 - β)b_1 = 0.4(1200 - 1000) + 0.6(200) = 200

L_3 = αy_3 + (1 - α)(L_2 + b_2) = 0.9(1500) + 0.1(1400) = 1490
b_3 = β(L_3 - L_2) + (1 - β)b_2 = 0.4(290) + 0.6(200) = 236

L_4 = αy_4 + (1 - α)(L_3 + b_3) = 0.9(1600) + 0.1(1726) = 1612.6
b_4 = β(L_4 - L_3) + (1 - β)b_3 = 0.4(122.6) + 0.6(236) = 190.64

L_5 = αy_5 + (1 - α)(L_4 + b_4) = 0.9(2000) + 0.1(1803.24) = 1980.324
b_5 = β(L_5 - L_4) + (1 - β)b_4 = 0.4(367.724) + 0.6(190.64) = 261.4736

F_6 = L_5 + b_5 m = 1980.324 + 261.4736(1) = 2241.7976
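The hand computations above can be checked with a direct R implementation of the two smoothing equations. This is only a sketch; the values α = 0.9, β = 0.4, L_1 = y_1 and b_1 = y_2 - y_1 are read off the worked example.

y <- c(1000, 1200, 1500, 1600, 2000)
alpha <- 0.9
beta  <- 0.4
n <- length(y)

L <- numeric(n); b <- numeric(n)
L[1] <- y[1]                 # L_1 = y_1
b[1] <- y[2] - y[1]          # b_1 = y_2 - y_1 = 200
for (t in 2:n) {
  L[t] <- alpha * y[t] + (1 - alpha) * (L[t - 1] + b[t - 1])
  b[t] <- beta * (L[t] - L[t - 1]) + (1 - beta) * b[t - 1]
}
cbind(L, b)                  # last row: L_5 = 1980.324, b_5 = 261.4736
L[n] + 1 * b[n]              # F_6 = L_5 + b_5(1) = 2241.798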
Example: Holt's linear exponential smoothing is applied (using R) to the following series of 24 observations:

t   :    1    2    3    4    5    6    7    8    9   10   11   12
y_t :  143  152  161  139  137  174  142  141  162  180  164  171

t   :   13   14   15   16   17   18   19   20   21   22   23   24
y_t :  206  193  207  218  229  225  204  227  223  242  239  266

R-codes
Y<-c(143,152,161,139,137,174,142,141,162,180,164,171,206,193,207,218,229,225,204,
227,223,242,239,266)
# Fit Holt's linear method; gamma = 0 switches off the seasonal component
# (in recent versions of R, use gamma = FALSE)
es1 <- HoltWinters(x = Y, gamma = 0)
es1
R-output:
Call:
HoltWinters(x = Y, gamma = 0)
Smoothing parameters:
alpha: 0.5010719
beta : 0.07230122
gamma: 0
Coefficients:
[,1]
a 256.262210
b   6.207564
Find
a) The point forecast for period 25, F_25, made in period 24.
b)
c)

Solution:
(a) F_25 = L_24 + m b_24 = 256.2622 + (1)(6.2076) = 262.4698

A 95% prediction interval computed in period 24 for F_26 (m = 2):
(L_24 + 2b_24) ± z_{0.025} √(MS_E (1 + α^2(1 + β)^2)) = 268.6774 ± 37.7196 = (230.9578, 306.3970)
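The same forecasts can be obtained from the fitted object with predict(); the sketch below is illustrative, and R's prediction intervals use its own variance estimate, so the limits may differ slightly from the hand calculation above.

# Forecasts for periods 25 and 26 with approximate 95% prediction intervals
predict(es1, n.ahead = 2, prediction.interval = TRUE, level = 0.95)

# Point forecasts by hand from the printed coefficients: F_{24+m} = L_24 + m * b_24
256.262210 + (1:2) * 6.207564    # 262.4698 and 268.6774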
5.5 Additive Holt-Winters Method
Suppose that the time series Y_t = {y_1, y_2, ..., y_n} exhibits a linear trend locally and has a seasonal pattern with constant (additive) seasonal variation, and that the level, growth rate, and seasonal pattern may be changing. Then the estimate L_t for the level, the estimate b_t for the growth rate, and the estimate S_t for the seasonal factor of the time series in time period t are given by the smoothing equations

Level:     L_t = α(Y_t - S_{t-s}) + (1 - α)(L_{t-1} + b_{t-1})
Trend:     b_t = β(L_t - L_{t-1}) + (1 - β)b_{t-1}
Seasonal:  S_t = γ(Y_t - L_t) + (1 - γ)S_{t-s}

and the forecast is F_{t+m} = L_t + b_t m + S_{t-s+m},

where α, β and γ are smoothing constants between 0 and 1, L_{t-1} and b_{t-1} are the estimates at time t - 1 for the level and growth rate, respectively, and S_{t-s} is the estimate made in time period t - s for the seasonal factor. Here s denotes the number of seasons in a year (s = 12 for monthly data and s = 4 for quarterly data).
Note:
L_t is a smoothed value of the series that does not include seasonality (i.e. the data have been seasonally adjusted), while Y_t, on the other hand, does contain seasonality.
A point forecast made in time period t is

F_{t+m} = L_t + b_t m + S_{t-s+m}      (m = 1, 2, 3, ...)

A (1 - α)100% prediction interval computed in time period t for F_{t+m} has the form F_{t+m} ± z_{α/2} √(c_m MS_E). If m > s, then

c_m = 1 + Σ_{j=1}^{m-1} [α(1 + jβ) + d_{j,s} γ(1 - α)]^2

where d_{j,s} = 1 if j is an integer multiple of s and d_{j,s} = 0 otherwise,
and MS_E = SS_E / n = Σ_t (y_t - (L_{t-1} + b_{t-1} + S_{t-s}))^2 / n.
Initialization
To determine initial estimates of the seasonal indices we need to use at least one complete season's data (i.e. s periods). Therefore we initialize the level and trend at period s:

1. L_s = (1/s)(Y_1 + Y_2 + ... + Y_s)

2. b_s = (1/s)[ (Y_{s+1} - Y_1)/s + (Y_{s+2} - Y_2)/s + ... + (Y_{s+s} - Y_s)/s ]

3. S_1 = Y_1 - L_s,  S_2 = Y_2 - L_s,  ...,  S_s = Y_s - L_s
Example 5.5:
The quarterly sales of the TRK-50 mountain bike are given below:

Quarterly Sales of the TRK-50 Mountain Bike
Quarter     Year 1   Year 2   Year 3   Year 4
   1          10       11       13       15
   2          31       33       34       37
   3          43       45       48       51
   4          16       17       19       21

[Figure: time plot of the quarterly sales against time.]
The time plot above suggests that the mountain bike sales display a linear trend in demand and a constant (additive) seasonal variation. Thus we apply the additive Holt-Winters method to these data in order to find forecasts of future mountain bike sales.
Additive Holt-Winters computations for the TRK-50 sales (α = 0.2, β = 0.1, γ = 0.1):

Time, t   Y_t    Level, L_t     Growth Rate, b_t    Seasonal Factor, S_t    Forecast F_t = L_t + b_t + S_t
   1       10                                        S_1 = -15
   2       31                                        S_2 = 6
   3       43                                        S_3 = 18
   4       16    L_4 = 25       b_4 = 0.375          S_4 = -9
   5       11    L_5 = 25.5     b_5 = 0.3875         S_5 = -14.95            F_5 = 10.9375
   6       33    L_6 = 26.11    b_6 = 0.4098         S_6 = 6.089             F_6 = 32.6088
   7       45    26.6158        0.4194               18.0384                 F_7 = 45.0736
   8       17    26.8281        0.3987               -9.0828                 18.1440
   9       13    27.3714        0.4131               -14.8921                12.8924
  10       34    27.8098        0.4156               6.0991                  34.3246
  11       48    28.5727        0.4504               18.1773                 47.2004
  12       19    28.8350        0.4316               -9.1580                 20.1085
  13       15    29.3917        0.4441               -14.8421                14.9937
  14       37    30.0488        0.4654               6.1843                  36.6985
  15       51    30.9759        0.5115               18.3620                 49.8494
  16       21    31.2215        0.4850               -9.2644                 22.4421
Initialization:
L_4 = (10 + 31 + 43 + 16)/4 = 25

b_4 = (1/4)[ (y_5 - y_1)/4 + (y_6 - y_2)/4 + (y_7 - y_3)/4 + (y_8 - y_4)/4 ]
    = (1/4)[ (11 - 10)/4 + (33 - 31)/4 + (45 - 43)/4 + (17 - 16)/4 ]
    = 0.375

S_1 = y_1 - L_4 = 10 - 25 = -15
S_2 = y_2 - L_4 = 31 - 25 = 6
S_3 = y_3 - L_4 = 43 - 25 = 18
S_4 = y_4 - L_4 = 16 - 25 = -9
Notes:
Level:     L_t = α(Y_t - S_{t-s}) + (1 - α)(L_{t-1} + b_{t-1})
Trend:     b_t = β(L_t - L_{t-1}) + (1 - β)b_{t-1}
Seasonal:  S_t = γ(Y_t - L_t) + (1 - γ)S_{t-s}
F_{t+m} = L_t + b_t m + S_{t-s+m}   and   F_t = L_t + b_t + S_t

F_5 = L_5 + b_5 + S_5 = 25.5 + 0.3875 - 14.95 = 10.9375
F_6 = L_6 + b_6 + S_6 = 26.11 + 0.4098 + 6.089 = 32.6088
F_7 = L_7 + b_7 + S_7 = 26.6158 + 0.4194 + 18.0384 = 45.0736
Find
a) The point forecast for period 17, F_17, made in period 16.
b)
c)

Solution:
(a) F_17 = L_16 + m b_16 + S_{16-4+1} = 31.2215 + (1)(0.485) - 14.8421 = 16.8644
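A direct R implementation of the three smoothing equations and the initialization above reproduces the table and the forecast F_17. This is a sketch only; the smoothing constants α = 0.2, β = 0.1 and γ = 0.1 are those used in the computations above.

y <- c(10, 31, 43, 16, 11, 33, 45, 17, 13, 34, 48, 19, 15, 37, 51, 21)
s <- 4                        # number of seasons per year (quarterly data)
alpha <- 0.2; beta <- 0.1; gamma <- 0.1
n <- length(y)

L <- b <- S <- rep(NA_real_, n)

# Initialization at period s
L[s] <- mean(y[1:s])                                  # L_4 = 25
b[s] <- mean((y[(s + 1):(2 * s)] - y[1:s]) / s)       # b_4 = 0.375
S[1:s] <- y[1:s] - L[s]                               # S_1..S_4 = -15, 6, 18, -9

# Smoothing equations for t = s + 1, ..., n
for (t in (s + 1):n) {
  L[t] <- alpha * (y[t] - S[t - s]) + (1 - alpha) * (L[t - 1] + b[t - 1])
  b[t] <- beta * (L[t] - L[t - 1]) + (1 - beta) * b[t - 1]
  S[t] <- gamma * (y[t] - L[t]) + (1 - gamma) * S[t - s]
}
round(cbind(t = 1:n, y, L, b, S), 4)

# Point forecast for period 17 made in period 16: F_17 = L_16 + b_16 + S_13
L[n] + 1 * b[n] + S[n - s + 1]                        # 16.8644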