Simple Exponential Smoothing & Forecasting Methods and Serial Dependence
A smoother that reacts faster to process changes can be obtained by giving geometrically
decreasing weights to the previous observations. Hence, an exponentially weighted
smoother is obtained by introducing a discount factor θ as
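$$\sum_{t=0}^{T-1} \theta^{t} y_{T-t} = y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1, \qquad |\theta| < 1 \tag{3.1.1}$$

The weights in (3.1.1) do not sum to one. Instead,

$$\sum_{t=0}^{T-1} \theta^{t} = \frac{1-\theta^{T}}{1-\theta} \tag{3.1.2}$$

which tends to $1/(1-\theta)$ as $T$ gets large. Multiplying (3.1.1) by $(1-\theta)$ therefore gives weights that sum approximately to one, and the simple exponential smoother is

$$\hat{y}_T = (1-\theta)\sum_{t=0}^{T-1} \theta^{t} y_{T-t} = (1-\theta)\left(y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1\right) \tag{3.1.3}$$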
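An alternative expression for simple exponential smoothing, in recursive form, is
given by

$$\begin{aligned}
\hat{y}_T &= (1-\theta)\, y_T + \theta \underbrace{(1-\theta)\left(y_{T-1} + \theta y_{T-2} + \cdots + \theta^{T-2} y_1\right)}_{\hat{y}_{T-1}} \\
&= (1-\theta)\, y_T + \theta\, \hat{y}_{T-1}
\end{aligned} \tag{3.1.4}$$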
This is a linear combination of the current observation and the smoothed observation
at the previous time unit.
As the latter contains data from all previous observations, the smoothed observation
at time T is in fact a linear combination of the current observation and the
discounted sum of all previous observations.
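The equivalence of the weighted-sum form (3.1.3) and the recursive form (3.1.4) can be checked numerically. The following is a minimal base-R sketch, not part of the original notes: the function names `ses_weighted` and `ses_recursive` are hypothetical, the discount factor 0.7 is chosen only for illustration, and the recursion is started from 0 so that the two forms agree exactly.

```r
# Weighted-sum form (3.1.3): (1 - theta) * sum_{t=0}^{T-1} theta^t * y_{T-t}
ses_weighted <- function(y, theta) {
  n <- length(y)
  w <- theta^(0:(n - 1))           # weights for y_T, y_{T-1}, ..., y_1
  (1 - theta) * sum(w * rev(y))
}

# Recursive form (3.1.4): yhat_T = (1 - theta) * y_T + theta * yhat_{T-1}
# Assumption: the recursion starts from 0, which reproduces (3.1.3) exactly.
ses_recursive <- function(y, theta, start = 0) {
  s <- start
  for (obs in y) s <- (1 - theta) * obs + theta * s
  s
}

# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

a <- ses_weighted(y, theta = 0.7)
b <- ses_recursive(y, theta = 0.7)
print(all.equal(a, b))             # the two forms give the same smoothed value
```

The recursive form needs only the previous smoothed value and the new observation, which is why it is the form normally used in practice.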
Remarks
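(i). Writing $\alpha = 1 - \theta$ for the smoothing constant, (3.1.4) becomes $\hat{y}_T = \alpha y_T + (1-\alpha)\hat{y}_{T-1}$. The value of $\alpha$ may be chosen in a subjective manner: the
forecaster specifies the value of the smoothing parameter based on previous
experience.
However, a more robust and objective way is to minimise the error. The one-step-ahead
within-sample forecast errors are specified as

$$e_t = y_t - \hat{y}_{t|t-1}, \qquad t = 1, \ldots, T$$

so the sum of squared errors is

$$SSE = \sum_{t=1}^{T} \left(y_t - \hat{y}_{t|t-1}\right)^2 = \sum_{t=1}^{T} e_t^2$$

Minimising the SSE with respect to $\alpha$ involves a non-linear optimisation problem.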
(ii). Simple exponential smoothing should only be used for non-seasonal time
series showing no systematic trend. However, we can first remove the trend or
seasonal pattern to produce a stationary series and then apply simple
exponential smoothing.
(iii). There are more complicated versions of exponential smoothing that can
cope with trend and seasonality, such as the Holt-Winters model.
(iv). To forecast one step ahead: $\hat{y}_{T+1|T} = \hat{y}_T$. That is, the last smoothed value is
the forecast. This implies that exponential smoothing has a 'flat'
forecast function: for longer forecast horizons the forecast is still the last
smoothed value, $\hat{y}_{T+h|T} = \hat{y}_T$ for $h = 1, 2, \ldots$
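Remarks (i) and (iv) can be illustrated with a short base-R sketch. It is only an illustration under stated assumptions: the helper names `ses_forecasts` and `sse` are hypothetical, the starting value is taken to be the first observation, and the search interval (0.01, 0.99) for α is an arbitrary choice.

```r
# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

# One-step-ahead forecasts yhat_{t|t-1} for t = 1, ..., T+1 from
# yhat_{t+1|t} = alpha * y_t + (1 - alpha) * yhat_{t|t-1}.
# Assumption: the starting value yhat_{1|0} is set to y_1.
ses_forecasts <- function(y, alpha, y0 = y[1]) {
  f <- numeric(length(y) + 1)
  f[1] <- y0
  for (t in seq_along(y)) f[t + 1] <- alpha * y[t] + (1 - alpha) * f[t]
  f
}

# Sum of squared one-step-ahead within-sample errors as a function of alpha
sse <- function(alpha, y) {
  f <- ses_forecasts(y, alpha)
  e <- y - f[seq_along(y)]
  sum(e^2)
}

# Remark (i): numerical minimisation of the SSE over alpha
best <- optimize(sse, interval = c(0.01, 0.99), y = y)
print(best$minimum)    # alpha with the smallest SSE found on the interval
print(best$objective)  # the corresponding SSE

# Remark (iv): the forecast function is flat, so every horizon h gets
# the last smoothed value
last_level <- tail(ses_forecasts(y, best$minimum), 1)
flat <- rep(last_level, 3)   # forecasts for h = 1, 2, 3
print(flat)
```

`optimize` performs a one-dimensional search and can return a local minimum, so plotting the SSE against α is a sensible cross-check.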
3.1.2 Initial value
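Since $\hat{y}_0$ is needed to start the recursive calculations, substituting repeatedly gives

$$\begin{aligned}
\hat{y}_1 &= \alpha y_1 + (1-\alpha)\hat{y}_0 \\
\hat{y}_2 &= \alpha y_2 + (1-\alpha)\hat{y}_1 \\
&= \alpha y_2 + (1-\alpha)\left[\alpha y_1 + (1-\alpha)\hat{y}_0\right] \\
&= \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \hat{y}_0 \\
\hat{y}_3 &= \alpha y_3 + (1-\alpha)\hat{y}_2 \\
&= \alpha y_3 + \alpha(1-\alpha) y_2 + \alpha(1-\alpha)^2 y_1 + (1-\alpha)^3 \hat{y}_0 \\
&\;\;\vdots \\
\hat{y}_T &= \alpha y_T + \alpha(1-\alpha) y_{T-1} + \cdots + \alpha(1-\alpha)^{T-1} y_1 + (1-\alpha)^T \hat{y}_0
\end{aligned}$$

so that the one-step-ahead forecast is

$$\hat{y}_{T+1|T} = \hat{y}_T = \sum_{j=0}^{T-1} \alpha(1-\alpha)^{j} y_{T-j} + (1-\alpha)^{T} \hat{y}_0$$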
As $T$ gets large, $(1-\alpha)^T$ gets small, so the contribution of $\hat{y}_0$ to $\hat{y}_T$
becomes negligible. Two commonly used choices for the initial value are:
(a) Set $\hat{y}_0 = y_1$. If the changes in the process are expected to occur early and
fast, this choice of starting value for $\hat{y}_T$ is reasonable.
(b) Take the average $\bar{y}$ of the available data, or of a subset of the available data,
and set $\hat{y}_0 = \bar{y}$. If the process is locally constant, at least at the beginning,
this starting value may be preferred.
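The effect of the starting value can be made concrete with a small base-R sketch. The helper name `ses_level` and the choice α = 0.3 are assumptions made for illustration only. The sketch checks the expanded form against the recursion and shows that the two starting values (a) and (b) change the final smoothed value by exactly (1 − α)^T times the difference between the starting values.

```r
# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
alpha <- 0.3
n_obs <- length(y)

# Recursion yhat_t = alpha * y_t + (1 - alpha) * yhat_{t-1}, started at y0
ses_level <- function(y, alpha, y0) {
  s <- y0
  for (obs in y) s <- alpha * obs + (1 - alpha) * s
  s
}

lev_a <- ses_level(y, alpha, y0 = y[1])     # choice (a): yhat_0 = y_1
lev_b <- ses_level(y, alpha, y0 = mean(y))  # choice (b): yhat_0 = mean of the data

# Expanded form: sum_j alpha * (1 - alpha)^j * y_{T-j} + (1 - alpha)^T * yhat_0
j <- 0:(n_obs - 1)
expanded_a <- sum(alpha * (1 - alpha)^j * rev(y)) + (1 - alpha)^n_obs * y[1]
print(all.equal(lev_a, expanded_a))

# The starting value only enters through the term (1 - alpha)^T * yhat_0
print(lev_b - lev_a)
print((1 - alpha)^n_obs * (mean(y) - y[1]))
```

With only 11 observations and a small α the weight (1 − α)^T is not yet negligible, so the choice of starting value matters more for short series.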
Example 3.1.1
The yield from carrying one paying passenger one mile for US scheduled airlines over
an 11-year period is shown below.

t     1    2    3    4    5     6     7     8     9     10    11    12
y_t   8    8.4  8.3  8.7  11    12.3  11.8  11.6  12.1  11.7  10.8

To get this figure, the total revenue was divided by the total number of miles flown by
paying passengers. This statistic is a primary determinant of airline profitability; hence
the need to forecast these yields.
Find the forecast for period 12 using the following smoothing constants:
(a) α = 0.05
(b) α = 0.1
(c) α = 0.3
(d) α = 0.5
Solution 3.1.1
The forecasts are computed from $\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}$. The one-step-ahead ($h = 1$) forecasts for period 12 are:

α          0.05   0.1    0.3     0.5
Forecast   9.13   9.93   11.21   11.28
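R-Code
# Read the yield data; eg311.txt is expected to have a header row
rev <- read.table("eg311.txt", header = TRUE)
ap <- ts(rev)
library(forecast)
# One-step-ahead SES forecasts for each alpha;
# initial = "simple" starts the recursion from the first observation
m1 <- ses(ap, alpha = 0.05, initial = "simple", h = 1)
m2 <- ses(ap, alpha = 0.1, initial = "simple", h = 1)
m3 <- ses(ap, alpha = 0.3, initial = "simple", h = 1)
m4 <- ses(ap, alpha = 0.5, initial = "simple", h = 1)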
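The four forecasts can also be reproduced without the forecast package. The following base-R sketch is a cross-check only: the helper name `ses_next` is hypothetical, and it assumes the starting value is the first observation, which is consistent with the forecast 11.21387 reported for α = 0.3 later in these notes.

```r
# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

# Apply yhat_{t+1|t} = alpha * y_t + (1 - alpha) * yhat_{t|t-1} through the
# whole series and return the forecast for period T + 1.
# Assumption: the recursion starts at yhat_{1|0} = y_1.
ses_next <- function(y, alpha, y0 = y[1]) {
  f <- y0
  for (obs in y) f <- alpha * obs + (1 - alpha) * f
  f
}

alphas <- c(0.05, 0.1, 0.3, 0.5)
fc <- sapply(alphas, function(a) ses_next(y, a))
print(round(fc, 2))   # 9.13 9.93 11.21 11.28
```

A larger α tracks the recent, higher yields more closely, which is why the forecast rises with α for this series.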
3.2 Other Simple Forecasting Techniques
(b) Forecasts:

$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}\left(y_t - y_{t-1}\right) = y_T + \frac{h}{T-1}\left(y_T - y_1\right)$$
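The sum of successive differences telescopes to y_T − y_1, which is why the two expressions agree. A minimal base-R check on the data of Example 3.1.1 (the variable names are chosen here for illustration):

```r
# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
n_obs <- length(y)
h <- 1

# First form: last value plus h times the average one-step change
avg_change <- sum(diff(y)) / (n_obs - 1)
drift_fc <- y[n_obs] + h * avg_change

# Second form: the sum of differences telescopes to y_T - y_1
drift_fc2 <- y[n_obs] + h * (y[n_obs] - y[1]) / (n_obs - 1)

print(c(drift_fc, drift_fc2))   # both equal 11.08
```

Geometrically, the forecast extends the straight line drawn through the first and last observations.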
Example 3.2.1
Consider Example 3.1.1. Use the following forecasting techniques to forecast period 12:
(i) mean (ii) naïve (iii) seasonal naïve
(iv) drift (v) simple exponential smoothing, α = 0.3
Solution 3.2.1
[Figure: plot of the series against year]
library(forecast)
meanf(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.42727 7.979823 12.87472 6.453127 14.40142
naive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407
snaive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407
rwf(rev,drift=TRUE,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.08 9.80992 12.35008 9.13758 13.02242
ses(rev,alpha=0.3,initial="simple",h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.21387 9.338143 13.0896 8.345192 14.08256
[Figure: forecast plots; surviving panel titles: "Forecasts from Seasonal naive method", "Forecasts from Random walk with drift", "Forecasts from Simple exponential smoothing"]
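With $f_t$ denoting the forecast of $y_t$, the accuracy of the forecasts can be measured by:

(i) Mean absolute error:
$$MAE = \frac{1}{T}\sum_{t=1}^{T} \left|y_t - f_t\right|$$

(ii) Mean squared error and root mean squared error:
$$MSE = \frac{1}{T}\sum_{t=1}^{T} \left(y_t - f_t\right)^2, \qquad RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T} \left(y_t - f_t\right)^2}$$

(iii) Mean absolute percentage error:
$$MAPE = \frac{100}{T}\sum_{t=1}^{T} \left|\frac{y_t - f_t}{y_t}\right|$$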
Remarks
MAE, MSE and RMSE are all scale dependent.
MAPE is scale independent, but is only sensible if $y_t \gg 0$ for all $t$ and $y$ has
a natural zero.
So if you are comparing accuracy across time series with different scales, you
cannot use MSE (or the other scale-dependent measures).
MAPE cannot be used when the time series can take zero values.
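The scale dependence described in these remarks can be seen by rescaling a series and its forecasts by the same factor. The sketch below is illustrative only: the function names `mae`, `rmse` and `mape` are defined here rather than taken from a package, the forecasts are simple previous-value forecasts for a few observations of Example 3.1.1, and the factor 100 is arbitrary.

```r
# Accuracy measures, with y the observations and f the forecasts
mae  <- function(y, f) mean(abs(y - f))
rmse <- function(y, f) sqrt(mean((y - f)^2))
mape <- function(y, f) 100 * mean(abs((y - f) / y))

# Observations 2 to 5 of Example 3.1.1 and their previous-value forecasts
y <- c(8.4, 8.3, 8.7, 11)
f <- c(8, 8.4, 8.3, 8.7)

# Change of units: multiply both series by 100
y100 <- 100 * y
f100 <- 100 * f

print(c(mae(y, f),  mae(y100, f100)))    # MAE is multiplied by 100
print(c(rmse(y, f), rmse(y100, f100)))   # RMSE is multiplied by 100
print(c(mape(y, f), mape(y100, f100)))   # MAPE is unchanged
```

Because `mape` divides by each observation, it returns `Inf` or `NaN` as soon as one observation is zero, which is the reason for the last remark.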
Example 3.3.1
Consider Example 3.1.1. How accurate are these forecasts?
Solution 3.3.1
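One possible approach, sketched in base R, is to compute the accuracy measures from the one-step-ahead within-sample errors of each smoother. This sketch rests on assumptions: the forecasts in question are taken to be the simple exponential smoothing forecasts for the four values of α in Example 3.1.1, the recursion is started at the first observation (so the first error is zero), and the helper name `fitted_ses` is hypothetical.

```r
# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
alphas <- c(0.05, 0.1, 0.3, 0.5)

# One-step-ahead within-sample forecasts yhat_{t|t-1}, t = 1, ..., T.
# Assumption: yhat_{1|0} = y_1.
fitted_ses <- function(y, alpha, y0 = y[1]) {
  f <- numeric(length(y))
  f[1] <- y0
  for (t in seq_along(y)[-1]) f[t] <- alpha * y[t - 1] + (1 - alpha) * f[t - 1]
  f
}

# One row of accuracy measures per value of alpha
acc <- t(sapply(alphas, function(a) {
  e <- y - fitted_ses(y, a)
  c(alpha = a,
    MAE   = mean(abs(e)),
    RMSE  = sqrt(mean(e^2)),
    MAPE  = 100 * mean(abs(e / y)))
}))
print(acc)
```

The smoother with the smallest error measures fits the observed series best; these are within-sample measures and so tend to flatter the method.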
3.4 SERIAL DEPENDENCE
Recall that the $y_t$'s are not independent but are serially dependent. We can describe
the nature of the dependence using a set of autocorrelations.
3.4.1 Autocorrelation
Given $n$ observations $(y_1, \ldots, y_n)$ on a time series, we can form $n-1$ pairs of
observations: $(y_1, y_2), (y_2, y_3), \ldots, (y_{n-1}, y_n)$, where each pair of observations is
separated by one time interval.
Regarding the first observation in each pair as one variable, and the second observation
in each pair as a second variable, we can measure the correlation coefficient
between adjacent observations, $y_t$ and $y_{t+1}$.
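So

$$r_1 = \frac{\sum_{t=1}^{n-1} \left(y_t - \bar{y}_{(1)}\right)\left(y_{t+1} - \bar{y}_{(2)}\right)}{\sqrt{\sum_{t=1}^{n-1} \left(y_t - \bar{y}_{(1)}\right)^2 \sum_{t=1}^{n-1} \left(y_{t+1} - \bar{y}_{(2)}\right)^2}} \tag{3.4.1}$$

where

$$\bar{y}_{(1)} = \frac{1}{n-1}\sum_{t=1}^{n-1} y_t$$

is the mean of the first observation in each of the $n-1$ pairs, and

$$\bar{y}_{(2)} = \frac{1}{n-1}\sum_{t=2}^{n} y_t$$

is the mean of the last $n-1$ observations.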
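For large $n$, we can use some approximations: taking $\bar{y}_{(1)} \simeq \bar{y}_{(2)} \simeq \bar{y}$ and dropping the factor
$n/(n-1)$, we get (NB):

$$r_1 = \frac{\sum_{t=1}^{n-1} \left(y_t - \bar{y}\right)\left(y_{t+1} - \bar{y}\right)}{\sum_{t=1}^{n} \left(y_t - \bar{y}\right)^2} \tag{3.4.2}$$

where $\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t$ is the overall mean.

In the same way, the correlation between observations $k$ time units apart, the autocorrelation coefficient at lag $k$, is

$$r_k = \frac{\sum_{t=1}^{n-k} \left(y_t - \bar{y}\right)\left(y_{t+k} - \bar{y}\right)}{\sum_{t=1}^{n} \left(y_t - \bar{y}\right)^2} \tag{3.4.3}$$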
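Equation (3.4.3) translates directly into a few lines of base R. In the sketch below the function name `sample_acf` is hypothetical and the airline yield series of Example 3.1.1 is used only as convenient data. The result is compared with the built-in `acf` function from the stats package, which with its default settings uses the same estimator.

```r
# Sample autocorrelation at lag k, following (3.4.3)
sample_acf <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)                                # deviations from the overall mean
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

# Airline yield data from Example 3.1.1
y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

r <- sapply(0:5, function(k) sample_acf(y, k))    # r_0, r_1, ..., r_5

# Built-in estimator; plot = FALSE returns the values without drawing
r_builtin <- as.numeric(acf(y, lag.max = 5, plot = FALSE)$acf)

print(round(r, 4))
print(all.equal(r, r_builtin))
```

Note that the numerator has only n − k terms while the denominator has n, so the coefficients are pulled toward zero as the lag grows relative to the series length.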
3.4.2 Correlogram
The sample autocorrelation function (acf) is the set $\{r_k : k = 0, 1, 2, 3, \ldots\}$ with $r_0 = 1$.
[Figure: time plot of the series; vertical axis: y_t]
Figure 2: sample acf [vertical axis: Autocorrelation, from -1.0 to 1.0; horizontal axis: Lag, 1 to 15]
(ii) Short-term correlation
A stationary series often exhibits short-term correlation, characterised by a fairly
large value of $r_1$ followed by one or two further coefficients which, while greater
than zero, tend to get successively smaller. Values of $r_k$ for longer lags tend to
be approximately zero.
(iv) Non-stationary series
If a time series contains a trend, then the values of $r_k$ will not come down to
zero except for very large values of the lag. This is because, owing to the trend,
an observation on one side of the overall mean tends to be followed by a large
number of further observations on the same side of the mean.
(v) Seasonal series
If a time series contains seasonal variation, then the correlogram will also exhibit
oscillation at the same frequency. For example, with monthly data, we can
expect r6 to be ‘large’ and negative, while r12 will be ‘large’ and positive. If the
seasonal variation is removed from seasonal data, then the correlogram may
provide useful information.
(vi) Outliers
If a time series contains one or more outliers, the correlogram may be seriously
affected and it may be advisable to adjust outliers in some way before starting
the formal analysis. For example, if there is one outlier in the time series at, say,
time $t_0$, and it is not adjusted, then the plot of $y_t$ vs $y_{t+k}$ will contain two
'extreme' points, namely $(y_{t_0-k}, y_{t_0})$ and $(y_{t_0}, y_{t_0+k})$. These points will depress
the sample correlation coefficients toward zero.
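The correlogram patterns described above for trend, seasonality and outliers can be reproduced on artificial series. The base-R sketch below uses hypothetical data chosen only for illustration: a linear trend, a noise-free sine wave with period 12 standing in for monthly seasonality, and the same sine wave with one observation shifted by 20. The helper `sample_acf` implements (3.4.3) and is not taken from a package.

```r
# Sample autocorrelation at lag k, following (3.4.3)
sample_acf <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

t_idx <- 1:120

# (iv) Trend: the coefficients decay very slowly with the lag
trend <- 0.5 * t_idx
r_trend <- sapply(1:12, function(k) sample_acf(trend, k))
print(round(r_trend, 3))

# (v) Seasonality with period 12: negative at lag 6, positive at lag 12
seasonal <- sin(2 * pi * t_idx / 12)
r_seas6  <- sample_acf(seasonal, 6)
r_seas12 <- sample_acf(seasonal, 12)
print(round(c(r_seas6, r_seas12), 3))

# (vi) One outlier at t0 = 30 pulls the lag-1 coefficient toward zero
outlier <- seasonal
outlier[30] <- outlier[30] + 20
r1_clean <- sample_acf(seasonal, 1)
r1_out   <- sample_acf(outlier, 1)
print(round(c(r1_clean, r1_out), 3))
```

Real series contain noise as well, so the patterns are less clean in practice, but the qualitative behaviour is the same.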