
3. SIMPLE EXPONENTIAL SMOOTHING & FORECASTING METHODS AND SERIAL DEPENDENCE

3.1 Simple Exponential Smoothing (SES)


Simple exponential smoothing provides a compromise between the two extremes of relying only on the most recent observation and weighting all observations equally: the forecast is a weighted average of current and past observations.

One way to obtain a smoother that reacts faster to process changes is to give geometrically decreasing weights to the previous observations. Hence, an exponentially weighted smoother is obtained by introducing a discount factor \( \theta \):

\[ \sum_{t=0}^{T-1} \theta^{t} y_{T-t} = y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1, \qquad \theta < 1 \qquad (3.1.1) \]

The smoother is not an average, as the sum of the weights,

\[ \sum_{t=0}^{T-1} \theta^{t} = \frac{1 - \theta^{T}}{1 - \theta}, \qquad (3.1.2) \]

does not necessarily add up to 1.


Hence, we adjust the smoother in Eqn (3.1.1) by multiplying it by \( \frac{1-\theta}{1-\theta^{T}} \). For large \( T \), \( \theta^{T} \) goes to zero, and so the exponentially weighted average takes the form

\[ \hat{y}_T = (1-\theta) \sum_{t=0}^{T-1} \theta^{t} y_{T-t} = (1-\theta)\left[ y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1 \right] \qquad (3.1.3) \]

This is called a simple, or first-order, exponential smoother.

An alternative expression for simple exponential smoothing, in recursive form, is given by

\[
\begin{aligned}
S_T = \hat{y}_T &= (1-\theta) y_T + (1-\theta)\left[ \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1 \right] \\
&= (1-\theta) y_T + \theta \underbrace{(1-\theta)\left[ y_{T-1} + \theta y_{T-2} + \cdots + \theta^{T-2} y_1 \right]}_{\hat{y}_{T-1}} \\
&= (1-\theta) y_T + \theta \hat{y}_{T-1} \qquad (3.1.4)
\end{aligned}
\]

This is a linear combination of the current observation and the smoothed observation at the previous time unit. As the latter contains data from all previous observations, the smoothed observation at time \( T \) is in fact a linear combination of the current observation and the discounted sum of all previous observations.

The simple exponential smoother is often represented in a different form by setting \( \alpha = 1 - \theta \):

\[ \hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1} \quad \text{for } t = 1, 2, \ldots, T \qquad (3.1.5) \]

where
\( \alpha \) represents the weight put on the current observation,
\( 1-\alpha \) represents the weight put on the previous smoothed value, and
\( 0 \le \alpha \le 1 \) is the smoothing parameter.
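As a small illustration, the recursion in Eqn (3.1.5) can be implemented directly in R. This is only a sketch; the function name ses_recursive and the argument y0 (the starting value \( \hat{y}_0 \), discussed in Section 3.1.2) are our own choices:

# Minimal sketch of the recursion in Eqn (3.1.5); the name ses_recursive
# and the argument y0 (starting value for the smoother) are illustrative.
ses_recursive <- function(y, alpha, y0) {
  n <- length(y)
  yhat <- numeric(n + 1)    # yhat[t + 1] is the forecast made after observing y[t]
  yhat[1] <- y0             # initial value (see Section 3.1.2)
  for (t in 1:n) {
    yhat[t + 1] <- alpha * y[t] + (1 - alpha) * yhat[t]
  }
  yhat[n + 1]               # one-step-ahead forecast of y[n + 1]
}

For instance, ses_recursive(y, 0.3, y[1]), with y the vector of airline yields from Example 3.1.1 below, reproduces the \( \alpha = 0.3 \) column of the solution table.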

Remarks
(i). The value of \( \alpha \) may be chosen in a subjective manner: the forecaster specifies the smoothing parameter based on previous experience.

However, a more robust and objective way is to minimise the error. The errors are specified as \( e_t = y_t - \hat{y}_{t|t-1} \) for \( t = 1, \ldots, T \) (the one-step-ahead within-sample forecast errors), so

\[ \mathrm{SSE} = \sum_{t=1}^{T} \left( y_t - \hat{y}_{t|t-1} \right)^{2} = \sum_{t=1}^{T} e_t^{2} \]

This is a non-linear minimisation problem, and we need to use an optimisation technique to solve it (see the sketch after these remarks).

Usually \( \alpha \) is in the range \( (0.05, 0.4) \). A high value of \( \alpha \) seems appropriate if there is little previous experience, or if there appears to have been some change in the pattern of the data which makes older data less relevant.

(ii). Simple exponential smoothing should only be used for non-seasonal time series showing no systematic trend. However, we can remove the trend or seasonal pattern to produce a stationary series, and afterwards use simple exponential smoothing.

(iii). There are more complicated versions of exponential smoothing that can cope with trend and seasonality, such as the Holt–Winters method.

(iv). To forecast one step ahead, \( \hat{y}_{T+1|T} = \hat{y}_T \); that is, the last estimated value is the forecast estimate. This implies that exponential smoothing has a 'flat' forecast function, so for longer forecast horizons the forecast remains the last estimated value: \( \hat{y}_{T+h|T} = \hat{y}_{T+1|T} \) for all \( h \ge 1 \).

(v). Error-correction form:

\[
\begin{aligned}
\hat{y}_{t+1} &= \alpha y_t + (1-\alpha)\hat{y}_t \\
&= \alpha y_t + \hat{y}_t - \alpha \hat{y}_t \\
&= \underbrace{\hat{y}_t}_{\text{forecast in period } t} + \; \alpha \underbrace{\left( y_t - \hat{y}_t \right)}_{\text{forecast error in period } t}
\end{aligned}
\]
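As the sketch promised in remark (i): the one-step-ahead SSE can be written as a function of \( \alpha \) using the recursion of Eqn (3.1.5) and minimised numerically with R's optimise(). The function name sse_alpha and the starting choice \( \hat{y}_0 = y_1 \) are our own, illustrative assumptions.

# SSE of the one-step-ahead in-sample forecast errors as a function of alpha;
# the starting value y0 = y[1] is choice (a) of Section 3.1.2 below.
sse_alpha <- function(alpha, y) {
  yhat <- y[1]                      # initial smoothed value
  sse <- 0
  for (t in 1:length(y)) {
    e <- y[t] - yhat                # one-step-ahead forecast error
    sse <- sse + e^2
    yhat <- alpha * y[t] + (1 - alpha) * yhat
  }
  sse
}

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
optimise(sse_alpha, interval = c(0, 1), y = y)   # alpha minimising the SSE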

3.1.2 Initial value
Since \( \hat{y}_0 \) is needed in the recursive calculations, which start with

\[ \hat{y}_1 = \alpha y_1 + (1-\alpha)\hat{y}_0, \]

we need to estimate its value. From Eqn (3.1.5) we have

\[
\begin{aligned}
\hat{y}_1 &= \alpha y_1 + (1-\alpha)\hat{y}_0 \\
\hat{y}_{2|1} &= \alpha y_2 + (1-\alpha)\hat{y}_1 \\
&= \alpha y_2 + (1-\alpha)\left[ \alpha y_1 + (1-\alpha)\hat{y}_0 \right] \\
&= \alpha\left[ y_2 + (1-\alpha) y_1 \right] + (1-\alpha)^{2}\hat{y}_0 \\
\hat{y}_{3|2} &= \alpha y_3 + (1-\alpha)\hat{y}_{2|1} \\
&= \alpha\left[ y_3 + (1-\alpha) y_2 + (1-\alpha)^{2} y_1 \right] + (1-\alpha)^{3}\hat{y}_0 \\
&\;\;\vdots \\
\hat{y}_T &= \alpha\left[ y_T + (1-\alpha) y_{T-1} + \cdots + (1-\alpha)^{T-1} y_1 \right] + (1-\alpha)^{T}\hat{y}_0 \\
\hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha (1-\alpha)^{j} y_{T-j} + (1-\alpha)^{T}\hat{y}_0
\end{aligned}
\]

As \( T \) gets large, \( (1-\alpha)^{T} \) gets small, so the contribution of \( \hat{y}_0 \) to \( \hat{y}_T \) becomes negligible. For large datasets, the estimation of \( \hat{y}_0 \) therefore has little relevance.

Two commonly used estimates for \( \hat{y}_0 \) are as follows:

(a) Set \( \hat{y}_0 = y_1 \). If the changes in the process are expected to occur early and fast, this choice of starting value is reasonable.

(b) Take the average \( \bar{y} \) of the available data, or of a subset of the available data, and set \( \hat{y}_0 = \bar{y} \). If the process is, at least at the beginning, locally constant, this starting value may be preferred.
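In the forecast package for R, these two conventions correspond roughly to the initial argument of ses(): initial = "simple" starts the recursion at the first observation (choice (a)), while initial = "optimal" estimates the starting value from the data. A small sketch, using the airline-yield series of Example 3.1.1 below:

library(forecast)

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

# Choice (a): start the recursion at the first observation
fit_a <- ses(y, alpha = 0.3, initial = "simple", h = 1)

# Alternatively, let ses() estimate the starting value from the data
fit_b <- ses(y, alpha = 0.3, initial = "optimal", h = 1)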

Example 3.1.1
The yield from carrying one paying passenger one mile for US scheduled airlines over an 11-year period is shown below.

t    1    2    3    4    5    6     7     8     9     10    11
yt   8    8.4  8.3  8.7  11   12.3  11.8  11.6  12.1  11.7  10.8

To get this figure, the total revenue was divided by the total number of miles flown by paying passengers. This statistic is a primary determinant of airline profitability; hence the need to forecast these yields.

Find the forecast estimate for period 12 using the following smoothing parameters:
(a) α = 0.05
(b) α = 0.1
(c) α = 0.3
(d) α = 0.5

Solution 3.1.1
So \( \hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1} \), with \( \hat{y}_0 = y_1 = 8 \).

year    t    yt     ŷt (α=0.05)  ŷt (α=0.1)  ŷt (α=0.3)  ŷt (α=0.5)
        0           8.00         8.00        8.00        8.00
2000    1    8.0    8.00         8.00        8.00        8.00
2001    2    8.4    8.02         8.04        8.12        8.20
2002    3    8.3    8.03         8.07        8.17        8.25
2003    4    8.7    8.07         8.13        8.33        8.48
2004    5    11.0   8.21         8.42        9.13        9.74
2005    6    12.3   8.42         8.80        10.08       11.02
2006    7    11.8   8.59         9.10        10.60       11.41
2007    8    11.6   8.74         9.35        10.90       11.50
2008    9    12.1   8.91         9.63        11.26       11.80
2009    10   11.7   9.05         9.84        11.39       11.75
2010    11   10.8   9.13         9.93        11.21       11.28

Forecasts (the flat forecast function of remark (iv) means every horizon gives the same value):

year    h    α=0.05   α=0.1   α=0.3   α=0.5
2011    1    9.13     9.93    11.21   11.28
2012    2    9.13     9.93    11.21   11.28
2013    3    9.13     9.93    11.21   11.28

R-Code
# Read in the airline-yield data (eg311.txt holds the series of Example 3.1.1)
rev = read.table("eg311.txt", header = T)
ap = ts(rev)

library(forecast)

# Simple exponential smoothing with a fixed alpha, the recursion started at the
# first observation (initial = "simple"), forecasting h = 1 step ahead
m1 = ses(ap, alpha = 0.05, initial = "simple", h = 1)
m2 = ses(ap, alpha = 0.1,  initial = "simple", h = 1)
m3 = ses(ap, alpha = 0.3,  initial = "simple", h = 1)
m4 = ses(ap, alpha = 0.5,  initial = "simple", h = 1)
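The point forecasts and prediction intervals can then be read off the returned forecast objects; for example:

# Point forecast of period 12 under alpha = 0.3 (stored in the $mean component)
m3$mean
summary(m3)   # point forecast together with 80% and 95% prediction intervals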

3.2 Other Simple Forecasting Techniques

Let \( h \) be the forecast horizon (\( h \)-step-ahead forecast).

(i) Mean method

(a) The forecasts of all future values are equal to the mean of the historical data \( \{y_1, \ldots, y_T\} \)

(b) Forecasts: \( \hat{y}_{T+h|T} = \bar{y} = \dfrac{y_1 + \cdots + y_T}{T} \)

(ii) Naïve method

(a) Forecasts equal to the last observed value

(b) Forecasts: \( \hat{y}_{T+h|T} = y_T \)

This method is optimal for efficient stock markets.

(iii) Seasonal naïve method

Forecasts equal to the last value from the same season

(iv) Drift method

(a) Forecasts equal to the last value plus the average change

(b) Forecasts:

\[ \hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}\left( y_t - y_{t-1} \right) = y_T + h\left( \frac{y_T - y_1}{T-1} \right) \]

This is equivalent to extrapolating the line drawn between the first and last observations; a worked check follows.
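As a quick worked check using the data of Example 3.1.1 (\( T = 11 \), \( y_1 = 8 \), \( y_T = 10.8 \)), the one-step drift forecast is

\[ \hat{y}_{12|11} = 10.8 + 1 \times \frac{10.8 - 8}{10} = 10.8 + 0.28 = 11.08, \]

which matches the rwf() output in Solution 3.2.1 below.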

Example 3.2.1
Consider Example 3.1.1. Use the following forecasting techniques to forecast period 12:
(i) mean   (ii) naïve   (iii) seasonal naïve   (iv) drift   (v) exponential smoothing, α = 0.3

Solution 3.2.1

[Fig. 1: Airline Profitability — time series plot of the yield (profit) against year.]

library(forecast)

meanf(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.42727 7.979823 12.87472 6.453127 14.40142

naive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407

snaive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407
(With non-seasonal data, the seasonal naïve method reduces to the naïve method, hence the identical output.)

rwf(rev,drift=T,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.08 9.80992 12.35008 9.13758 13.02242

ses(rev,alpha=0.3,initial="simple",h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.21387 9.338143 13.0896 8.345192 14.08256

[Forecast plots: Forecasts from Mean; Forecasts from Naïve method; Forecasts from Seasonal naïve method; Forecasts from Random walk with drift.]

[Forecast plot: Forecasts from Simple exponential smoothing.]

3.3 Measures of Forecast Accuracy

Let \( y_t \) denote the \( t \)th observation and \( f_t \) denote its forecast, where \( t = 1, \ldots, T \). Then the following measures are useful:

(i) \( \mathrm{MAE} = \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left| y_t - f_t \right| \) (mean absolute error)

(ii) \( \mathrm{MSE} = \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left( y_t - f_t \right)^{2} \) (mean squared error) and \( \mathrm{RMSE} = \sqrt{\dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left( y_t - f_t \right)^{2}} \) (root mean squared error)

(iii) \( \mathrm{MAPE} = \dfrac{100}{T}\displaystyle\sum_{t=1}^{T} \left| \dfrac{y_t - f_t}{y_t} \right| \) (mean absolute percentage error)

Remarks
MAE, MSE and RMSE are all scale-dependent.

MAPE is scale-independent, but it is only sensible if \( y_t \gg 0 \) for all \( t \) and \( y \) has a natural zero.

So if you are comparing accuracy across time series with different scales, you cannot use MSE.

For business use, MAPE is often preferred, apparently because managers understand percentages better than squared errors.

MAPE cannot be used when the time series can take zero values.

MASE is intended to be both scale-independent and usable on all series, including those that take zero values.

(i) Mean Absolute Scaled Error

\[ \mathrm{MASE} = \frac{1}{T}\sum_{t=1}^{T} \left| \frac{y_t - f_t}{q} \right| \]

where \( q \) is a stable measure of the scale of the time series \( \{y_t\} \).

For a non-seasonal time series,

\[ q = \frac{1}{T-1}\sum_{t=2}^{T} \left| y_t - y_{t-1} \right| \]

(the in-sample MAE of the naïve method). For a seasonal time series with period \( s \),

\[ q = \frac{1}{T-s}\sum_{t=s+1}^{T} \left| y_t - y_{t-s} \right| \]
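These measures can be computed directly; the following sketch assumes one-step forecasts f aligned with the observations y, and the function name forecast_accuracy is our own:

# Minimal sketch computing MAE, RMSE, MAPE and the non-seasonal MASE
# from observations y and their forecasts f; the function name is illustrative.
forecast_accuracy <- function(y, f) {
  e <- y - f                        # forecast errors
  q <- mean(abs(diff(y)))           # scale: in-sample MAE of the naive method
  c(MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MAPE = 100 * mean(abs(e / y)),
    MASE = mean(abs(e / q)))
}

In practice, the accuracy() function of the forecast package reports these measures (together with ME and MPE) for a fitted model, e.g. accuracy(naive(rev, h = 1)).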

Example 3.3.1
Consider Example 3.1.1: how accurate are these forecasts?

Solution 3.3.1

Method            RMSE       MAE        MPE        MAPE       MASE
Mean              1.628212   1.510744   -2.69365   15.48064   2.158205
Naïve             0.939149   0.7        2.605165   6.388842   1
Seasonal naïve    0.939149   0.7        2.605165   6.388842   1
Drift             0.896437   0.7        -0.07819   6.334669   1
Exponential       1.46364    1.081401   8.582388   9.577755   1.544858

(The naïve method has MASE = 1 by construction, since q is its own in-sample MAE.)

3.4 SERIAL DEPENDENCE

Recall that the \( y \)'s are not independent but are serially dependent. We can describe the nature of the dependence using a set of autocorrelations.

3.4.1 Autocorrelation
Given \( n \) observations \( (y_1, \ldots, y_n) \) on a time series, we can form \( n-1 \) pairs of observations, \( (y_1, y_2), (y_2, y_3), \ldots, (y_{n-1}, y_n) \), where each pair of observations is separated by one time interval.

Regarding the first observation in each pair as one variable, and the second observation in each pair as a second variable, we can measure the correlation coefficient between adjacent observations \( y_t \) and \( y_{t+1} \).

So

\[ r_1 = \frac{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y}_{(1)} \right)\left( y_{t+1} - \bar{y}_{(2)} \right)}{\sqrt{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y}_{(1)} \right)^{2} \;\displaystyle\sum_{t=1}^{n-1}\left( y_{t+1} - \bar{y}_{(2)} \right)^{2}}} \qquad (3.4.1) \]

where

\[ \bar{y}_{(1)} = \frac{1}{n-1}\sum_{t=1}^{n-1} y_t \]

is the mean of the first observation in each of the \( n-1 \) pairs, and

\[ \bar{y}_{(2)} = \frac{1}{n-1}\sum_{t=2}^{n} y_t \]

is the mean of the last \( n-1 \) observations.

Equation (3.4.1) measures the correlation between successive observations; it is called the sample autocorrelation coefficient, or serial correlation coefficient, at lag one.

For large \( n \), we can use some approximations: taking \( \bar{y}_{(1)} \approx \bar{y}_{(2)} \approx \bar{y} \) and dropping a factor of \( n/(n-1) \), we get

\[ r_1 = \frac{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y} \right)\left( y_{t+1} - \bar{y} \right)}{\displaystyle\sum_{t=1}^{n}\left( y_t - \bar{y} \right)^{2}} \qquad (3.4.2) \]

where \( \bar{y} = \dfrac{1}{n}\displaystyle\sum_{t=1}^{n} y_t \) is the overall mean.

For observations \( k \) steps apart, \( (y_1, y_{k+1}), (y_2, y_{k+2}), \ldots, (y_{n-k}, y_n) \), we have the sample autocorrelation coefficient at lag \( k \):

\[ r_k = \frac{\displaystyle\sum_{t=1}^{n-k}\left( y_t - \bar{y} \right)\left( y_{t+k} - \bar{y} \right)}{\displaystyle\sum_{t=1}^{n}\left( y_t - \bar{y} \right)^{2}} \qquad (3.4.3) \]
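As a sketch, Eqn (3.4.3) can be computed directly in R and checked against the built-in acf() function; the function name acf_manual is our own:

# Minimal sketch of Eqn (3.4.3); the name acf_manual is illustrative.
acf_manual <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  num <- sum((y[1:(n - k)] - ybar) * (y[(k + 1):n] - ybar))
  den <- sum((y - ybar)^2)
  num / den
}

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
acf_manual(y, 1)                           # r1 computed by hand
acf(y, lag.max = 1, plot = FALSE)$acf[2]   # R's estimate (element 1 is r0 = 1)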

3.4.2 Correlogram
The sample autocorrelation function (acf) is the set \( \{r_k : k = 0, 1, 2, 3, \ldots\} \), with \( r_0 = 1 \).

A useful aid in interpreting a set of autocorrelation coefficients is a graph called a correlogram, in which the sample autocorrelation coefficients \( r_k \) are plotted against the lag \( k \) for \( k = 0, 1, \ldots, m \), where \( m \) is usually much less than \( n \). For example, if \( n = 200 \), we might look at the first 20 or 30 coefficients.

The correlogram may also be called the sample acf.

[Figure 1: observed series — time series plot of \( y_t \).]

[Figure 2: sample acf of \( y_t \) for lags 1–15, with 5% significance limits for the autocorrelations.]

3.4.3 Interpreting the Correlogram

It is not easy to interpret a set of autocorrelation coefficients.

(i) Random series
A time series is said to be random if it consists of a series of independent observations having the same distribution. Then for large \( n \), we expect to find \( r_k \approx 0 \) for \( k = 1, 2, 3, \ldots \). Later we will see that for a random series, \( r_k \sim N\!\left(0, \tfrac{1}{n}\right) \) approximately. Thus, inspection of the correlogram can be used to 'test' for randomness and also to help identify suitable models. If a time series is random, we can expect 95% of the values of \( r_k \) to lie between \( \pm \dfrac{2}{\sqrt{n}} \) (see the simulation sketch at the end of this section).

(ii) Short-term correlation
Stationary series often exhibit short-term correlation, characterised by a fairly large value of \( r_1 \) followed by one or two further coefficients which, while greater than zero, tend to get successively smaller. Values of \( r_k \) for longer lags tend to be approximately zero.

(iii) Alternating series


If a time series has a tendency to alternate, with successive observations on
different sides of the overall mean, then the correlogram also tends to alternate.

(iv) Non-stationary series
If a time series contains a trend, then the values of \( r_k \) will not come down to zero except at very large lags. This is because an observation on one side of the overall mean tends to be followed by a large number of further observations on the same side of the mean, owing to the trend.

(v) Seasonal series
If a time series contains seasonal variation, then the correlogram will also exhibit
oscillation at the same frequency. For example, with monthly data, we can
expect r6 to be ‘large’ and negative, while r12 will be ‘large’ and positive. If the
seasonal variation is removed from seasonal data, then the correlogram may
provide useful information.

(vi) Outliers
If a time series contains one or more outliers, the correlogram may be seriously affected, and it may be advisable to adjust the outliers in some way before starting the formal analysis. For example, if there is one outlier in the time series at, say, time \( t_0 \), and it is not adjusted, then the plot of \( y_t \) vs \( y_{t+k} \) will contain two 'extreme' points, namely \( \left( y_{t_0 - k}, y_{t_0} \right) \) and \( \left( y_{t_0}, y_{t_0 + k} \right) \). These points will depress the sample autocorrelation coefficients toward zero.
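As the simulation sketch promised in (i), the following R code generates a random (white-noise) series and counts how many sample autocorrelations fall outside the \( \pm 2/\sqrt{n} \) limits; under randomness, roughly 5% should.

set.seed(1)                       # for reproducibility
n <- 200
y <- rnorm(n)                     # a random series: independent N(0, 1) observations

r <- acf(y, lag.max = 30, plot = FALSE)$acf[-1]   # r1, ..., r30 (drop r0 = 1)
limit <- 2 / sqrt(n)              # approximate 95% limits under randomness
mean(abs(r) > limit)              # proportion outside the limits; expect about 0.05

acf(y, lag.max = 30)              # correlogram with the significance limits drawn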
