
3. SIMPLE EXPONENTIAL SMOOTHING & FORECASTING METHODS AND SERIAL DEPENDENCE

3.1 Simple Exponential Smoothing (SES)


Simple exponential smoothing provides a compromise between the two extremes of relying only on the most recent observation and weighting all observations equally: the forecast is a weighted average of current and past observations.

One way to obtain a smoother that reacts faster to process changes is to give geometrically decreasing weights to the previous observations. Hence, an exponentially weighted smoother is obtained by introducing a discount factor \( \theta \):

\[ \sum_{t=0}^{T-1} \theta^{t} y_{T-t} = y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1, \qquad \theta < 1 \qquad (3.1.1) \]

The smoother is not an average, as the sum of the weights,

\[ \sum_{t=0}^{T-1} \theta^{t} = \frac{1 - \theta^{T}}{1 - \theta}, \qquad (3.1.2) \]

does not necessarily add up to 1.


Hence, we adjust the smoother in Eqn (3.1.1) by multiplying it by \( \frac{1-\theta}{1-\theta^{T}} \). For large \( T \), \( \theta^{T} \) goes to zero, and so the exponentially weighted average takes the form

\[ \hat{y}_T = (1-\theta) \sum_{t=0}^{T-1} \theta^{t} y_{T-t} = (1-\theta)\left[ y_T + \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1 \right] \qquad (3.1.3) \]

This is called a simple, or first-order, exponential smoother.

An alternative expression for simple exponential smoothing, in recursive form, is given by

\[
\begin{aligned}
S_T = \hat{y}_T &= (1-\theta) y_T + (1-\theta)\left[ \theta y_{T-1} + \theta^{2} y_{T-2} + \cdots + \theta^{T-1} y_1 \right] \\
&= (1-\theta) y_T + \theta \underbrace{(1-\theta)\left[ y_{T-1} + \theta y_{T-2} + \cdots + \theta^{T-2} y_1 \right]}_{\hat{y}_{T-1}} \\
&= (1-\theta) y_T + \theta \hat{y}_{T-1} \qquad (3.1.4)
\end{aligned}
\]

This is a linear combination of the current observation and the smoothed observation at the previous time unit. As the latter contains data from all previous observations, the smoothed observation at time \( T \) is in fact a linear combination of the current observation and the discounted sum of all previous observations.

The simple exponential smoother is often represented in a different form by setting \( \alpha = 1 - \theta \):

\[ \hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1} \quad \text{for } t = 1, 2, \ldots, T \qquad (3.1.5) \]

where
\( \alpha \) represents the weight put on the current observation,
\( 1-\alpha \) represents the weight put on the previous smoothed value, and
\( 0 \le \alpha \le 1 \) is the smoothing parameter.
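As a small illustration, the recursion in Eqn (3.1.5) can be implemented directly in R. This is only a sketch; the function name ses_recursive and the argument y0 (the starting value \( \hat{y}_0 \), discussed in Section 3.1.2) are our own choices:

# Minimal sketch of the recursion in Eqn (3.1.5); the name ses_recursive
# and the argument y0 (starting value for the smoother) are illustrative.
ses_recursive <- function(y, alpha, y0) {
  n <- length(y)
  yhat <- numeric(n + 1)    # yhat[t + 1] is the forecast made after observing y[t]
  yhat[1] <- y0             # initial value (see Section 3.1.2)
  for (t in 1:n) {
    yhat[t + 1] <- alpha * y[t] + (1 - alpha) * yhat[t]
  }
  yhat[n + 1]               # one-step-ahead forecast of y[n + 1]
}

For instance, ses_recursive(y, 0.3, y[1]), with y the vector of airline yields from Example 3.1.1 below, reproduces the \( \alpha = 0.3 \) column of the solution table.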

Remarks
(i). The value of \( \alpha \) may be chosen in a subjective manner: the forecaster specifies the smoothing parameter based on previous experience.

However, a more robust and objective way is to minimise the error. The errors are specified as \( e_t = y_t - \hat{y}_{t|t-1} \) for \( t = 1, \ldots, T \) (the one-step-ahead within-sample forecast errors), so

\[ \mathrm{SSE} = \sum_{t=1}^{T} \left( y_t - \hat{y}_{t|t-1} \right)^{2} = \sum_{t=1}^{T} e_t^{2} \]

This is a non-linear minimisation problem, and we need to use an optimisation technique to solve it (see the sketch after these remarks).

Usually \( \alpha \) is in the range \( (0.05, 0.4) \). A high value of \( \alpha \) seems appropriate if there is little previous experience, or if there appears to have been some change in the pattern of the data which makes older data less relevant.

(ii). Simple exponential smoothing should only be used for non-seasonal time series showing no systematic trend. However, we can remove the trend or seasonal pattern to produce a stationary series, and afterwards use simple exponential smoothing.

(iii). There are more complicated versions of exponential smoothing that can cope with trend and seasonality, such as the Holt–Winters method.

(iv). To forecast one step ahead, \( \hat{y}_{T+1|T} = \hat{y}_T \); that is, the last estimated value is the forecast estimate. This implies that exponential smoothing has a 'flat' forecast function, so for longer forecast horizons the forecast remains the last estimated value: \( \hat{y}_{T+h|T} = \hat{y}_{T+1|T} \) for all \( h \ge 1 \).

(v). Error-correction form:

\[
\begin{aligned}
\hat{y}_{t+1} &= \alpha y_t + (1-\alpha)\hat{y}_t \\
&= \alpha y_t + \hat{y}_t - \alpha \hat{y}_t \\
&= \underbrace{\hat{y}_t}_{\text{forecast in period } t} + \; \alpha \underbrace{\left( y_t - \hat{y}_t \right)}_{\text{forecast error in period } t}
\end{aligned}
\]
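As the sketch promised in remark (i): the one-step-ahead SSE can be written as a function of \( \alpha \) using the recursion of Eqn (3.1.5) and minimised numerically with R's optimise(). The function name sse_alpha and the starting choice \( \hat{y}_0 = y_1 \) are our own, illustrative assumptions.

# SSE of the one-step-ahead in-sample forecast errors as a function of alpha;
# the starting value y0 = y[1] is choice (a) of Section 3.1.2 below.
sse_alpha <- function(alpha, y) {
  yhat <- y[1]                      # initial smoothed value
  sse <- 0
  for (t in 1:length(y)) {
    e <- y[t] - yhat                # one-step-ahead forecast error
    sse <- sse + e^2
    yhat <- alpha * y[t] + (1 - alpha) * yhat
  }
  sse
}

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
optimise(sse_alpha, interval = c(0, 1), y = y)   # alpha minimising the SSE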

3.1.2 Initial value
Since \( \hat{y}_0 \) is needed in the recursive calculations, which start with

\[ \hat{y}_1 = \alpha y_1 + (1-\alpha)\hat{y}_0, \]

we need to estimate its value. From Eqn (3.1.5) we have

\[
\begin{aligned}
\hat{y}_1 &= \alpha y_1 + (1-\alpha)\hat{y}_0 \\
\hat{y}_{2|1} &= \alpha y_2 + (1-\alpha)\hat{y}_1 \\
&= \alpha y_2 + (1-\alpha)\left[ \alpha y_1 + (1-\alpha)\hat{y}_0 \right] \\
&= \alpha\left[ y_2 + (1-\alpha) y_1 \right] + (1-\alpha)^{2}\hat{y}_0 \\
\hat{y}_{3|2} &= \alpha y_3 + (1-\alpha)\hat{y}_{2|1} \\
&= \alpha\left[ y_3 + (1-\alpha) y_2 + (1-\alpha)^{2} y_1 \right] + (1-\alpha)^{3}\hat{y}_0 \\
&\;\;\vdots \\
\hat{y}_T &= \alpha\left[ y_T + (1-\alpha) y_{T-1} + \cdots + (1-\alpha)^{T-1} y_1 \right] + (1-\alpha)^{T}\hat{y}_0 \\
\hat{y}_{T+1|T} &= \sum_{j=0}^{T-1} \alpha (1-\alpha)^{j} y_{T-j} + (1-\alpha)^{T}\hat{y}_0
\end{aligned}
\]

As \( T \) gets large, \( (1-\alpha)^{T} \) gets small, so the contribution of \( \hat{y}_0 \) to \( \hat{y}_T \) becomes negligible. For large datasets, the estimation of \( \hat{y}_0 \) therefore has little relevance.

Two commonly used estimates for \( \hat{y}_0 \) are as follows:

(a) Set \( \hat{y}_0 = y_1 \). If the changes in the process are expected to occur early and fast, this choice of starting value is reasonable.

(b) Take the average \( \bar{y} \) of the available data, or of a subset of the available data, and set \( \hat{y}_0 = \bar{y} \). If the process is, at least at the beginning, locally constant, this starting value may be preferred.
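In the forecast package for R, these two conventions correspond roughly to the initial argument of ses(): initial = "simple" starts the recursion at the first observation (choice (a)), while initial = "optimal" estimates the starting value from the data. A small sketch, using the airline-yield series of Example 3.1.1 below:

library(forecast)

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)

# Choice (a): start the recursion at the first observation
fit_a <- ses(y, alpha = 0.3, initial = "simple", h = 1)

# Alternatively, let ses() estimate the starting value from the data
fit_b <- ses(y, alpha = 0.3, initial = "optimal", h = 1)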

Example 3.1.1
The yield from carrying one paying passenger one mile for US scheduled airlines over an 11-year period is shown below.

t    1    2    3    4    5    6     7     8     9     10    11
yt   8    8.4  8.3  8.7  11   12.3  11.8  11.6  12.1  11.7  10.8

To get this figure, the total revenue was divided by the total number of miles flown by paying passengers. This statistic is a primary determinant of airline profitability; hence the need to forecast these yields.

Find the forecast estimate for period 12 using the following smoothing parameters:
(a) α = 0.05
(b) α = 0.1
(c) α = 0.3
(d) α = 0.5

Solution 3.1.1
So \( \hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1} \), with \( \hat{y}_0 = y_1 = 8 \).

year    t    yt     ŷt (α=0.05)  ŷt (α=0.1)  ŷt (α=0.3)  ŷt (α=0.5)
        0           8.00         8.00        8.00        8.00
2000    1    8.0    8.00         8.00        8.00        8.00
2001    2    8.4    8.02         8.04        8.12        8.20
2002    3    8.3    8.03         8.07        8.17        8.25
2003    4    8.7    8.07         8.13        8.33        8.48
2004    5    11.0   8.21         8.42        9.13        9.74
2005    6    12.3   8.42         8.80        10.08       11.02
2006    7    11.8   8.59         9.10        10.60       11.41
2007    8    11.6   8.74         9.35        10.90       11.50
2008    9    12.1   8.91         9.63        11.26       11.80
2009    10   11.7   9.05         9.84        11.39       11.75
2010    11   10.8   9.13         9.93        11.21       11.28

Forecasts (the flat forecast function of remark (iv) means every horizon gives the same value):

year    h    α=0.05   α=0.1   α=0.3   α=0.5
2011    1    9.13     9.93    11.21   11.28
2012    2    9.13     9.93    11.21   11.28
2013    3    9.13     9.93    11.21   11.28

R-Code
# Read in the airline-yield data (eg311.txt holds the series of Example 3.1.1)
rev = read.table("eg311.txt", header = T)
ap = ts(rev)

library(forecast)

# Simple exponential smoothing with a fixed alpha, the recursion started at the
# first observation (initial = "simple"), forecasting h = 1 step ahead
m1 = ses(ap, alpha = 0.05, initial = "simple", h = 1)
m2 = ses(ap, alpha = 0.1,  initial = "simple", h = 1)
m3 = ses(ap, alpha = 0.3,  initial = "simple", h = 1)
m4 = ses(ap, alpha = 0.5,  initial = "simple", h = 1)
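The point forecasts and prediction intervals can then be read off the returned forecast objects; for example:

# Point forecast of period 12 under alpha = 0.3 (stored in the $mean component)
m3$mean
summary(m3)   # point forecast together with 80% and 95% prediction intervals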

3.2 Other Simple Forecasting Techniques

Let \( h \) be the forecast horizon (\( h \)-step-ahead forecast).

(i) Mean method

(a) The forecasts of all future values are equal to the mean of the historical data \( \{y_1, \ldots, y_T\} \)

(b) Forecasts: \( \hat{y}_{T+h|T} = \bar{y} = \dfrac{y_1 + \cdots + y_T}{T} \)

(ii) Naïve method

(a) Forecasts equal to the last observed value

(b) Forecasts: \( \hat{y}_{T+h|T} = y_T \)

This method is optimal for efficient stock markets.

(iii) Seasonal naïve method

Forecasts equal to the last value from the same season

(iv) Drift method

(a) Forecasts equal to the last value plus the average change

(b) Forecasts:

\[ \hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}\left( y_t - y_{t-1} \right) = y_T + h\left( \frac{y_T - y_1}{T-1} \right) \]

This is equivalent to extrapolating the line drawn between the first and last observations; a worked check follows.
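As a quick worked check using the data of Example 3.1.1 (\( T = 11 \), \( y_1 = 8 \), \( y_T = 10.8 \)), the one-step drift forecast is

\[ \hat{y}_{12|11} = 10.8 + 1 \times \frac{10.8 - 8}{10} = 10.8 + 0.28 = 11.08, \]

which matches the rwf() output in Solution 3.2.1 below.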

Example 3.2.1
Consider Example 3.1.1. Use the following forecasting techniques to forecast period 12:
(i) mean   (ii) naïve   (iii) seasonal naïve   (iv) drift   (v) exponential smoothing, α = 0.3

Solution 3.2.1

[Fig. 1: Airline Profitability — time series plot of the yield (profit) against year.]

library(forecast)

meanf(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.42727 7.979823 12.87472 6.453127 14.40142

naive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407

snaive(rev,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 10.8 9.596433 12.00357 8.959303 12.6407
(With non-seasonal data, the seasonal naïve method reduces to the naïve method, hence the identical output.)

rwf(rev,drift=T,h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.08 9.80992 12.35008 9.13758 13.02242

ses(rev,alpha=0.3,initial="simple",h=1)
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
12 11.21387 9.338143 13.0896 8.345192 14.08256

[Forecast plots: Forecasts from Mean; Forecasts from Naïve method; Forecasts from Seasonal naïve method; Forecasts from Random walk with drift.]

[Forecast plot: Forecasts from Simple exponential smoothing.]

3.3 Measures of Forecast Accuracy

Let \( y_t \) denote the \( t \)th observation and \( f_t \) denote its forecast, where \( t = 1, \ldots, T \). Then the following measures are useful:

(i) \( \mathrm{MAE} = \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left| y_t - f_t \right| \) (mean absolute error)

(ii) \( \mathrm{MSE} = \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left( y_t - f_t \right)^{2} \) (mean squared error) and \( \mathrm{RMSE} = \sqrt{\dfrac{1}{T}\displaystyle\sum_{t=1}^{T} \left( y_t - f_t \right)^{2}} \) (root mean squared error)

(iii) \( \mathrm{MAPE} = \dfrac{100}{T}\displaystyle\sum_{t=1}^{T} \left| \dfrac{y_t - f_t}{y_t} \right| \) (mean absolute percentage error)

Remarks
MAE, MSE and RMSE are all scale-dependent.

MAPE is scale-independent, but it is only sensible if \( y_t \gg 0 \) for all \( t \) and \( y \) has a natural zero.

So if you are comparing accuracy across time series with different scales, you cannot use MSE.

For business use, MAPE is often preferred, apparently because managers understand percentages better than squared errors.

MAPE cannot be used when the time series can take zero values.

MASE is intended to be both scale-independent and usable on all series, including those that take zero values.

(i) Mean Absolute Scaled Error

\[ \mathrm{MASE} = \frac{1}{T}\sum_{t=1}^{T} \left| \frac{y_t - f_t}{q} \right| \]

where \( q \) is a stable measure of the scale of the time series \( \{y_t\} \).

For a non-seasonal time series,

\[ q = \frac{1}{T-1}\sum_{t=2}^{T} \left| y_t - y_{t-1} \right| \]

(the in-sample MAE of the naïve method). For a seasonal time series with period \( s \),

\[ q = \frac{1}{T-s}\sum_{t=s+1}^{T} \left| y_t - y_{t-s} \right| \]
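These measures can be computed directly; the following sketch assumes one-step forecasts f aligned with the observations y, and the function name forecast_accuracy is our own:

# Minimal sketch computing MAE, RMSE, MAPE and the non-seasonal MASE
# from observations y and their forecasts f; the function name is illustrative.
forecast_accuracy <- function(y, f) {
  e <- y - f                        # forecast errors
  q <- mean(abs(diff(y)))           # scale: in-sample MAE of the naive method
  c(MAE  = mean(abs(e)),
    RMSE = sqrt(mean(e^2)),
    MAPE = 100 * mean(abs(e / y)),
    MASE = mean(abs(e / q)))
}

In practice, the accuracy() function of the forecast package reports these measures (together with ME and MPE) for a fitted model, e.g. accuracy(naive(rev, h = 1)).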

Example 3.3.1
Consider Example 3.1.1: how accurate are these forecasts?

Solution 3.3.1

Method            RMSE       MAE        MPE        MAPE       MASE
Mean              1.628212   1.510744   -2.69365   15.48064   2.158205
Naïve             0.939149   0.7        2.605165   6.388842   1
Seasonal naïve    0.939149   0.7        2.605165   6.388842   1
Drift             0.896437   0.7        -0.07819   6.334669   1
Exponential       1.46364    1.081401   8.582388   9.577755   1.544858

(The naïve method has MASE = 1 by construction, since q is its own in-sample MAE.)

3.4 SERIAL DEPENDENCE

Recall that the \( y \)'s are not independent but are serially dependent. We can describe the nature of the dependence using a set of autocorrelations.

3.4.1 Autocorrelation
Given \( n \) observations \( (y_1, \ldots, y_n) \) on a time series, we can form \( n-1 \) pairs of observations, \( (y_1, y_2), (y_2, y_3), \ldots, (y_{n-1}, y_n) \), where each pair of observations is separated by one time interval.

Regarding the first observation in each pair as one variable, and the second observation in each pair as a second variable, we can measure the correlation coefficient between adjacent observations \( y_t \) and \( y_{t+1} \).

So

\[ r_1 = \frac{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y}_{(1)} \right)\left( y_{t+1} - \bar{y}_{(2)} \right)}{\sqrt{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y}_{(1)} \right)^{2} \;\displaystyle\sum_{t=1}^{n-1}\left( y_{t+1} - \bar{y}_{(2)} \right)^{2}}} \qquad (3.4.1) \]

where

\[ \bar{y}_{(1)} = \frac{1}{n-1}\sum_{t=1}^{n-1} y_t \]

is the mean of the first observation in each of the \( n-1 \) pairs, and

\[ \bar{y}_{(2)} = \frac{1}{n-1}\sum_{t=2}^{n} y_t \]

is the mean of the last \( n-1 \) observations.

Equation (3.4.1) measures the correlation between successive observations; it is called the sample autocorrelation coefficient, or serial correlation coefficient, at lag one.

For large \( n \), we can use some approximations: taking \( \bar{y}_{(1)} \approx \bar{y}_{(2)} \approx \bar{y} \) and dropping a factor of \( n/(n-1) \), we get

\[ r_1 = \frac{\displaystyle\sum_{t=1}^{n-1}\left( y_t - \bar{y} \right)\left( y_{t+1} - \bar{y} \right)}{\displaystyle\sum_{t=1}^{n}\left( y_t - \bar{y} \right)^{2}} \qquad (3.4.2) \]

where \( \bar{y} = \dfrac{1}{n}\displaystyle\sum_{t=1}^{n} y_t \) is the overall mean.

For observations \( k \) steps apart, \( (y_1, y_{k+1}), (y_2, y_{k+2}), \ldots, (y_{n-k}, y_n) \), we have the sample autocorrelation coefficient at lag \( k \):

\[ r_k = \frac{\displaystyle\sum_{t=1}^{n-k}\left( y_t - \bar{y} \right)\left( y_{t+k} - \bar{y} \right)}{\displaystyle\sum_{t=1}^{n}\left( y_t - \bar{y} \right)^{2}} \qquad (3.4.3) \]
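As a sketch, Eqn (3.4.3) can be computed directly in R and checked against the built-in acf() function; the function name acf_manual is our own:

# Minimal sketch of Eqn (3.4.3); the name acf_manual is illustrative.
acf_manual <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  num <- sum((y[1:(n - k)] - ybar) * (y[(k + 1):n] - ybar))
  den <- sum((y - ybar)^2)
  num / den
}

y <- c(8, 8.4, 8.3, 8.7, 11, 12.3, 11.8, 11.6, 12.1, 11.7, 10.8)
acf_manual(y, 1)                           # r1 computed by hand
acf(y, lag.max = 1, plot = FALSE)$acf[2]   # R's estimate (element 1 is r0 = 1)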

3.4.2 Correlogram
The sample autocorrelation function (acf) is the set \( \{r_k : k = 0, 1, 2, 3, \ldots\} \), with \( r_0 = 1 \).

A useful aid in interpreting a set of autocorrelation coefficients is a graph called a correlogram, in which the sample autocorrelation coefficients \( r_k \) are plotted against the lag \( k \) for \( k = 0, 1, \ldots, m \), where \( m \) is usually much less than \( n \). For example, if \( n = 200 \), we might look at the first 20 or 30 coefficients.

The correlogram may also be called the sample acf.

[Figure 1: observed series — time series plot of \( y_t \).]

[Figure 2: sample acf of \( y_t \) for lags 1–15, with 5% significance limits for the autocorrelations.]

3.4.3 Interpreting the Correlogram

It is not easy to interpret a set of autocorrelation coefficients.

(i) Random series
A time series is said to be random if it consists of a series of independent observations having the same distribution. Then for large \( n \), we expect to find \( r_k \approx 0 \) for \( k = 1, 2, 3, \ldots \). Later we will see that for a random series, \( r_k \sim N\!\left(0, \tfrac{1}{n}\right) \) approximately. Thus, inspection of the correlogram can be used to 'test' for randomness and also to help identify suitable models. If a time series is random, we can expect 95% of the values of \( r_k \) to lie between \( \pm \dfrac{2}{\sqrt{n}} \) (see the simulation sketch at the end of this section).

(ii) Short-term correlation
Stationary series often exhibit short-term correlation, characterised by a fairly large value of \( r_1 \) followed by one or two further coefficients which, while greater than zero, tend to get successively smaller. Values of \( r_k \) for longer lags tend to be approximately zero.

(iii) Alternating series


If a time series has a tendency to alternate, with successive observations on
different sides of the overall mean, then the correlogram also tends to alternate.

(iv) Non-stationary series
If a time series contains a trend, then the values of \( r_k \) will not come down to zero except at very large lags. This is because an observation on one side of the overall mean tends to be followed by a large number of further observations on the same side of the mean, owing to the trend.

(v) Seasonal series
If a time series contains seasonal variation, then the correlogram will also exhibit
oscillation at the same frequency. For example, with monthly data, we can
expect r6 to be ‘large’ and negative, while r12 will be ‘large’ and positive. If the
seasonal variation is removed from seasonal data, then the correlogram may
provide useful information.

(vi) Outliers
If a time series contains one or more outliers, the correlogram may be seriously affected, and it may be advisable to adjust the outliers in some way before starting the formal analysis. For example, if there is one outlier in the time series at, say, time \( t_0 \), and it is not adjusted, then the plot of \( y_t \) vs \( y_{t+k} \) will contain two 'extreme' points, namely \( \left( y_{t_0 - k}, y_{t_0} \right) \) and \( \left( y_{t_0}, y_{t_0 + k} \right) \). These points will depress the sample autocorrelation coefficients toward zero.
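As the simulation sketch promised in (i), the following R code generates a random (white-noise) series and counts how many sample autocorrelations fall outside the \( \pm 2/\sqrt{n} \) limits; under randomness, roughly 5% should.

set.seed(1)                       # for reproducibility
n <- 200
y <- rnorm(n)                     # a random series: independent N(0, 1) observations

r <- acf(y, lag.max = 30, plot = FALSE)$acf[-1]   # r1, ..., r30 (drop r0 = 1)
limit <- 2 / sqrt(n)              # approximate 95% limits under randomness
mean(abs(r) > limit)              # proportion outside the limits; expect about 0.05

acf(y, lag.max = 30)              # correlogram with the significance limits drawn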
