Time Series (ARIMA) : by Hrishikesh Khaladkar Department of Mathematics Fergusson College, Pune


Time Series (ARIMA)

by
Hrishikesh Khaladkar
Department of Mathematics
Fergusson College, Pune

April 11, 2018

Hrishikesh Khaladkar,Fergusson College Time Series (ARIMA) April 11, 2018 1 / 34


Time Series

What is a Time Series

Any process that varies over time is a time series process, provided
that it is observed at a fixed interval.
Time series data is a sequence of records collected from a process
at equally spaced intervals in time.
The aim of time series analysis is to understand the historical data,
analyze it to uncover hidden patterns, and finally model those
patterns so that they can be used for forecasting.


Real Life Example!!!!

Suppose Mr. Ajay started his job in 2010 with a starting salary of
Rs 5,000 per month. He is appraised every year, and by 2014 his salary
had reached Rs 20,000 per month. His annual salary can be
considered a time series, and each year's salary is clearly a function
of the previous year's salary (here the function is the appraisal rating).


Main Phases of Time Series Analysis

In the descriptive phase, you try to understand the nature of the time
series: you try to identify the trend, seasonal, cyclic, or irregular
variations in the data.
In the modeling or pattern discovery phase, you model the inherent
patterns of the time series data. There are several methods for finding
these patterns.
Once the patterns are identified, forecasting is relatively easy.


Examples: Laptop sales (Time Series Plot)

Time series data of monthly laptop sales (in thousands)


Examples: US GDP (Source: World Bank) Time Series Plot

Time series data of GDP for the United States (in billions)


Regression vs Time Series

Normally in predictive modelling, you predict the dependent variable
Y from a set of X variables.
In time series, you have to predict Y using the previous values of Y.
Regression: $Y = f(X_1, X_2, \ldots, X_p)$
Time Series: $Y_t = g(Y_{t-1}, Y_{t-2}, \ldots, Y_1)$


Components of a Time Series

Trend: The series could be steadily increasing or decreasing, or first
increasing for a considerable time period and then decreasing. This
trend is identified and then removed from the time series in the ARIMA
forecasting process.
Seasonality: A repeating pattern with a fixed period.
For example: sales in festive seasons. In the US, sales of candies peak
every October and sales of chocolates peak every December, because
Halloween and Christmas fall in those months. The time series should be
de-seasonalized in the ARIMA forecasting process.
Random Variation (Irregular Component): This is the unexplained
variation in the time series, which is totally random: erratic
movements that are not predictable because they do not follow a
pattern.
Example: an earthquake, a famine, a big economic scandal, etc. (This is
what makes time series analysis tough.)


Measure the trend


There are several ways to measure the trend.
Graphical Method
Method of Semi Averages
Method of Curve Fitting using the Principles of Least Squares
Method of Moving Averages
We discuss the Method of Moving Averages
It consists of obtaining a series of moving averages (arithmetic means)
of the successive overlapping groups or sections of a time series.
The averaging process smoothens the ups and downs in the data.
The moving average is characterized by a constant known as the
period or extent of the moving average.
For example, if $y_1, y_2, y_3, \ldots$ is a time series and $m$ is the period, then
1st MA $= \frac{y_1 + y_2 + \cdots + y_m}{m}$, 2nd MA $= \frac{y_2 + y_3 + \cdots + y_{m+1}}{m}$, 3rd MA $= \frac{y_3 + y_4 + \cdots + y_{m+2}}{m}$,
and so on.
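
The moving averages defined above can be computed in a few lines. The following is a minimal sketch, assuming the series is a plain Python list; the function name `moving_averages` and the sales figures are made up for illustration.

```python
def moving_averages(series, m):
    """Return the successive moving averages of period m."""
    if m < 1 or m > len(series):
        raise ValueError("m must be between 1 and len(series)")
    # Each average covers one overlapping group of m consecutive values.
    return [sum(series[i:i + m]) / m for i in range(len(series) - m + 1)]

sales = [12, 15, 18, 21, 18, 15, 24]   # made-up values
trend = moving_averages(sales, 3)
print(trend)                           # [15.0, 18.0, 19.0, 18.0, 19.0]
```

A series of n values gives n - m + 1 moving averages, so a larger period smooths more but leaves fewer trend values.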

How to identify seasonality


There are several graphical ways to identify seasonality:
Run charts
Multiple Box Plots
Seasonal plots


Stationary and Non-Stationary Time Series


A time series is said to be stationary if there is no systematic change
in mean (no trend), no systematic change in variance, and
if strictly periodic variations have been removed.
In a stationary time-series process, the mean and variance hover
around a single value.
If the mean and variance of the series keep growing as the series
grows, then the series is considered to be
nonstationary. A nonconstant mean or nonconstant variance is a sign
of a nonstationary time-series process. Processes whose values grow
without bound are also called explosive.


What is a Stationary Time Series


Some steadiness is apparent in the series plot: the series does not drift
too far from its mean value line. In this case, the variance and the
mean also show steadiness, or a state of equilibrium.
The graph below shows a stationary time series.


Testing Stationarity Using a DF Test

To test the stationarity of a series, you use the Dickey-Fuller (DF) test.
The null and alternative hypotheses of a DF test are as follows:
$H_0$: The series is not stationary.
$H_1$: The series is stationary.
You perform a DF test and take note of the p-value.
Considering the p-value:
If the p-value is less than 5 percent (0.05), you reject the null
hypothesis, that is, you reject the hypothesis that the
series is not stationary. So a series is concluded to be stationary when
the p-value of a DF test is less than 0.05.
On the other hand, if the p-value is more than 0.05, you fail to
reject the null hypothesis, and the series is treated as
not stationary.
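
To make the decision rule concrete, here is a rough sketch of the simplest form of the DF test. It assumes the regression $\Delta Y_t = c + \rho Y_{t-1} + e_t$ with a constant and no extra lags, and it compares the test statistic with an approximate large-sample 5 percent critical value of -2.86 instead of computing a p-value (the DF statistic does not follow the usual t distribution). The function name `df_statistic` and the simulated data are illustrative only; in practice a library routine such as `adfuller` in statsmodels, which also reports a p-value, would normally be used.

```python
import math
import random

def df_statistic(y):
    """t-statistic of rho in the regression  dy_t = c + rho * y_{t-1} + e_t."""
    x = y[:-1]                                   # lagged levels y_{t-1}
    d = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences dy_t
    n = len(x)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxd = sum((xi - mx) * (di - md) for xi, di in zip(x, d))
    rho = sxd / sxx                              # OLS slope
    c = md - rho * mx                            # OLS intercept
    rss = sum((di - c - rho * xi) ** 2 for xi, di in zip(x, d))
    se = math.sqrt(rss / (n - 2) / sxx)          # standard error of rho
    return rho / se

# Assumed approximate 5% critical value for the constant-only DF regression.
CRITICAL_5PCT = -2.86

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(500)]   # stationary by construction
walk = [0.0]
for e in noise[1:]:
    walk.append(walk[-1] + e)                      # random walk: not stationary

for name, series in [("white noise", noise), ("random walk", walk)]:
    stat = df_statistic(series)
    verdict = "stationary" if stat < CRITICAL_5PCT else "not shown to be stationary"
    print(f"{name}: DF statistic = {stat:.2f} -> {verdict}")
```

A statistic below the critical value plays the same role as a p-value below 0.05: the null hypothesis of non-stationarity is rejected.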


Achieving Stationarity
If a series is not stationary, you can difference it to make it
stationary.
If $Y_t$ is the original series, compute $\Delta Y_t = Y_t - Y_{t-1}$ and work with this
new series $\Delta Y_t$. This is called first differencing (a lag-1 difference).

Note that some series may not be stationary even after the first
differencing. You then need to go for second differencing. If
even the second differencing doesn't work, you may have to try
some other transformation, such as taking logarithms.
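
As a small illustration of first and second differencing, the sketch below uses a made-up series with a curved trend; the helper name `difference` is an assumption for this example.

```python
def difference(series, order=1):
    """Apply lag-1 differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series[:-1], series[1:])]
    return series

y = [1, 4, 9, 16, 25, 36]     # made-up series with a curved trend
print(difference(y, 1))       # [3, 5, 7, 9, 11]  (still trending)
print(difference(y, 2))       # [2, 2, 2, 2]      (trend removed)
```

Each round of differencing shortens the series by one observation.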

White noise

A white noise process is one with a constant mean of zero, a constant
variance, and no correlation between its values at different times.
White noise series exhibit very erratic, jumpy, unpredictable
behavior. Since the values are uncorrelated, previous values do not help us
to forecast future values.
White noise series themselves are quite uninteresting from a
forecasting standpoint (they are not linearly forecastable).


ARIMA (Box Jenkins Approach)

ARIMA: Auto Regressive Integrated Moving Average

Using the Box-Jenkins approach, it is developed in two steps by
understanding the AR, MA, and ARMA processes.
Understanding ARIMA is similar to understanding eyesight
measurement using a Snellen chart.


AR Process
Consider the time series given by $Y_t, Y_{t-1}, \ldots, Y_{t-k}$.
The auto-regressive process is denoted by AR(p), where p is the order
of the auto-regressive process.
In an AR process, the current value of the series is a function of
previous values.
p determines how many previous values the current value of the
series depends on.
In an AR(1) process, $Y_t = a_1 Y_{t-1} + \varepsilon_t$, where
$a_1$: Quantified impact of $Y_{t-1}$ on $Y_t$.
$\varepsilon_t$: white noise.
Similarly, in an AR(2) process, $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \varepsilon_t$, where
$a_1$: Quantified impact of $Y_{t-1}$ on $Y_t$.
$a_2$: Quantified impact of $Y_{t-2}$ on $Y_t$.
$\varepsilon_t$: white noise.

In general, for an AR(p) process, $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \cdots + a_p Y_{t-p} + \varepsilon_t$,
where
$a_1, a_2, \ldots, a_p$: Quantified impact of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$ on $Y_t$.
$\varepsilon_t$: white noise.
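
The AR(p) recursion can be illustrated by simulating a series and then recovering the coefficient from the data. This is a sketch under stated assumptions: standard normal white noise, start-up values of zero, and a simple least-squares estimate for the AR(1) case only. The names `simulate_ar` and `estimate_ar1` and the coefficient 0.6 are made up for the example.

```python
import random

def simulate_ar(coeffs, n, seed=0):
    """Simulate Y_t = a_1*Y_{t-1} + ... + a_p*Y_{t-p} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p                      # start-up values, assumed to be zero
    for _ in range(n):
        past = y[-1:-p - 1:-1]         # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
        y.append(sum(a * v for a, v in zip(coeffs, past)) + rng.gauss(0, 1))
    return y[p:]                       # drop the start-up values

def estimate_ar1(y):
    """Least-squares estimate of a_1 in Y_t = a_1*Y_{t-1} + e_t (no intercept)."""
    num = sum(cur * prev for prev, cur in zip(y[:-1], y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den

y = simulate_ar([0.6], n=2000)
print(round(estimate_ar1(y), 2))       # expected to land close to 0.6
```

The estimate is close to, but not exactly, the true coefficient because of the random shocks; longer series give tighter estimates.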


MA Process

A moving average (MA) process is a time-series process in which the
series stays close to its mean, and the
current deviation from the mean depends on the previous white noise
(error or shock) terms.
As with p in an AR(p) process, the order q of an MA(q) process tells us how
many past error values have an effect on the current value.
In an MA(1) process, the deviation at time t is a function of the errors at
t and t-1.
If the series is an MA(1) process, then $Y_t - \mu = \varepsilon_t + b_1 \varepsilon_{t-1}$, where
$b_1$: Quantified impact of $\varepsilon_{t-1}$ on $Y_t$.
$\mu$: the mean of the overall series.
$Y_t - \mu$: the deviation at time t.


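If a process is MA(2), then $Y_t - \mu = \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2}$, where
$b_1, b_2$: Quantified impact of $\varepsilon_{t-1}, \varepsilon_{t-2}$ on $Y_t$.
$\mu$: the mean of the overall series.
$Y_t - \mu$: the deviation at time t.
In general, if the process is MA(q), then
$Y_t - \mu = \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \cdots + b_q \varepsilon_{t-q}$, where
$b_1, b_2, \ldots, b_q$: Quantified impact of $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ on $Y_t$.
$\mu$: the mean of the overall series.
$Y_t - \mu$: the deviation at time t.
If the mean is zero, then
$Y_t = \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \cdots + b_q \varepsilon_{t-q}$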
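
One way to see what the order q means is to feed a single shock into the MA(q) equation and watch how long it lasts. The sketch below assumes shocks before the start of the series are zero; the function name `ma_series` and the coefficients are chosen only for illustration.

```python
def ma_series(shocks, b, mu=0.0):
    """Y_t = mu + e_t + b_1*e_{t-1} + ... + b_q*e_{t-q}; earlier shocks are taken as 0."""
    out = []
    for t, e in enumerate(shocks):
        value = mu + e
        for k, bk in enumerate(b, start=1):
            if t - k >= 0:
                value += bk * shocks[t - k]
        out.append(value)
    return out

shocks = [0, 0, 1, 0, 0, 0, 0]             # a single unit shock at t = 2
print(ma_series(shocks, b=[0.5, 0.25]))    # MA(2)
# [0.0, 0.0, 1.0, 0.5, 0.25, 0.0, 0.0]
```

The shock affects the series for exactly q = 2 further periods and then disappears, which is why the ACF of an MA(q) process cuts off after lag q.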


ARMA Process

If a process shows the properties of both an auto-regressive process and a
moving average process, then it is called an ARMA process. In an ARMA
time-series process, the current value of the series depends on its previous
values and on previous errors.
Loosely, you can think of an ARMA process as a series that combines
longer-lasting dependence on its own past (the AR part) with
short-lived effects of past shocks (the MA part).
ARMA(p,q) is the general notation for an ARMA process, where p is the
order of the AR process and q is the order of the MA process. In an
ARMA(1,1) series, $Y_t = a_1 Y_{t-1} + \varepsilon_t + b_1 \varepsilon_{t-1}$, where
$a_1$: Quantified impact of $Y_{t-1}$ on $Y_t$.
$\varepsilon_t$: the random error at time t.
$b_1$: Quantified impact of $\varepsilon_{t-1}$ on $Y_t$.


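In an ARMA(2,1) series, $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \varepsilon_t + b_1 \varepsilon_{t-1}$, where
$a_1, a_2$: Quantified impact of $Y_{t-1}, Y_{t-2}$ on $Y_t$.
$\varepsilon_t$: the random error at time t.
$b_1$: Quantified impact of $\varepsilon_{t-1}$ on $Y_t$.
In general, for an ARMA(p,q) series,
$Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \cdots + a_p Y_{t-p} + \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \cdots + b_q \varepsilon_{t-q}$, where
$a_1, a_2, \ldots, a_p$: Quantified impact of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}$ on $Y_t$.
$\varepsilon_t$: the random error at time t.
$b_1, \ldots, b_q$: Quantified impact of $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ on $Y_t$.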


Comparison with Vision Test

To test eyesight and prescribe eyeglasses, doctors perform a small
test. In the past, instead of using special equipment, doctors had a
box full of lenses (of different powers).
The patient was asked to sit in a chair and was given an empty frame
to wear. The doctor would put differently powered
lenses, one by one, in the frame and ask the patient to read from the
Snellen chart. Some patients, for example, read the top seven rows
and struggled with the lower ones. The doctor then removed the first
lens and put in another.
After many such iterations, the doctor would settle on the exact
lenses to be used in the patient's glasses. Some patients were diagnosed
as nearsighted and some as farsighted.


Analogy of Vision Test and Box Jenkins Approach

Vision Test
Assume the patient is literate.
Based on some tests, identify nearsightedness or farsightedness and
get a rough estimate of eyesight.
Estimate the exact eyesight by trying various lenses.
Use the test results to give the prescription.
Box Jenkins Approach
Assume that the time series is stationary; if not, make it stationary.
Based on ACF and PACF plots, identify whether the model is an AR,
MA, or ARMA process.
Estimate the parameters $a_1, a_2, \ldots, a_p$ and $b_1, b_2, \ldots, b_q$.
Use the final model for forecasting.


ARIMA Processes

The ARIMA(1,1,0) series is written as follows:
$\Delta Y_t = a_1 \Delta Y_{t-1} + \varepsilon_t$.
The ARIMA(2,1,0) series is written as follows:
$\Delta Y_t = a_1 \Delta Y_{t-1} + a_2 \Delta Y_{t-2} + \varepsilon_t$.
The ARIMA(1,1,1) series is written as follows:
$\Delta Y_t = a_1 \Delta Y_{t-1} + \varepsilon_t + b_1 \varepsilon_{t-1}$.
The rule of thumb is that you subtract the previous lag values while
differencing; you add previous lag values while integrating.
Next: how to identify p and q for ARIMA models.
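
The "subtract when differencing, add when integrating" rule can be sketched as follows. The example assumes an ARIMA(1,1,0) model with a hypothetical, already-estimated coefficient `a1 = 0.5`, and sets future shocks to zero when forecasting; the data and helper names are made up.

```python
def difference(series):
    """First differencing: subtract the previous value."""
    return [b - a for a, b in zip(series[:-1], series[1:])]

def integrate(last_level, diffs):
    """Undo first differencing: add each difference to the previous level."""
    levels = []
    level = last_level
    for d in diffs:
        level += d
        levels.append(level)
    return levels

y = [100, 104, 109, 115]                  # made-up series
dy = difference(y)                        # [4, 5, 6]
assert integrate(y[0], dy) == y[1:]       # integrating recovers the original levels

a1 = 0.5                                  # hypothetical ARIMA(1,1,0) coefficient
dy_forecast = []
prev = dy[-1]
for _ in range(3):
    prev = a1 * prev                      # next difference, with future shocks set to 0
    dy_forecast.append(prev)

print(integrate(y[-1], dy_forecast))      # [118.0, 119.5, 120.25]
```

The model forecasts the differences; the forecasts of the original series are obtained only after integrating them back from the last observed level.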


Auto Correlation Function

Auto correlation is the correlation between $Y_t$ and $Y_{t-1}$. Generally
you find the correlation between two variables, but here you are finding
the correlation between Y and previous values of Y. The auto correlation
function is the collection of all such correlations at different lags. The ACF is
denoted by $\rho_h$, where h indicates the lag.
ACF(0): Correlation at lag 0, $\rho_0$ = correlation of $Y_t$ and $Y_t$ = 1.
ACF(1): Correlation at lag 1, $\rho_1$ = correlation of $Y_t$ and $Y_{t-1}$.
ACF(2): Correlation at lag 2, $\rho_2$ = correlation of $Y_t$ and $Y_{t-2}$.
ACF(3): Correlation at lag 3, $\rho_3$ = correlation of $Y_t$ and $Y_{t-3}$.
The graphs created using auto correlation values are called auto
correlation plots.
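
A sample version of the ACF can be computed directly from its definition. The sketch below uses one common estimator (deviations from the overall mean, divided by the lag-0 sum of squares); the function name `acf` and the data are illustrative assumptions.

```python
def acf(series, max_lag):
    """Sample auto correlations rho_0 .. rho_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)           # lag-0 sum of squares
    return [sum(dev[t] * dev[t - h] for t in range(h, n)) / c0
            for h in range(max_lag + 1)]

y = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4, 5]   # made-up series
rho = acf(y, 3)
print([round(r, 3) for r in rho])
```

The first value is always 1, since a series is perfectly correlated with itself at lag 0. Plotting `rho` against the lag gives the auto correlation plot.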


Auto Correlation plots

On the x-axis you have the lag values, while the y-axis has the
autocorrelation values. The graph might vary based on the type of the
series.


Partial Auto Correlation Function

The partial auto correlation function is the set of partial correlations between Y
and its previous values, calculated at different lags. In the context of
time-series analysis, partial auto correlation is found by regressing the
current value of Y on its old values. The PACF is denoted by $\theta_h$, where h
indicates the lag.
PACF(0): Partial correlation at lag 0, $\theta_0$ = correlation of $Y_t$ and $Y_t$ = 1.
PACF(1): Partial correlation at lag 1, $\theta_1$ = regression coefficient of $Y_{t-1}$
when $Y_t$ is regressed on $Y_{t-1}$.
PACF(2): Partial correlation at lag 2, $\theta_2$ = regression coefficient of
$Y_{t-2}$ when $Y_t$ is regressed on $Y_{t-1}$ and $Y_{t-2}$.
PACF(3): Partial correlation at lag 3, $\theta_3$ = regression coefficient of
$Y_{t-3}$ when $Y_t$ is regressed on $Y_{t-1}$, $Y_{t-2}$, and $Y_{t-3}$.
The graphs created using partial auto correlation values are called partial
auto correlation plots.
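
Rather than fitting a separate regression for every lag, the partial auto correlations can be obtained from the auto correlations with the Durbin-Levinson recursion, which yields the same last regression coefficients. This is a swapped-in computational shortcut, not the method described on the slide; the function name `pacf_from_acf` is made up, and the input is the theoretical ACF of an AR(1) process with coefficient 0.5.

```python
def pacf_from_acf(rho):
    """rho[h] is the auto correlation at lag h (rho[0] == 1).
    Returns the partial auto correlations for lags 0 .. len(rho) - 1."""
    max_lag = len(rho) - 1
    pacf = [1.0]
    phi_prev = []                      # regression coefficients for lag k - 1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den             # coefficient of Y_{t-k}: the lag-k PACF
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

rho = [0.5 ** h for h in range(4)]     # ACF of an AR(1) process with a_1 = 0.5
print(pacf_from_acf(rho))              # [1.0, 0.5, 0.0, 0.0]
```

The ACF decays gradually (1, 0.5, 0.25, 0.125), while the PACF is zero after lag 1, which matches the order of the AR(1) process.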


Partial Auto Correlation plots

On the x-axis you have the lag values, while the y-axis has the partial
auto correlation values. As is the case with ACF, a PACF graph might vary
based on the type of series.


Rules of thumb to identify an AR(p) process

The rule is as follows:
ACF: Slowly tails off or diminishes to zero. Either reduces in one
direction or reduces in a sinusoidal (sine wave) fashion.
PACF: Cuts off. The cutoff lag indicates the order of the AR process.


Rules of thumb to identify an MA(q) process


The rule is as follows:
ACF: Cuts off. The cutoff lag indicates the order of the MA process.
PACF: Slowly tails off or diminishes to zero. Either reduces in one
direction or reduces in a sinusoidal or sine wave pattern.


Rules of Thumb for Identifying the ARMA Process


The rule is as follows:
ACF: Dampens to zero.
PACF: Dampens to zero.
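
The three rules of thumb can be turned into a crude screening helper. The sketch below is only a heuristic under loud assumptions: values are treated as "significant" when they exceed an approximate 95 percent band of $1.96/\sqrt{n}$, and "cuts off" is interpreted as the last significant lag. The function names and the example inputs are made up, and real ACF and PACF plots should still be inspected by eye.

```python
import math

def cutoff_lag(values, bound):
    """Largest lag h >= 1 with |values[h]| > bound (0 if none); values[0] is lag 0."""
    significant = [h for h in range(1, len(values)) if abs(values[h]) > bound]
    return max(significant) if significant else 0

def suggest_model(acf, pacf, n):
    """Very rough model suggestion from ACF and PACF values of a series of length n."""
    bound = 1.96 / math.sqrt(n)        # assumed approximate 95% band
    acf_cut, pacf_cut = cutoff_lag(acf, bound), cutoff_lag(pacf, bound)
    if acf_cut == 0 and pacf_cut == 0:
        return "white noise"
    if 0 < pacf_cut < acf_cut:         # PACF cuts off first, ACF tails off
        return f"AR({pacf_cut})"
    if 0 < acf_cut < pacf_cut:         # ACF cuts off first, PACF tails off
        return f"MA({acf_cut})"
    return "ARMA or unclear"

# Theoretical ACF and PACF of an AR(1) process with coefficient 0.8 (illustration only).
acf_values = [0.8 ** h for h in range(11)]
pacf_values = [1.0, 0.8] + [0.0] * 9
print(suggest_model(acf_values, pacf_values, n=100))   # AR(1)
```

When both functions die out at about the same lag, the helper declines to choose, which corresponds to the ARMA case where both ACF and PACF dampen to zero.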


Checking for Model Accuracy

Ideally you would like the error to be zero, or at least less than 5 percent. The
following are some measures of accuracy.
$Y_i$ denotes the actual value.
$\hat{Y}_i$ denotes the expected (forecast) value.
1 Mean absolute deviation: MAD $= \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$
2 Mean absolute percent error: MAPE $= \frac{100}{n} \sum_{i=1}^{n} \frac{|Y_i - \hat{Y}_i|}{Y_i}$
3 Mean square error: MSE $= \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
4 Another related measure is the root mean square error:
RMSE $= \sqrt{\text{MSE}}$.
It is generally good practice to keep 5 to 10 percent of the sample data
for validation purposes.
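
The four accuracy measures translate directly into code. This is a minimal sketch; the function name `accuracy` and the hold-out values and forecasts are invented for illustration, and MAPE assumes that no actual value is zero.

```python
import math

def accuracy(actual, predicted):
    """Return MAD, MAPE (in percent), MSE and RMSE for paired actual/forecast values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n
    mape = 100 / n * sum(abs(e) / a for e, a in zip(errors, actual))
    mse = sum(e * e for e in errors) / n
    return {"MAD": mad, "MAPE": mape, "MSE": mse, "RMSE": math.sqrt(mse)}

actual = [100, 200, 400, 500]       # made-up hold-out values
predicted = [110, 190, 380, 520]    # made-up forecasts
print(accuracy(actual, predicted))
```

Here MAD is 15, MSE is 250, and MAPE is about 6 percent, so this made-up forecast would just miss the 5 percent target mentioned above.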


Suggestions with the Box Jenkins Approach

This is a logical end to our time-series analysis and forecasting process
using the Box-Jenkins ARIMA approach. Every analyst wants to accurately
forecast future values by building the best model for the available
historical data. The Box-Jenkins approach offers a systematic way to work
toward this. The following are some suggestions for building time-series
models:
Have sufficient historical data, at least 30 data points. At the same time, make sure you
don't use too much history: only the historical values that will
have an impact on future forecasts should be considered.
Do not forecast too far into the future. With one year of data,
forecasting the next two years is not a good idea. As a rough guide, a forecast
horizon of 10 percent or fewer of the available data points is recommended.
Remove outliers before building the model.
