Time Series (ARIMA) : by Hrishikesh Khaladkar Department of Mathematics Fergusson College, Pune
Any process that varies over time is a time series process, provided
that it is observed at a fixed interval.
Time series data is a sequence of records collected from a process
with equally spaced intervals in time.
The aim of time series analysis is to understand the historical data,
analyze it to uncover hidden patterns, and finally model those
patterns for use in forecasting.
Suppose Mr. Ajay starts his job in 2010 with a starting salary of
Rs 5,000 per month. Every year he is appraised, and his salary reaches
Rs 20,000 per month in 2014. His annual salary can be considered a time
series, and it is clear that each year's salary is a function of the
previous year's salary (here the function is the appraisal rating).
To test the stationarity of a series, you use the Dickey-Fuller (DF) test.
The null and alternative hypotheses of a DF test are as follows:
H0 : The series is not stationary.
H1 : The series is stationary.
You perform a DF test and take note of the p-value.
Considering the p-value:
If the p-value is less than 5 percent (0.05), you reject the null
hypothesis that the series is not stationary. So a series is
concluded to be stationary when the p-value of a DF test is less
than 0.05.
On the other hand, if the p-value is more than 0.05, you fail to
reject the null hypothesis, and the series is treated as not
stationary.
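The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the simplest DF regression (no constant, no trend) and, instead of computing a p-value, compares the test statistic with the tabulated 5 percent critical value of about -1.95, which leads to the same decision at the 5 percent level. The names `df_statistic` and `looks_stationary` are made up for this example. In practice a library routine such as `adfuller` in statsmodels reports the p-value directly.

```python
import math
import random

# Approximate 5% critical value of the Dickey-Fuller t-statistic for the
# regression without constant or trend (from standard DF tables).
DF_CRITICAL_5PCT = -1.95


def df_statistic(series):
    """t-statistic from regressing (Y_t - Y_{t-1}) on Y_{t-1}, no intercept."""
    lagged = series[:-1]                                        # Y_{t-1}
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]    # Y_t - Y_{t-1}
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx       # least-squares slope
    residuals = [d - rho * x for x, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in residuals) / (len(diffs) - 1)       # residual variance
    return rho / math.sqrt(s2 / sxx)                            # slope / standard error


def looks_stationary(series):
    """Reject H0 (not stationary) when the statistic is below the critical value."""
    return df_statistic(series) < DF_CRITICAL_5PCT


rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stationary by construction
trend = [0.5 * t for t in range(200)]                    # keeps growing: not stationary

print("white noise stationary?", looks_stationary(white_noise))
print("trend stationary?", looks_stationary(trend))
```

A strongly negative statistic is evidence against H0; a statistic near zero or positive gives no reason to reject it.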
Achieving Stationarity
If a series is not stationary, you can difference it to make it
stationary.
If $Y_t$ is the original series, then compute $\Delta Y_t = Y_t - Y_{t-1}$
and work with this new series $\Delta Y_t$. This is called the first
difference (a difference at lag 1).
Note that some series may not be stationary even after the first
differencing. You then need to difference a second time. If
even the second differencing doesn't work, you may have to try
some other transformation, such as taking logarithms.
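As a small worked illustration of these steps, the sketch below differences a series once and twice, and also applies a logarithm before differencing. The helper name `difference` and both example series are invented for this illustration.

```python
import math


def difference(series, lag=1):
    """Return the differenced series Y_t - Y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A quadratic series: the first difference still grows, the second is constant.
quadratic = [t * t for t in range(6)]          # [0, 1, 4, 9, 16, 25]
first = difference(quadratic)                  # [1, 3, 5, 7, 9]
second = difference(first)                     # [2, 2, 2, 2]

# A series growing 10% per step: take logs first, then difference once.
growth = [100.0 * 1.1 ** t for t in range(6)]
log_diff = difference([math.log(y) for y in growth])  # each value is close to log(1.1)

print(first)
print(second)
print(log_diff)
```

The differenced series is shorter than the original by one value per differencing step, which matters when only a few observations are available.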
Hrishikesh Khaladkar,Fergusson College Time Series (ARIMA) April 11, 2018 14 / 34
White noise
AR Process
Consider the time series given by $Y_t, Y_{t-1}, \ldots, Y_{t-k}$.
The auto-regressive process is denoted by AR(p), where p is the order
of the auto-regressive process.
In the AR process, the current value of the series is a function of
its previous values.
p determines how many previous values the current value of the
series depends on.
In an AR(1) process, $Y_t = a_1 Y_{t-1} + \epsilon_t$, where
$a_1$ : Quantified impact of $Y_{t-1}$ on $Y_t$.
$\epsilon_t$ : white noise.
Similarly, in an AR(2) process, $Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + \epsilon_t$, where
$a_1$ : Quantified impact of $Y_{t-1}$ on $Y_t$.
$a_2$ : Quantified impact of $Y_{t-2}$ on $Y_t$.
$\epsilon_t$ : white noise.
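To make the AR(1) and AR(2) recursions concrete, here is a minimal simulation sketch. The function name `simulate_ar`, its parameters, and the choice of Gaussian white noise are assumptions made for this example.

```python
import random


def simulate_ar(coeffs, n, initial, noise_sd=1.0, seed=0):
    """Simulate n values of an AR(p) process with p = len(coeffs).

    coeffs  : [a_1, ..., a_p]
    initial : the first p values of the series, oldest first
    noise   : Gaussian white noise with standard deviation noise_sd (assumed)
    """
    if len(initial) != len(coeffs):
        raise ValueError("need exactly one initial value per coefficient")
    rng = random.Random(seed)
    series = list(initial)
    while len(series) < n:
        eps = noise_sd * rng.gauss(0.0, 1.0)
        # a_1 multiplies the most recent value, a_2 the one before it, and so on.
        value = sum(a * series[-k] for k, a in enumerate(coeffs, start=1)) + eps
        series.append(value)
    return series[:n]


# With the noise switched off, the recursion can be checked by hand.
print(simulate_ar([0.5], 4, initial=[8.0], noise_sd=0.0))             # AR(1)
print(simulate_ar([0.5, 0.25], 4, initial=[4.0, 8.0], noise_sd=0.0))  # AR(2)

# With noise, each call produces one realisation of the process.
print(simulate_ar([0.5], 5, initial=[0.0]))
```

Setting `noise_sd=0.0` is only a device for checking the arithmetic; a real AR series always includes the white noise term.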
MA Process
ARMA Process
Vision Test
Assume the patient is literate.
Based on some tests, identify nearsightedness or farsightedness and
get a rough estimate of eyesight.
Estimate the exact eyesight by trying various lenses.
Use the test results to give the prescription.
Box Jenkins Approach
Assume that the time series is stationary; if not, make it stationary.
Based on the ACF and PACF plots, identify whether the model is an AR,
MA, or ARMA process.
Estimate the parameters, such as $a_1, a_2, \ldots, a_p$ and $b_1, b_2, \ldots, b_q$.
Use the final model for forecasting.
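The estimation and forecasting steps can be sketched for the simplest case. The example assumes that the series is already stationary with mean zero and that the identification step pointed to an AR(1) model; the least-squares estimator and the names `fit_ar1` and `forecast_ar1` are choices made for this illustration, not the only way to carry out these steps.

```python
def fit_ar1(series):
    """Least-squares estimate of a_1 in Y_t = a_1 * Y_{t-1} + noise."""
    lagged = series[:-1]     # Y_{t-1}
    current = series[1:]     # Y_t
    return sum(x * y for x, y in zip(lagged, current)) / sum(x * x for x in lagged)


def forecast_ar1(last_value, a1, steps):
    """Forecast the next values; future noise is replaced by its mean, zero."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = a1 * value
        forecasts.append(value)
    return forecasts


history = [8.0, 4.0, 2.0, 1.0]       # noise-free toy data with a_1 = 0.5
a1_hat = fit_ar1(history)
print(a1_hat)
print(forecast_ar1(history[-1], a1_hat, steps=2))
```

For AR models of higher order, or for MA and ARMA models, the same two steps apply, but the estimation is usually left to a statistics package.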
ARIMA Processes
In an ACF plot, the x-axis shows the lag values, while the y-axis shows
the autocorrelation values. The graph might vary based on the type of the
series.
In a PACF plot, the x-axis shows the lag values, while the y-axis shows
the partial autocorrelation values. As is the case with ACF, a PACF graph
might vary based on the type of series.
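The values plotted on the y-axis of the ACF and PACF graphs can be computed directly. The sketch below uses the usual sample autocorrelation and obtains the partial autocorrelations with the Durbin-Levinson recursion; the function names `acf` and `pacf` and the toy series are assumptions for this example, and statistical packages may use slightly different estimators.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []                      # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den         # partial autocorrelation at lag k
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(acf(toy, 2))
print(pacf(toy, 2))
```

At lag 1 the ACF and PACF values always coincide, because there are no intermediate lags to adjust for.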
Ideally, you would like the forecast error to be zero, or at least less
than 5 percent. The following are some measures of accuracy.
$Y_i$ denotes the actual value.
$\hat{Y}_i$ denotes the expected value.
1 Mean absolute deviation (MAD) = $\frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$
2 Mean absolute percent error (MAPE) = $\frac{100}{n} \sum_{i=1}^{n} \frac{|Y_i - \hat{Y}_i|}{Y_i}$
3 Mean square error (MSE) = $\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
4 Another related measure is root mean square error, which is
RMSE = $\sqrt{\mathrm{MSE}}$.
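The four measures translate directly into code. The sketch below follows the formulas as stated, so the MAPE function assumes that all actual values are positive; the function names and the three data points are invented for this illustration.

```python
import math


def mad(actual, predicted):
    """Mean absolute deviation."""
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percent error; assumes every actual value is positive."""
    return 100.0 * sum(abs(y - f) / y for y, f in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

# Absolute errors are 10, 10 and 0; percentage errors are 10%, 5% and 0%.
print("MAD :", mad(actual, predicted))
print("MAPE:", mape(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

MAD and RMSE are in the same units as the data, while MAPE is a percentage and can be compared directly with the 5 percent target.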
It is generally good practice to keep 5 to 10 percent of the sample data
for validation purposes.
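A hold-out split of this kind can be sketched as follows. The example assumes that the validation data is taken from the end of the series, so that the model is judged on the most recent observations; the function name `holdout_split` and the 10 percent default are choices made for this illustration.

```python
def holdout_split(series, validation_fraction=0.1):
    """Split a series into a training part and a final validation part."""
    n_valid = max(1, round(len(series) * validation_fraction))
    # Time order is preserved: the validation part is the end of the series.
    return series[:-n_valid], series[-n_valid:]


data = list(range(20))                 # 20 observations in time order
train, valid = holdout_split(data)     # keeps the last 10 percent for validation

print(len(train), "training values")
print(valid)
```

The model is fitted on `train` only, and accuracy measures are then computed by comparing its forecasts with `valid`.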