

Project 6 – Time Series Forecasting

Submitted by: Sumit Sinha

Program & Group: PGPBABIOnline May19_A


Read the data as a time series object in R. Plot the data.

What do you observe? Which components of the time series are present in this
dataset?
As advised in the project description, we installed the forecast package in R.
The Australian monthly gas production dataset, gas, is part of this package. After
loading the “forecast” library, we checked the class of the ‘gas’ data and found
that it is already a time series object.
We then read ‘gas’ into ‘ts_gas’ as a time series object.
Code:
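A minimal sketch of this step (not the original listing), assuming the forecast package is already installed:

```r
# Assumes the forecast package is installed; it ships the `gas` dataset.
library(forecast)

# Check the class of the dataset: "ts" means it is already a time series object.
class(gas)

# Keep a copy under the name used in this report.
ts_gas <- gas
```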

Output:

Next, we get a basic understanding of the data by printing the first few rows,
finding the start and end periods, checking the frequency, and viewing the
summary.
Code:
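A sketch of these checks using base R accessors for time series (not the original listing; printing the first 12 observations is an arbitrary choice):

```r
library(forecast)   # provides the `gas` dataset
ts_gas <- gas

head(ts_gas, 12)    # first 12 monthly observations
start(ts_gas)       # first period as c(year, month)
end(ts_gas)         # last period as c(year, month)
frequency(ts_gas)   # observations per year
summary(ts_gas)     # five-number summary plus the mean
```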

Output:
From the above output we can say that:
Starting period of the data: Jan 1956
Ending period of the data: Aug 1995
The data is monthly, as the frequency is 12.
Now we plot the data to get a better understanding of its intrinsic
characteristics.
First, we plot the entire time series, and then the monthly distribution of gas
production along with the median production for each month. We also draw a
seasonal plot to see whether the series contains a seasonal component, and
finally a multi-frame chart of the decomposed components: seasonal, trend, and
random.
Code:
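The four charts could be produced along these lines. This is a sketch rather than the original listing; the titles and axis labels are illustrative, and the additive decomposition (the default of decompose) is an assumption.

```r
library(forecast)
ts_gas <- gas

# 1. The full series.
plot(ts_gas, main = "Australian monthly gas production", ylab = "Gas production")

# 2. Distribution by calendar month; cycle() returns the month index 1..12.
boxplot(ts_gas ~ cycle(ts_gas), names = month.abb,
        xlab = "Month", ylab = "Gas production")

# 3. Seasonal plot: one line per year.
seasonplot(ts_gas, year.labels = TRUE, main = "Seasonal plot")

# 4. Classical decomposition into trend, seasonal and random parts
#    (additive by default, which is an assumption here).
decomp <- decompose(ts_gas)
plot(decomp)
```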

Output1:
From the above graph it is visible that gas production has increased over the
years, with the upward trend becoming markedly steeper from around 1969. The
seasonality in the time series is also visible as a sudden drop at the start of
each year followed by a rise. There is a strong seasonal pattern whose size
increases as the years pass.

Output2:

The monthly gas production chart clearly shows that, on average, production
peaks every July and then falls to its lowest level by December or January. This
again shows the seasonal pattern. The mean gas production and the variance
(range) also differ by month, which, together with the trend we already
observed, establishes the non-stationary character of the series.
Output3:

From the above seasonal plot we see that in the initial years the seasonal
pattern was subtle, but after 1969 the seasonal effect amplified considerably,
with July/August being the highest production months and Dec/Jan the lowest
production months in a year.
Output4:
From the above multi-frame chart of decomposed components we see a clear
increasing trend in the series. A visible seasonal pattern is also seen in the
seasonal chart.

What is the periodicity of dataset?


To get the periodicity of the data, we installed the “xts” package and loaded
the “xts” library.
Code:
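A sketch of the periodicity check (not the original listing). Converting the series with as.xts before calling periodicity is our assumption, made so the call works on an explicit xts object.

```r
library(forecast)   # provides the `gas` dataset
library(xts)        # provides as.xts() and periodicity()

ts_gas <- gas

# Convert the monthly ts object to xts, then estimate its periodicity.
p <- periodicity(as.xts(ts_gas))
p
```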

Output:

The periodicity function reports that the periodicity is monthly.

Is the time series Stationary? Inspect visually as well as conduct an ADF test? Write
down the null and alternate hypothesis for the stationarity test? De-seasonalise the
series if seasonality is present?

To check stationarity, we first look at the multi-frame chart of the decomposed
components of the series.
It is evident from the trend chart that the series has an increasing trend,
which steepened after 1969/1970. The observed series also shows that the monthly
volatility of production has gone up over the years.
This shows that the mean and the variance are not constant over the
years/months.
We also plot a monthly boxplot of the data.
Code:
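Alongside the boxplot, the per-month mean and variance can be tabulated to back up the visual reading. This is an illustrative sketch; the names month_id, monthly_mean and monthly_var are ours.

```r
library(forecast)
ts_gas <- gas

# Month index (1 = January, ..., 12 = December) for every observation.
month_id <- as.integer(cycle(ts_gas))

# Mean and variance of production for each calendar month.
monthly_mean <- tapply(as.numeric(ts_gas), month_id, mean)
monthly_var  <- tapply(as.numeric(ts_gas), month_id, var)

round(rbind(mean = monthly_mean, variance = monthly_var))
```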

Output:
The monthly gas production chart clearly shows that, on average, production
peaks every July and then falls to its lowest level by December or January. This
again shows the seasonal pattern. The mean gas production and the variance
(range) also differ by month, which, together with the trend we already
observed, establishes the non-stationary character of the series.

ADF test:
Next we run the ADF (Augmented Dickey-Fuller) test to check the stationarity of
the data.
The ADF test tests the null hypothesis that a unit root is present in the time
series.
Null Hypothesis (H0): The time series has a unit root, meaning it is
non-stationary. It has some time-dependent structure.
Alternate Hypothesis (H1): The time series does not have a unit root, meaning it
is stationary.
p-value > 0.05: Fail to reject H0; the data is treated as having a unit root and
being non-stationary.
p-value ≤ 0.05: Reject H0; the data does not have a unit root and is stationary.
Code:
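A sketch of the test, assuming the adf.test function from the tseries package (the text does not say which implementation was used) with its default lag order:

```r
library(forecast)   # provides the `gas` dataset
library(tseries)    # assumed source of adf.test()

ts_gas <- gas

# Augmented Dickey-Fuller test; H0: the series has a unit root.
adf_raw <- adf.test(ts_gas)
adf_raw

# Compare the p-value against the 0.05 threshold described above.
adf_raw$p.value
```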

Output:

The ADF test gives a p-value greater than 0.05, so we cannot reject the null
hypothesis H0. This means the data is treated as having a unit root and being
non-stationary.
Next we de-seasonalise the series.
In the exercises above, the different plots showed that the seasonal pattern was
subtle in the initial years but amplified considerably after 1969, with
July/August being the highest production months and Dec/Jan the lowest
production months in a year.
To de-seasonalise, we first decompose the series and then remove the seasonal
component from the data. We plot the original series and the de-seasonalised
series on the same chart to compare them.
Code:
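A sketch of the de-seasonalising step, assuming an additive classical decomposition so that the seasonal component can simply be subtracted. The name ts_gas_deseason and the colours are illustrative.

```r
library(forecast)
ts_gas <- gas

# Additive decomposition (an assumption): observed = trend + seasonal + random.
decomp <- decompose(ts_gas)

# Remove the seasonal component from the observed series.
ts_gas_deseason <- ts_gas - decomp$seasonal

# Original and de-seasonalised series on the same chart.
ts.plot(ts_gas, ts_gas_deseason,
        gpars = list(col = c("black", "red"), ylab = "Gas production"))
legend("topleft", legend = c("Original", "De-seasonalised"),
       col = c("black", "red"), lty = 1)
```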

Output:
Develop an ARIMA Model to forecast for next 12 periods. Use both manual and
auto.arima (Show & explain all the steps).

Report the accuracy of the model.

The next step is to build the ARIMA models (manual and auto). First we find the
autocorrelation and partial autocorrelation of the series.
Code:
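A sketch using Acf and Pacf from the forecast package. Applying them to the original series and showing 48 lags are both assumptions made for illustration.

```r
library(forecast)
ts_gas <- gas

# Autocorrelation and partial autocorrelation up to 48 lags (four years).
acf_out  <- Acf(ts_gas, lag.max = 48, main = "ACF of gas production")
pacf_out <- Pacf(ts_gas, lag.max = 48, main = "PACF of gas production")
```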

Output:
The function Acf computes an estimate of the autocorrelation function of a
(possibly multivariate) time series. The function Pacf computes an estimate of
the partial autocorrelation function of a (possibly multivariate) time series.
As many lags are significant, the plots do not let us tentatively identify the
number of AR and/or MA terms that are needed.
Next we apply first-order differencing to remove the trend and then check
whether that makes the series stationary.
Code:
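A sketch of the differencing step followed by a second ADF test. Differencing the de-seasonalised series (rather than the raw one) and using adf.test from tseries are assumptions.

```r
library(forecast)
library(tseries)    # assumed source of adf.test()

ts_gas <- gas
# De-seasonalised series under an additive decomposition (an assumption).
ts_gas_deseason <- ts_gas - decompose(ts_gas)$seasonal

# First-order differencing: y[t] - y[t-1], which removes a linear trend.
ts_gas_diff <- diff(ts_gas_deseason, differences = 1)
plot(ts_gas_diff, main = "First-order differenced series")

# ADF test on the differenced series.
adf_diff <- adf.test(ts_gas_diff)
adf_diff$p.value
```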

Output1:

Output2:
Inspecting the series after first-order differencing, we see that it has become
more or less stationary. The ADF test also gives a p-value less than 0.05.
Thus we can reject the null hypothesis (H0), which tells us that the differenced
data does not have a unit root and is stationary.
Since the ACF and PACF did not give conclusive values for p and q, we will use
p=1 and q=1, and because first-order differencing works well on the series, we
will take d=1 for the manual ARIMA.
Before that, we divide the data into a training set and a test set. We take the
data up to December 1993 as the training set and the remaining, much smaller,
portion as the test set.
Code:
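A sketch of the split using window. The names gas_train and gas_test are illustrative, and splitting the de-seasonalised series (additive decomposition) is an assumption.

```r
library(forecast)
ts_gas <- gas
ts_gas_deseason <- ts_gas - decompose(ts_gas)$seasonal

# Training set: Jan 1956 to Dec 1993. Test set: Jan 1994 to Aug 1995.
gas_train <- window(ts_gas_deseason, end = c(1993, 12))
gas_test  <- window(ts_gas_deseason, start = c(1994, 1))

length(gas_train)
length(gas_test)
```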

Next we fit the manual ARIMA model with the p, d, q values already decided, and
the auto ARIMA model as well.
Code:
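A sketch of the two fits (not the original listing). The order (p, d, q) = (1, 1, 1) is an assumption: d = 1 is taken from the first-order differencing step. Arima from the forecast package is used instead of base arima for convenience when forecasting later.

```r
library(forecast)
ts_gas <- gas
ts_gas_deseason <- ts_gas - decompose(ts_gas)$seasonal   # additive (assumed)
gas_train <- window(ts_gas_deseason, end = c(1993, 12))

# Manual ARIMA with order c(p, d, q); the values are assumptions noted above.
fit_manual <- Arima(gas_train, order = c(1, 1, 1))

# Automatic order selection, restricted to non-seasonal models.
fit_auto <- auto.arima(gas_train, seasonal = FALSE)

fit_manual
fit_auto
```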

As we have used the de-seasonalised data, we have kept seasonal=FALSE as a
parameter in the auto ARIMA model.
Next we forecast for 12 months and see how accurately each model predicts the
training data as well as the test data.
Code:
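A sketch of the forecasting and accuracy step, under the same assumptions as the fitting step (additive de-seasonalising, manual order c(1, 1, 1), illustrative variable names):

```r
library(forecast)
ts_gas <- gas
ts_gas_deseason <- ts_gas - decompose(ts_gas)$seasonal
gas_train <- window(ts_gas_deseason, end = c(1993, 12))
gas_test  <- window(ts_gas_deseason, start = c(1994, 1))

fit_manual <- Arima(gas_train, order = c(1, 1, 1))      # assumed order
fit_auto   <- auto.arima(gas_train, seasonal = FALSE)

# Forecast the next 12 months from each model.
fc_manual <- forecast(fit_manual, h = 12)
fc_auto   <- forecast(fit_auto, h = 12)
plot(fc_manual)
plot(fc_auto)

# Error metrics on the training set and on the overlapping test months.
acc_manual <- accuracy(fc_manual, gas_test)
acc_auto   <- accuracy(fc_auto, gas_test)
acc_manual
acc_auto
```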

Output 1 (forecast of manual ARIMA) :

Output 2 (forecast of auto ARIMA) :


The auto ARIMA forecast looks more accurate and realistic than the manual ARIMA
forecast: it reproduces the seasonal effect present in the original data,
whereas the manual ARIMA forecast is flat and shows only the average level.
We nevertheless compute the accuracy metrics to check whether they support this
visual impression.

Accuracy metrics:

The MAPE (Mean Absolute Percentage Error) of the manual ARIMA is greater than
that of the auto ARIMA model for both the training and test data, which
indicates that the auto ARIMA model is more accurate than the manual one.
All the other error metrics tell the same story for these two models.
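For reference, MAPE is the mean of the absolute errors expressed as a percentage of the actual values. A small helper (the name mape and the toy numbers are ours, not from the project data) makes the calculation explicit:

```r
# Mean Absolute Percentage Error: mean(|actual - predicted| / |actual|) * 100.
# Only meaningful when no actual value is zero.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Toy example: both forecasts are off by 10% of the actual value.
mape(c(100, 200), c(110, 180))
```

A lower MAPE means the forecasts are, on average, closer to the actual values in relative terms.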
