Project6 Time Series
Forecasting
Submitted by: Sumit Sinha
What do you observe? Which components of the time series are present in this
dataset?
As advised in the project description, we installed the forecast package in R. The
Australian monthly gas production dataset, `gas`, is part of this package. After
loading the package with library("forecast"), we checked the class of the `gas` data
and found that it is already a time series (ts) object.
I stored `gas` in `ts_gas` as a time series object.
Code:
Output:
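The loading step can be sketched in R as follows. This assumes the forecast package is already installed; the report's own code is not reproduced here, so treat this as an illustration rather than the exact commands used.

```r
# Load the forecast package, which ships the `gas` dataset
library(forecast)

# `gas` is already a time series object, so no conversion is needed
class(gas)
ts_gas <- gas     # working copy under the name used in this report
is.ts(ts_gas)     # TRUE for a ts object
```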
Next, we get a basic understanding of the data by printing the first few rows,
finding the start and end periods, checking the frequency of the data, and viewing
the summary.
Code:
Output:
From the above output we can say that:
Starting period of the data: January 1956
Ending period of the data: August 1995
The data are monthly, as the frequency is 12.
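A sketch of these inspection calls, again assuming the forecast package is installed. The commented start and end values match the periods reported above.

```r
library(forecast)
ts_gas <- gas

head(ts_gas, 12)     # first year of observations
start(ts_gas)        # c(1956, 1): January 1956
end(ts_gas)          # c(1995, 8): August 1995
frequency(ts_gas)    # 12 observations per year, i.e. monthly data
summary(ts_gas)      # min, quartiles, mean, max of production
```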
Now we plot the data to better understand its intrinsic characteristics.
First, we plot the entire time series, and then the monthly distribution of gas
production along with the median production of each month. We also plot the
seasonal pattern to see whether the series contains any seasonal component, and
finally we plot the multi-frame chart of the decomposed components: seasonal,
trend, and random.
Code:
Output1:
From the above graph it is visible that gas production has increased over the years,
although the steep upward trend started around 1969. A seasonal effect is also
visible: production drops sharply at the start of each year and then rises again.
This strong seasonal pattern grows in size as the years pass.
Output2:
The monthly gas production chart clearly shows that, on average, production peaks
in July and falls to its lowest level by December or January. This again reflects
the seasonal pattern. The mean gas production and the variance (range) also differ
by month; together with the trend we already observed, this establishes that the
series is non-stationary.
Output3:
From the above seasonal plot we see that in the initial years the seasonal pattern
was subtle, but after 1969 the seasonal effect was amplified considerably, with
July/August the highest-production months and December/January the
lowest-production months of the year.
Output4:
From the above multi-frame chart of decomposed components we see a clear
increasing trend in the series, and a visible seasonal pattern in the seasonal
panel. The series therefore contains trend, seasonal, and random (irregular)
components.
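The four charts described above can be approximated with the sketch below. The plotting functions the report used are not shown, so `monthplot()` (base R stats, with `base = median` to mark each month's median) and `ggseasonplot()` (forecast) are assumptions; `decompose()` uses its default additive model here.

```r
library(forecast)
ts_gas <- gas

# 1. The whole series
plot(ts_gas, main = "Australian monthly gas production", ylab = "Production")

# 2. Month-by-month sub-series; horizontal bars mark each month's median
monthplot(ts_gas, base = median, main = "Monthly gas production")

# 3. Seasonal plot: one line per year on a Jan-Dec axis
print(ggseasonplot(ts_gas))

# 4. Classical decomposition into trend, seasonal and random components
gas_dec <- decompose(ts_gas)
plot(gas_dec)
```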
Is the time series stationary? Inspect visually as well as conduct an ADF test.
Write down the null and alternate hypotheses for the stationarity test.
De-seasonalise the series if seasonality is present.
To check stationarity, we first produce the multi-frame chart of the decomposed
components of the series.
It is evident from the trend panel that the series has an increasing trend, which
steepened after 1969/1970. The observed series also shows that the monthly
volatility of production has gone up over the years. This shows that the mean and
the variance are not constant over the years/months.
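One way to put numbers on this observation is to compare the mean and standard deviation of each calendar year. This is a sketch that is not part of the original analysis, and it assumes only complete years (1956-1994) are used so that each year has exactly 12 months.

```r
library(forecast)
ts_gas <- gas

# Keep only complete calendar years (Jan 1956 - Dec 1994) so that each
# column of the matrix below holds exactly one year of monthly values
full_years <- window(ts_gas, end = c(1994, 12))
by_year <- matrix(full_years, nrow = 12)   # matrix() fills column by column

annual_mean <- colMeans(by_year)
annual_sd   <- apply(by_year, 2, sd)
names(annual_mean) <- names(annual_sd) <- 1956:1994

# A stationary series would show roughly constant values in both rows
round(rbind(mean = annual_mean, sd = annual_sd)[, c("1956", "1970", "1994")])
```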
We also plot a monthly boxplot of the data.
Code:
Output:
The monthly boxplot clearly shows that, on average, gas production peaks in July
and falls to its lowest level by December or January. This again reflects the
seasonal pattern. The mean gas production and the variance (range) also differ by
month; together with the trend we already observed, this establishes that the
series is non-stationary.
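A monthly boxplot like the one described can be sketched with base R graphics; the variable names here are illustrative.

```r
library(forecast)
ts_gas <- gas

# cycle() gives the month index (1-12) of every observation
month_idx <- factor(cycle(ts_gas), levels = 1:12, labels = month.abb)

# One box per calendar month, pooled across all years
boxplot(as.numeric(ts_gas) ~ month_idx,
        xlab = "Month", ylab = "Gas production",
        main = "Monthly distribution of gas production")
```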
ADF test:
Next, we run the Augmented Dickey-Fuller (ADF) test to check whether the data are
stationary. The ADF test's null hypothesis is that a unit root is present in the time series.
Null hypothesis (H0): the time series has a unit root, meaning it is non-stationary;
it has some time-dependent structure.
Alternate hypothesis (H1): the time series does not have a unit root, meaning it is
stationary.
p-value > 0.05: fail to reject H0; the data have a unit root and are non-stationary.
p-value ≤ 0.05: reject H0; the data do not have a unit root and are stationary.
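For reference, the regression underlying the ADF test can be written as below. This is the constant-plus-trend form used by `adf.test()` in the tseries package, which is assumed (not confirmed by the report) to be the implementation used. Here $k$ is the number of lagged differences; that function's default is $k = \lfloor (n-1)^{1/3} \rfloor$, which gives $k = 7$ for the 476 monthly observations.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t

H_0:\ \gamma = 0 \quad \text{(unit root, non-stationary)}
\qquad
H_1:\ \gamma < 0 \quad \text{(stationary around a linear trend)}
```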
Code:
Output:
The ADF test output shows a p-value greater than 0.05, so we cannot reject the
null hypothesis H0. We therefore treat the data as having a unit root, i.e. as
non-stationary.
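A sketch of the test in R, assuming `adf.test()` from the tseries package; its p-value is interpolated from a table of critical values.

```r
library(forecast)   # provides the gas dataset
library(tseries)    # provides adf.test()
ts_gas <- gas

# H0: unit root (non-stationary); H1: stationary.
# Default lag order k = trunc((length(x) - 1)^(1/3)).
adf_gas <- adf.test(ts_gas, alternative = "stationary")
adf_gas

# Decision rule at the 5% level
if (adf_gas$p.value > 0.05) {
  message("Fail to reject H0: treat the series as non-stationary")
} else {
  message("Reject H0: treat the series as stationary")
}
```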
Next, we de-seasonalise the series.
The plots above show that in the initial years the seasonal pattern was subtle, but
after 1969 the seasonal effect was amplified considerably, with July/August the
highest-production months and December/January the lowest-production months
of the year.
To de-seasonalise, we first decompose the series and then remove the seasonal
effect from the data. We plot the original and the de-seasonalised series on the
same chart to compare them.
Code:
Output:
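De-seasonalising can be sketched as below. Because the seasonal swings grow with the level of the series, this sketch assumes a multiplicative decomposition and uses forecast's `seasadj()` to divide out the seasonal component (with an additive decomposition, `seasadj()` subtracts it instead). The report does not state which variant it used.

```r
library(forecast)
ts_gas <- gas

# Seasonal swings grow with the level, so a multiplicative decomposition
# is assumed here; type = "additive" is the alternative
gas_dec <- decompose(ts_gas, type = "multiplicative")

# seasadj() removes the seasonal component (here: x / seasonal)
gas_deseason <- seasadj(gas_dec)

# Original and de-seasonalised series on the same chart
ts.plot(ts_gas, gas_deseason, col = c("grey50", "blue"),
        ylab = "Gas production", main = "Original vs de-seasonalised")
legend("topleft", legend = c("Original", "De-seasonalised"),
       col = c("grey50", "blue"), lty = 1)
```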
Develop an ARIMA model to forecast the next 12 periods. Use both a manual ARIMA
and auto.arima (show and explain all the steps).
The next step is to build the ARIMA models (manual and auto). First, we compute
the autocorrelation and partial autocorrelation of the series.
Code:
Output:
The function Acf computes an estimate of the autocorrelation function of a
(possibly multivariate) time series. Function Pacf computes an estimate of the
partial autocorrelation function of a (possibly multivariate) time series.
Because many lags are significant, it is not possible to tentatively identify the
number of AR and/or MA terms needed.
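The ACF/PACF step can be sketched as follows. The choice of 48 lags (four years) is an assumption, and the bound is the usual approximate 95% white-noise interval, ±1.96/√n, drawn as dashed lines in the plots.

```r
library(forecast)
ts_gas <- gas

# Sample ACF and PACF up to 48 lags (both functions also draw the plots)
gas_acf  <- Acf(ts_gas, lag.max = 48)
gas_pacf <- Pacf(ts_gas, lag.max = 48)

# Approximate 95% significance bound for white noise
bound <- qnorm(0.975) / sqrt(gas_acf$n.used)

# Count lags outside the bound (lag 0 of the ACF is always 1, so skip it)
sum(abs(gas_acf$acf[gas_acf$lag > 0]) > bound)
sum(abs(gas_pacf$acf) > bound)
```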
Next, we apply first-order differencing to remove the trend and check whether that
makes the series stationary.
Code:
Output1:
Output2:
Inspecting the series after first-order differencing, we see that it has become
more or less stationary. The ADF test also gives a p-value less than 0.05.
Thus we can reject the null hypothesis (H0) and conclude that the differenced data
do not have a unit root and are stationary.
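A sketch of the differencing step and the repeated ADF test, assuming `adf.test()` from the tseries package as before:

```r
library(forecast)
library(tseries)
ts_gas <- gas

# First-order differencing: y'_t = y_t - y_{t-1}; one observation is lost
gas_diff1 <- diff(ts_gas, differences = 1)
plot(gas_diff1, main = "Gas production after first-order differencing")

# Re-run the ADF test on the differenced series
adf_diff1 <- adf.test(gas_diff1, alternative = "stationary")
adf_diff1
```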
Since the ACF and PACF did not give conclusive p and q values, we use p = 1 and
q = 1. Because first-order differencing worked well on the series, we take d = 1 for
the manual ARIMA (equivalently, d = 0 if the model is fitted to the already-differenced series).
Before fitting, we divide the data into training and test sets. We use the data up
to December 1993 as the training set and the remaining, much smaller portion
(January 1994 to August 1995) as the test set.
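The split can be expressed with `window()`; the object names are illustrative.

```r
library(forecast)
ts_gas <- gas

# Training set: January 1956 - December 1993
gas_train <- window(ts_gas, end = c(1993, 12))
# Test set: January 1994 - August 1995 (the remaining months)
gas_test  <- window(ts_gas, start = c(1994, 1))

length(gas_train)  # 456
length(gas_test)   # 20
```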
Code:
Next, we fit the manual ARIMA model with the p, d, q values chosen above, as well
as the auto ARIMA model.
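A sketch of fitting both models with the forecast package. The manual orders are assumptions: p = q = 1 as chosen above and d = 1 to reflect the first-order differencing; the report's exact call is not shown. `auto.arima()` is run with its default settings.

```r
library(forecast)
ts_gas <- gas
gas_train <- window(ts_gas, end = c(1993, 12))
gas_test  <- window(ts_gas, start = c(1994, 1))

# Manual model: order (p, d, q) = (1, 1, 1), assumed as described above
fit_manual <- Arima(gas_train, order = c(1, 1, 1))

# Automatic model: searches over non-seasonal and seasonal orders
fit_auto <- auto.arima(gas_train)

fit_manual
fit_auto     # prints the selected ARIMA(p,d,q)(P,D,Q)[12] order

# Forecast over the test horizon with both models
fc_manual <- forecast(fit_manual, h = length(gas_test))
fc_auto   <- forecast(fit_auto,   h = length(gas_test))
plot(fc_auto)
lines(gas_test, col = "red")
```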
Code:
Accuracy metrics:
The MAPE (Mean Absolute Percentage Error) of the manual ARIMA model is greater
than that of the auto ARIMA model on both the training and the test data, which
indicates that the auto ARIMA model is considerably more accurate than the manual one.
All the other error metrics tell the same story about these two models.
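The comparison can be reproduced along these lines with `accuracy()` from the forecast package, which returns one row for the training set and one for the test set. The manual order (1, 1, 1) is the same assumption as before. The final lines refit `auto.arima()` on the full series and forecast the 12 months after August 1995; refitting on all the data before the final forecast is an assumption about the workflow, not something the report states.

```r
library(forecast)
ts_gas <- gas
gas_train <- window(ts_gas, end = c(1993, 12))
gas_test  <- window(ts_gas, start = c(1994, 1))

fit_manual <- Arima(gas_train, order = c(1, 1, 1))  # assumed manual order
fit_auto   <- auto.arima(gas_train)

h <- length(gas_test)
acc_manual <- accuracy(forecast(fit_manual, h = h), gas_test)
acc_auto   <- accuracy(forecast(fit_auto,   h = h), gas_test)

# Rows: "Training set" and "Test set"; compare MAPE side by side
cbind(manual = acc_manual[, "MAPE"], auto = acc_auto[, "MAPE"])

# Refit the automatic model on the full series and forecast
# the next 12 months (September 1995 - August 1996)
fit_full  <- auto.arima(ts_gas)
fc_next12 <- forecast(fit_full, h = 12)
fc_next12
```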