Suresh-Rose Time Series Forecasting Project Report
Suresh Veeraraghavan
Rose Wine Sale Time Series Forecasting
Executive Summary
Data Dictionary
1 Read the data as an appropriate time series data and plot the data.
1.1 Dataset Sample:
1.2 Missing data
2 Perform appropriate Exploratory Data Analysis to understand the data & also perform decomposition.
2.1 Five Point Summary
2.2 Dataset Info
2.3 Year wise Box Plot
2.4 Month wise Box Plot
2.5 Month plot with median
2.6 Pivot table view
2.7 Empirical Distribution
2.8 Average and Sale percentage change
2.9 Decomposition of Time Series – Additive
2.10 Decomposition of Time Series – Multiplicative
3 Split the data into training and test. The test data should start in 1991.
3.1 Sample of data split
4 Build all the exponential smoothing models
4.1 Linear Regression Model
4.1.1 Test RMSE – Linear Regression
4.2 Naïve Forecast
4.2.1 Test RMSE – Naïve Model
4.3 Simple Average
4.3.1 Test RMSE – Simple Average Model
4.4 Moving Average (MA)
4.4.1 Test RMSE – Moving Average
4.5 Simple Exponential Smoothing (SES) - ETS(A, N, N)
4.5.1 Smoothing parameters
4.5.2 Test RMSE – SES
4.6 Double Exponential Smoothing - ETS(A, A, N)
4.6.1 DES Smoothing parameters
4.6.2 RMSE Test – DES
4.7 Holt Winter's linear method with additive errors (Triple Exponential Additive Smoothing) - ETS (A, A, A)
4.7.1 Triple Exponential Additive Smoothing parameters
4.7.2 TEST RMSE – Triple Exponential Additive Smoothing
4.8 Holt Winter's linear method – multiplicative (TES) – ETS (A, A, M)
4.8.1 Parameters
4.8.2 TEST RMSE – TES Multiplicative
4.9 Holt Winter's linear method with additive errors - Using Damped Trend - ETS(A, A, A)
4.9.1 TES additive – Damped Trend parameters
4.9.2 TEST RMSE – TES additive Damped Trend
4.10 Holt Winter's linear method - multiplicative - using DAMPED TREND - ETS(A, A, M)
4.10.1 TES multiplicative – Damped Trend parameters
4.10.2 TEST RMSE – TES multiplicative Damped Trend
4.11 Inference/Conclusion based on the model build so far:
5 Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05
5.1 Data Stationarity verification:
5.1.1 Dicky Fuller test - check for stationarity of the time series
5.1.2 One order difference result
5.1.3 Time series plot before and after one order difference
6 Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
6.1.1 ACF and PACF before one order difference on full data
6.1.2 ACF and PACF after performing one order difference on full data
6.1.3 ACF and PACF for Train dataset with one order difference
6.2 ARIMA Automated
6.2.2 RMSE – ARIMA Automated
6.2.3 Automated ARIMA Prediction
6.3 SARIMA Automated
6.3.2 Predicted sample test data
6.3.3 RMSE – Automated SARIMA
6.3.4 Automated SARIMA prediction
7 Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE.
7.1 ACF & PACF plot with one difference
7.2 ARIMA Manual Model (2,1,2)
7.2.1 RMSE – Manual ARIMA
7.2.2 Manual ARIMA prediction
7.3 SARIMA Manual Model
7.3.1 ACF and PACF with difference of 6 and diff in train dataset to identify the P and Q
7.3.2 Manual SARIMA Model1: (2,1,2) (3, 0, 1, 12)
7.3.3 Manual SARIMA Model 2: (2,1,2) (0, 0, 1, 12)
7.3.4 Manual SARIMA Model 3: (3,1,2) (2, 0, 1, 12)
7.3.5 RMSE – Manual SARIMA Models
7.3.6 SARIMA Models Prediction
8 Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
9 Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
9.1 Best Model fitting
9.2 12 months prediction
10 Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.
10.1 Findings on the dataset
10.2 Comments on the model build
10.3 Suggestion based on the analysis we performed
List of Figures
Figure 1 - First 5 rows
Figure 2 - Last 5 rows
Figure 3 - Time Series of Rose Wine Sale
Figure 4 - Missing data
Figure 5 - Five Point Summary
Figure 6 - Dataset Info
Figure 7 - Year wise Box plot
Figure 8 - Month wise Box plot
Figure 9 - Month wise with median
Figure 10 - Pivot table view
Figure 11 - Empirical distribution
Figure 12 - Average and Sale percentage change
Figure 13 - Decomposition - Additive
Figure 14 - Decomposition - Additive values
Figure 15 - Decomposition Additive Residual
Figure 16 - Decomposition - Multiplicative
Figure 17 - Residuals - Multiplicative decomposition
Figure 18 - Train and Test Split graph
Figure 19 - Train and Test Split sample records
Figure 20 - Linear Regression
Figure 21 - Linear Regression test RMSE
Figure 22 - Naive Forecast
Figure 23 - Naive test RMSE
Figure 24 - Simple Average
Figure 25 - Simple Average RMSE
Figure 26 - Moving Average 2, 4, 6 and 9 point
Figure 27 - Moving Average 2 point
Figure 28 - Moving Average 4 point
Figure 29 - Moving Average 6 point
Figure 30 - Moving Average 9 point
Figure 31 - Moving Average RMSE
Figure 32 - SES Parameters
Figure 33 - Data prediction using SES
Figure 34 - SES test data prediction graph
Figure 35 - SES RMSE
Figure 36 - DES Smoothing Parameters
Figure 37 - DES Smoothing graph
Figure 38 - DES RMSE
Figure 39 - Triple Exponential Additive Smoothing parameters
Figure 40 - Smoothing models SES, DES & TES
Figure 41 - Triple Exponential Additive Smoothing
Figure 42 - RMSE TES
Figure 43 - Holt Winter's Parameters – Multiplicative
Figure 44 - Prediction of TES Multiplicative
Figure 45 - RMSE – TES Multiplicative
Figure 46 - TES Damped Trend parameters
Figure 47 - RMSE TES additive Damped Trend
Figure 48 - TES multiplicative – Damped Trend parameters
Figure 49 - RMSE – TES multiplicative Damped Trend
Figure 50 - 2 Point Trailing Moving Average
Figure 51 - Dickey Fuller Test
Figure 52 - Rolling Mean and Standard Deviation
Figure 53 - Dickey Fuller test after one order diff
Figure 54 - Rolling Mean and std dev after 1 order diff
Figure 55 - Time series before and after 1 order diff
Figure 56 - ACF Full data
Figure 57 - PACF Full data
Figure 58 - ACF Full data with one order diff
Figure 59 - PACF Full data with one order diff
Figure 60 - ACF Train data - with one order difference
Figure 61 - ARIMA automated parameters
Figure 62 - Top 5 from ARIMA automated model
Figure 63 - Auto ARIMA Plot
Figure 64 - RMSE ARIMA Automated
Figure 65 - Auto ARIMA 2.1.2
Figure 66 - SARIMA Automated parameters
Figure 67 - Top 5 best model - lowest AIC scores
Figure 68 - Automated SARIMA Result
Figure 69 - Automated SARIMA Plot
Figure 70 - SARIMA sample predicted test data
Figure 71 - RMSE Auto SARIMA
Figure 72 - SARIMA test prediction
Figure 73 - Manual ARIMA results
Figure 74 - Manual ARIMA plot
Figure 75 - RMSE Manual ARIMA
Figure 76 - Manual ARIMA prediction
Figure 77 - Full data plot with diff 6 + diff
Figure 78 - Mean and std Dev plot with diff 6 + diff
Figure 79 - Dickey Fuller test for diff 6
Figure 80 - ACF Train set with diff 6 + diff
Figure 81 - ACF Train set with diff 6 + diff
Figure 82 - Manual SARIMA Model 1 results
Figure 83 - SARIMA Model 1 Plot
Figure 84 - Manual SARIMA Model 2 results
Figure 85 - SARIMA Model 2 Plot
Figure 86 - Manual SARIMA Model 3 results
Figure 87 - SARIMA Model 3 Plot
Figure 88 - RMSE Manual SARIMA Models
Figure 89 - SARIMA Prediction Model 1
Figure 90 - SARIMA Prediction Model 2
Figure 91 - SARIMA Prediction Model 3
Figure 94 - Forecast of next 12 months
Figure 95 - 12 Months prediction
Figure 96 - Time series plot
Figure 97 - Month wise plot
List of Tables
Table 1 – All Models
Table 2 – Top 5 Best Models
Rose Wine Sale Time Series Forecasting
Executive Summary
For this assignment, sales data for different types of wine in the 20th century are to be
analysed. Both datasets are from the same company but cover different wines. As an analyst at ABC
Estate Wines, you are tasked with analysing and forecasting wine sales in the 20th century. This document
presents the business report for Rose Wine Sale Time Series Forecasting.
Data Dictionary
The Rose dataset has two columns: Year-Month and the corresponding sales quantity of Rose wine,
covering the years 1980 to 1995.
Figure 2- Last 5 rows
Out of 187 records, 2 are null. Interpolation is used to impute the 2 missing values.
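The report does not say which interpolation method was used. As a minimal sketch, assuming simple linear interpolation between the nearest known neighbours, the imputation could look like this. The function name `interpolate_linear` and the sample values are hypothetical and are not taken from the Rose dataset.

```python
def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Leading or trailing None values are left untouched because they have
    only one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Hypothetical monthly sales with two missing months (not the report's data).
sales = [45.0, 52.0, None, None, 40.0]
print(interpolate_linear(sales))
```

Linear interpolation keeps the imputed points on the straight line between their neighbours, which is a reasonable default when only a couple of isolated months are missing.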
2 Perform appropriate Exploratory Data Analysis to
understand the data & also perform decomposition.
The Rose dataset has 187 rows. The missing values present in the dataset have been treated.
The year-wise box plot above shows that most of the years have outliers.
2.4 Month wise Box Plot
The month-wise box plot above, taken across the years, shows that June, July, August and
September have outliers.
Across the years, December shows the highest sales.
April shows the lowest sales across the years.
This box plot reveals the seasonality present in the Rose dataset.
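The month-wise comparison behind the box plot can be sketched numerically by grouping the observations by calendar month and taking the median of each group. This is only an illustration: the record layout `(year, month, sales)` and the function name `monthly_medians` are assumptions, and the sample numbers are made up.

```python
from statistics import median


def monthly_medians(records):
    """Group (year, month, sales) records by month and return {month: median sales}."""
    by_month = {}
    for _, month, sales in records:
        by_month.setdefault(month, []).append(sales)
    return {month: median(values) for month, values in sorted(by_month.items())}


# Hypothetical records, not the Rose data.
records = [
    (1980, 1, 10), (1981, 1, 30), (1982, 1, 20),
    (1980, 12, 100), (1981, 12, 80),
]
print(monthly_medians(records))
```

Months whose median sits consistently above or below the others are what indicate seasonality in the month-wise view.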
2.6 Pivot table view
2.7 Empirical Distribution
Figure 11 - Empirical_distribution
This graph shows what percentage of the data points fall at or below a given sales value.
85% of the sales values are below 115.
The maximum sales value is close to 260.
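An empirical distribution of this kind can be computed directly from the sorted observations. The sketch below is a minimal illustration with made-up numbers; `ecdf` and `share_at_or_below` are hypothetical helper names.

```python
def ecdf(values):
    """Return the sorted values and, for each, the fraction of observations <= it."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]


def share_at_or_below(values, threshold):
    """Fraction of observations that are less than or equal to the threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical sales values, not the Rose data.
sales = [10, 20, 30, 40]
xs, fractions = ecdf(sales)
print(list(zip(xs, fractions)))
print(share_at_or_below(sales, 25))
```

A statement such as "85% of the sales values are below 115" corresponds to reading off the fraction returned for the threshold 115.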
2.8 Average and Sale percentage change
The two graphs above show the average Rose wine sales and the percentage change in Rose
wine sales over time.
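As a small sketch of how such a percentage-change series can be derived, assuming it is computed from one period to the next (the report does not state the exact grouping), with hypothetical values:

```python
def percentage_change(values):
    """Percentage change of each value relative to the one before it."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Hypothetical yearly average sales, not the Rose data.
yearly_average = [100.0, 110.0, 99.0]
print(percentage_change(yearly_average))
```

The result has one element fewer than the input because the first period has no predecessor to compare against.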
The decomposition above shows a downward trend in the dataset.
Strong seasonality is present in the Rose wine sales dataset.
A few residuals are large, but most stay close to 0.
We see that the residuals are located around 0 from the plot of the residuals in the
2.10 Decomposition of Time Series – Multiplicative
$$y_t = \text{Trend}_t \times \text{Seasonality}_t \times \text{Residual}_t$$
For the multiplicative series, many of the residuals are located around 1.
Multiplicative decomposition fits the Rose dataset better than additive decomposition.
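To make the multiplicative form concrete, here is a minimal sketch of a classical multiplicative decomposition. It assumes a centred moving average for the trend and a period of 12 months. The report does not show how its decomposition was produced, so the function names and the synthetic series are illustrative only.

```python
def centered_moving_average(y, period):
    """Centred moving average for an even period (a 2 x period MA); None at the edges."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        inner = y[t - half + 1 : t + half]  # the period - 1 points around t
        total = 0.5 * y[t - half] + sum(inner) + 0.5 * y[t + half]
        trend[t] = total / period
    return trend


def decompose_multiplicative(y, period=12):
    """Split y into trend, seasonal indices and residuals with y = T * S * R.

    Needs at least two full periods so that every season has a detrended value.
    """
    trend = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            ratios[t % period].append(y[t] / tr)
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period  # normalise so the indices average to 1
    seasonal = [s / scale for s in raw]
    residual = [
        None if tr is None else y[t] / (tr * seasonal[t % period])
        for t, tr in enumerate(trend)
    ]
    return trend, seasonal, residual


# Synthetic series: constant level 100 times a repeating seasonal pattern.
pattern = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.7, 0.9, 1.1, 1.3, 1.2, 0.8]
series = [100 * pattern[t % 12] for t in range(36)]
trend, seasonal, residual = decompose_multiplicative(series)
print([round(s, 2) for s in seasonal])
```

On a series that is exactly level times season, the residuals come out at 1, which is why residuals clustered around 1 indicate a good multiplicative fit.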
3 Split the data into training and test. The test data
should start in 1991.
The Rose dataset is split into train and test sets at the year 1991.
Sales counts from 1980 to 1990 are taken as the train dataset.
Sales counts from 1991 to 1995 are taken as the test dataset.
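The split is purely chronological, with no shuffling. A minimal sketch, assuming the 187 monthly records start in January 1980 and are stored as `(year, month, sales)` tuples (the layout and helper names are hypothetical, and the sales values here are placeholders):

```python
def monthly_index(start_year, n_months):
    """(year, month) pairs for n_months consecutive months starting in January."""
    return [(start_year + i // 12, i % 12 + 1) for i in range(n_months)]


def split_by_year(records, test_start_year=1991):
    """Chronological split: everything before test_start_year is training data."""
    train = [r for r in records if r[0] < test_start_year]
    test = [r for r in records if r[0] >= test_start_year]
    return train, test


# Placeholder sales of 0.0; only the dates matter for this illustration.
records = [(year, month, 0.0) for year, month in monthly_index(1980, 187)]
train, test = split_by_year(records)
print(len(train), len(test), train[-1][:2], test[0][:2])
```

Keeping the test period strictly after the training period is what makes the test RMSE values a fair measure of forecasting performance.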
Figure 19 - Train and Test Split sample records
In time series data, the dependent variable is a variable that changes over time, and the independent
variable(s) are typically other time-varying variables that may influence the dependent variable. Linear
regression can be a useful tool for modeling time series data.
The linear regression equation for time series data can be written as:
$$y(t) = \beta_0 + \beta_1 x_1(t) + \beta_2 x_2(t) + \dots + \beta_k x_k(t) + \varepsilon(t)$$
where y(t) is the dependent variable at time t, x1(t), x2(t), ..., xk(t) are the k independent variables at time
t, β0, β1, β2, ..., βk are the corresponding coefficients or parameters to be estimated, and ε(t) is the error
term at time t.
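As a minimal sketch of this model, assume the only independent variable is the time index t = 1, 2, ..., n. The report does not list its regressors, so this is an assumption. The coefficients then follow from ordinary least squares in closed form. Function names and data are hypothetical.

```python
def fit_time_trend(y):
    """Ordinary least squares fit of y(t) = b0 + b1 * t for t = 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def predict_time_trend(intercept, slope, start, steps):
    """Predictions for time indices start, start + 1, ..., start + steps - 1."""
    return [intercept + slope * t for t in range(start, start + steps)]


# Hypothetical training series with a steady decline (not the Rose data).
train = [10 - 0.5 * t for t in range(1, 21)]
b0, b1 = fit_time_trend(train)
print(b0, b1, predict_time_trend(b0, b1, start=21, steps=2))
```

A model of this form can only extrapolate a straight line, so it cannot reproduce seasonal swings in the test period.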
Figure 20- Linear Regression
The graph above makes it evident that linear regression does not perform well on the test
dataset. The linear regression forecast is shown in green.
For the naive model, the prediction for tomorrow is the same as today's value. The prediction for the
day after tomorrow equals the prediction for tomorrow, and since that is the same as today's value,
the prediction for the day after tomorrow is also today's value.
$$\hat{y}_{t+1} = y_t$$
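In code, the naive forecast simply repeats the last training observation for every step of the horizon. A minimal sketch with hypothetical values:

```python
def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future step."""
    return [train[-1]] * horizon


# Hypothetical training values, not the Rose data.
print(naive_forecast([3, 5, 4], horizon=3))  # [4, 4, 4]
```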
Figure 22 - Naive Forecast
The graph above makes it evident that the Naïve model does not perform well on the test dataset.
The Naïve forecast is shown in green.
To use this method, you would simply calculate the average of the historical data and use it as a forecast
for all future time periods.
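A minimal sketch of the simple average forecast, with hypothetical values:

```python
def simple_average_forecast(train, horizon):
    """Use the mean of all training observations as the forecast for every future step."""
    mean = sum(train) / len(train)
    return [mean] * horizon


# Hypothetical training values, not the Rose data.
print(simple_average_forecast([10, 20, 30], horizon=2))  # [20.0, 20.0]
```

Because every historical point gets the same weight, this forecast is a flat line and reacts slowly to any change in the level of the series.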
Figure 24 - Simple Average
The graph above makes it evident that the Simple Average model does not perform well on the
test dataset. The Simple Average forecast is shown in green.
4.4 Moving Average (MA)
Moving Average (MA) is a time series forecasting method that involves calculating the average of a fixed
number of past observations to forecast future values. The "moving" part of the name refers to the fact
that the window of observations used to calculate the average moves forward in time with each new
observation.
Rolling means (also known as moving averages) are computed for various window sizes. The window size
that gives the highest accuracy (lowest error) is taken as the best one.
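A trailing moving average can be sketched as follows: each value is the mean of the current observation and the `window - 1` observations before it. The function name and data are hypothetical.

```python
def trailing_moving_average(values, window):
    """Trailing mean over `window` points; None until the first full window is available."""
    out = [None] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the point that left the window
        if i >= window - 1:
            out[i] = running / window
    return out


# Hypothetical series, not the Rose data.
series = [1, 2, 3, 4, 5]
print(trailing_moving_average(series, 2))  # [None, 1.5, 2.5, 3.5, 4.5]
print(trailing_moving_average(series, 4))  # [None, None, None, 2.5, 3.5]
```

A shorter window follows the data more closely, while a longer window smooths more but lags further behind.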
Figure 27 - Moving Average 2 point
Figure 29 - Moving Average 6 point
The graphs above make it evident that the 4 point, 6 point and 9 point moving average models
do not perform well on the test dataset.
The 2 point moving average performs better than the other 3 moving averages.
Let's check the RMSE to confirm that the 2 point moving average is better than the other moving averages.
4.4.1 Test RMSE – Moving Average
The test RMSE for the 2 point moving average is lower than for the other moving averages.
A lower RMSE value means better performance on the test dataset.
The 4 point moving average is the second best of the moving averages plotted above.
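For reference, RMSE is the square root of the mean squared difference between the actual and predicted test values. The sketch below computes it and picks the candidate with the lowest score. The scores in the dictionary are placeholders, not the report's results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([0, 0], [3, 4]))  # sqrt((9 + 16) / 2)

# Placeholder scores for illustration only.
scores = {"2 point": 1.0, "4 point": 2.0, "6 point": 3.0}
best = min(scores, key=scores.get)
print(best)
```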
Exponential Smoothing methods
Exponential smoothing is a family of time series forecasting methods that involves giving more weight to
recent observations while decreasing the weight of older observations exponentially over time. This
approach is based on the assumption that recent observations are more informative than older ones and
that trends and patterns in the data may change over time.
The following exponential smoothing models are built to check their performance on the test dataset:
• Double Exponential Smoothing with Additive Errors, Additive Trends – ETS (A, A, N)
α is the smoothing parameter, also known as the smoothing constant, which determines the weight given
to the most recent observation. It ranges from 0 to 1.
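The role of α can be illustrated with a short sketch of the simple exponential smoothing recursion. It assumes the level is initialised with the first observation and that α is supplied by the caller; a fitting routine would normally estimate both. The function name and values are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Assumption: the first observation initialises the level.
    """
    level = series[0]
    levels = [level]
    for y in series[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels

# Hypothetical values for illustration only.
levels = simple_exponential_smoothing([100.0, 110.0, 90.0], alpha=0.5)
flat_forecast = [levels[-1]] * 3  # SES forecasts are flat at the last level
print(levels, flat_forecast)
```

With a small α the level reacts slowly to new observations, and the forecast stays flat at the last level.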
4.5.1 Smoothing parameters
For Simple Exponential Smoothing, the fitted value of the alpha parameter is 0.098750.
The above graph shows that the SES model does not do well on the test dataset.
The SES forecast is shown in green.
The alpha parameter value is 0.098750 and the test RMSE is 36.80.
Double Exponential Smoothing uses two equations to forecast future values of the time series, one for
forecasting the short term average value or level and the other for capturing the trend.
Here, 𝛼 and 𝛽 are the smoothing constants for level and trend, respectively, 0 < 𝛼 < 1 and 0 < 𝛽 < 1.
The forecast at time t + 1 is given by $F_{t+1} = L_t + T_t$, where $L_t$ and $T_t$ are the level and trend
estimates at time t.
The parameters are auto-fitted as shown in the above figure: Alpha = 1.490116e-08, Beta = 1.661039e-10.
Figure 37 - DES Smoothing graph
The above graph shows that the DES model does not do well on the test dataset.
The SES forecast is shown in green and the DES forecast in red.
4.7 Holt Winter's linear method with additive errors (Triple
Exponential Additive Smoothing) - ETS (A, A, A)
Holt-Winters smoothing is a statistical technique used to forecast time-series data. It is an extension of
simple exponential smoothing and is used to model data that exhibits trends and seasonality.
The Holt-Winters method involves smoothing the data with three separate smoothing factors:
Level smoothing: This factor, alpha, smooths out the random noise in the data and captures the overall
level of the time series.
Trend smoothing: This factor, beta, captures the rate of change of the time series trend over time.
Seasonality smoothing: This factor, gamma, captures the seasonal variations in the data over a fixed
period of time.
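The three smoothing factors above can be seen working together in the following sketch of the additive Holt-Winters recursions. It is a minimal illustration and makes some assumptions: the level, trend and seasonal terms are initialised with a simple heuristic from the first two seasons, and alpha, beta and gamma are passed in rather than fitted. The function name and data are hypothetical.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast (needs at least two full seasons)."""
    first = series[:period]
    second = series[period:2 * period]
    # Heuristic initialisation (an assumption of this sketch).
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]

    for t in range(period, len(series)):
        y = series[t]
        s_prev = seasonal[t - period]
        prev_level = level
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * (y - level) + (1 - gamma) * s_prev)

    n = len(series)
    return [level + h * trend + seasonal[n - period + (h - 1) % period]
            for h in range(1, horizon + 1)]

# Hypothetical, purely seasonal series with period 3 and no trend.
data = [10.0, 20.0, 30.0] * 3
hw_forecast = holt_winters_additive(data, period=3, alpha=0.2, beta=0.1,
                                    gamma=0.3, horizon=3)
print(hw_forecast)
```

On this toy series the forecast simply repeats the seasonal pattern, because the level is constant and the trend is zero.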
As the fitted parameters above show, the Holt-Winters model also estimates a set of seasonal parameters.
The fitted smoothing parameters are alpha = 0.089541, beta = 0.000240 and gamma = 0.003467.
Figure 40 – Smoothing models SES, DES & TES
The above graph shows that the TES model fits the test dataset well. SES and
DES do not fit well; they are shown in green and red respectively.
The Triple Exponential Smoothing additive model predicts well on the test dataset.
4.7.2 TEST RMSE – Triple Exponential Additive Smoothing
The TES RMSE is lower than that of the models built so far, and this is reflected in the test prediction.
The test prediction graph comes closer to the test dataset.
Triple Exponential Additive Smoothing has performed the best on the test data, as expected, since the data has
both trend and seasonality. This model could be the best model.
But we see that triple exponential smoothing is under-forecasting. Let us try to tweak some of the
parameters in order to get a better forecast on the test set.
4.8 Holt Winter's linear method – multiplicative (TES) – ETS (A, A, M)
4.8.1 Parameters
As the fitted parameters above show, the Holt-Winters model also estimates a set of seasonal parameters.
The fitted smoothing parameters are alpha = 0.071511, beta = 0.045292 and gamma = 0.000072.
Figure 44 - Prediction of TES Multiplicative
The above graph alone does not tell us whether the TES multiplicative model performs well on the test
data. We need to compare RMSE values to conclude which TES model performs better.
The RMSE values show that the multiplicative seasonality model has not
done as well as the additive seasonality Triple Exponential Smoothing model.
The RMSE of the TES multiplicative model, 20.16, is higher than that of the TES additive model, 14.25.
4.9 Holt Winter's linear method with additive errors - Using Damped
Trend - ETS(A, Ad, A)
Damped trend additive method is a forecasting technique used to predict time-series data that exhibit a
trend, where the trend is expected to decrease or dampen over time. The method is a variation of the
additive method and involves adding a damping factor to the trend component.
The damped trend additive method can be represented by the following equation:
D is the damping factor, a value between 0 and 1 that reduces the magnitude of the trend over time.
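The effect of the damping factor can be sketched as follows. This assumes the commonly used damped-trend form, in which the h-step-ahead forecast is the level plus (D + D^2 + ... + D^h) times the trend; the seasonal term is left out for brevity, and the function name and values are hypothetical.

```python
def damped_trend_forecast(level, trend, damping, horizon):
    """Forecast level + (D + D^2 + ... + D^h) * trend for h = 1..horizon.

    Assumption: standard damped-trend form, seasonal component omitted.
    """
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += damping ** h
        forecasts.append(level + damp_sum * trend)
    return forecasts

# Hypothetical values for illustration only.
damped = damped_trend_forecast(level=100.0, trend=2.0, damping=0.5, horizon=3)
undamped = damped_trend_forecast(level=100.0, trend=2.0, damping=1.0, horizon=3)
print(damped, undamped)
```

With D = 1 the trend continues linearly; with D below 1 each further step adds less, so the forecast flattens out.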
From the above parameter list we can see that the various seasonal parameters are present, and
Alpha = 0.073686, Beta = 0.009798, Gamma = 0.073301, damping_trend = 0.975626
The preceding graph alone does not show whether the TES additive Damped Trend model performs
well on the test data. To determine which TES model performs best, we must compare RMSE.
The TES additive damped trend model follows the test data reasonably well in the prediction plot.
Its test RMSE of 26.36 (Table 1), however, is higher than that of the undamped additive TES model (14.25).
4.10 Holt Winter's linear method - multiplicative - using Damped Trend
Damped trend multiplicative method is a forecasting technique used to predict time-series data that
exhibit a trend, where the trend is expected to decrease or dampen over time. The method is a variation
of the multiplicative method and involves adding a damping factor to the trend component.
The damped trend multiplicative method can be represented by the following equation:
D is the damping factor, a value between 0 and 1 that reduces the magnitude of the trend over time.
4.10.1 TES multiplicative – Damped Trend parameters
From the above parameter list we can see that the various seasonal parameters are present, and
Alpha = 7.339816e-07, Beta = 3.874478e-07, Gamma = 5.495855e-07, damping_trend = 9.795710e-01
The preceding graph alone does not show whether the TES multiplicative Damped Trend model
performs well on the test data. To determine which TES model performs best, we must
compare RMSE.
The TES multiplicative damped trend model performed better on the test prediction than the additive
damped trend model.
4.11 Inference/Conclusion based on the model build so far:
So far we have seen the performance of 13 models. Among them, the “2 point Trailing Moving Average”
performed best, with the lowest RMSE value of 11.53.
Best Model:
5 Check for the stationarity of the data on which the
model is being built on using appropriate statistical tests
and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to
make it stationary. Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05
Stationarity, also known as stationarity assumption, is a fundamental concept in time series analysis that
refers to the property of a time series data where the statistical properties of the data remain constant
over time. In other words, a stationary time series has a constant mean, constant variance, and constant
autocorrelation structure over time.
The Augmented Dickey-Fuller test is a unit root test which determines whether there is a unit root and
subsequently whether the series is non-stationary.
𝐻0: The Time Series has a unit root and is thus non-stationary.
𝐻1: The Time Series does not have a unit root and is thus stationary.
We would want the series to be stationary for building ARIMA models and thus we would want the p-
value of this test to be less than the 𝛼 value.
Differencing will be applied if the time series is identified as non-stationary.
5.1.1 Dickey-Fuller test - check for stationarity of the time series
The above Dickey-Fuller test shows that the p-value is greater than alpha = 0.05, so we fail to reject
the null hypothesis: the time series is not stationary.
Figure 52 - Rolling Mean and Standard Deviation
A stationary time series has a constant mean and constant variance; the mean
and variance of our time series are not constant.
To check whether the time series becomes stationary, a difference of order 1
will be applied.
5.1.2 One order difference result
After applying a first-order difference the p-value becomes approximately 0, that is, less than alpha = 0.05.
Therefore we reject the null hypothesis and conclude that the differenced time series is stationary.
The rolling mean and standard deviation become roughly constant after the first-order difference.
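The differencing step itself is simple and can be sketched as below. The function name is hypothetical and the series is a made-up linear trend, chosen to show how a first-order difference removes a trend. The `lag` argument also covers a seasonal-style difference such as lag 6.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical series with a constant upward trend of 3 per period.
trend_series = [5.0, 8.0, 11.0, 14.0, 17.0]
diffed = difference(trend_series)
print(diffed)
```

The differenced series is one observation shorter per unit of lag, which is why the first value is undefined in the differenced plots.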
5.1.3 Time series plot before and after one order difference
6 Build an automated version of the ARIMA/SARIMA
model in which the parameters are selected using the lowest
Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
ARIMA (Autoregressive Integrated Moving Average) is a popular time series forecasting model that
combines autoregression (AR) and moving average (MA) components with differencing to account for
trend in a time series.
Autoregression refers to the use of lagged values of the dependent variable to predict future values.
Moving average refers to the use of the previous forecast errors to predict future values. Differencing
refers to the transformation of a non-stationary time series into a stationary time series by taking the
differences between consecutive observations.
p: the order of the autoregressive component, which refers to the number of lagged values of the
dependent variable used in the model.
d: the degree of differencing, which refers to the number of times the data is differenced to make the
time series stationary.
q: the order of the moving average component, which refers to the number of lagged forecast errors used
in the model.
SARIMA (Seasonal Autoregressive Integrated Moving Average) is an extension of the ARIMA model that
can handle time series data with seasonality. It includes additional seasonal components to account for
repeating patterns in the data, in addition to the autoregressive, integrated, and moving average
components of the ARIMA model.
The parameters of a SARIMA model are denoted as (p, d, q) × (P, D, Q)s, where (p, d, q) are the non-
seasonal ARIMA parameters, (P, D, Q) are the seasonal ARIMA parameters, and s is the seasonal period
(i.e., the number of time periods in a season).
The seasonal AR component (P) models the linear relationship between the series and its seasonal lags,
while the seasonal MA component (Q) models the linear relationship between the forecast errors and
their seasonal lags. The seasonal differencing (D) is used to remove the seasonal trends, similar to the
non-seasonal differencing (d).
As shown above, a first-order difference makes our time series stationary; therefore d = 1
will be used when generating the automated ARIMA and SARIMA models.
6.1.1 ACF and PACF before one order difference on full data
In the above ACF plot the first insignificant lag is at 13, and only one lag is significant
after that point. It is better to apply a first-order difference.
In the above PACF plot the first insignificant lag is at 4, and there is a further significant
lag at 13. It is better to apply a first-order difference.
6.1.2 ACF and PACF after performing one order difference on full data
From the ACF and PACF graphs we conclude that the PACF first becomes insignificant at lag 5 and the ACF
at lag 3.
The p value is read from the PACF chart: the first insignificant lag is at 5,
so we consider 4 as the maximum for p.
The q value is read from the ACF chart: the first insignificant lag is at 3,
so we consider the range 0 to 2.
6.1.3 ACF and PACF for Train dataset with one order difference
On the train dataset with a first-order difference, the first insignificant ACF lag after 0 is lag 3,
so the range of q is 0 to 2.
For the automated ARIMA we consider p in the range 0 to 2.
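Reading a cut-off from an ACF chart can be sketched numerically. The snippet below computes the sample autocorrelation and returns the first lag that falls inside the approximate 95% band of ±1.96/√n. This is an illustrative approach using a standard large-sample approximation, not necessarily how the charts in this report were produced; the function names and data are hypothetical, and the PACF (used for p) is not covered.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def first_insignificant_lag(acf_values, n):
    """First lag >= 1 whose autocorrelation lies inside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    for k in range(1, len(acf_values)):
        if abs(acf_values[k]) < bound:
            return k
    return None

# Hypothetical alternating series for illustration only.
toy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r = acf(toy, max_lag=3)
cutoff = first_insignificant_lag(r, len(toy))
print(r, cutoff)
```

Under this reading, the candidate range for q runs from 0 up to the lag just before the returned cut-off.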
6.2 ARIMA Automated
The p and q values are in the range 0 to 2, based on the ACF and PACF charts shown above.
We have kept the value of d as 1, since one difference of the series is needed to make it stationary.
The AIC is based on the concept of information entropy and is used to balance the fit of a model to the
data with the complexity of the model. In other words, the AIC attempts to find the simplest model that
best fits the data.
Where L is the likelihood of the data given the model, and k is the number of parameters in the model.
A lower AIC value indicates a better model fit, with the model having the lowest AIC value considered to
be the best fit.
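In its standard form the criterion is AIC = 2k − 2 ln(L). The sketch below applies that formula to two candidate models and picks the one with the lower value. The log-likelihoods and parameter counts are hypothetical numbers for illustration, not values from the fitted models in this report.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical (log-likelihood, parameter count) pairs for illustration only.
candidates = {"model_a": (-650.0, 3), "model_b": (-648.5, 5)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here model_b fits slightly better but uses two more parameters, so the penalty term makes model_a the preferred model.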
The model with parameters p=0, d=1, q=2 has the lowest AIC value, therefore we fit this model
to predict the test dataset.
As we have chosen p=0 and q=2, the model has 2 moving average parameters and 1 sigma parameter
with a lesser value.
The sigma coefficient has a high value; in this case the model will not predict well.
All the components are significant since their p-values are below alpha = 0.05.
Figure 63 - Auto ARIMA Plot
6.2.3 Automated ARIMA Prediction
To start with, the P and Q values are assigned the same range as p and q:
p -> 0 to 2
q -> 0 to 2
d -> 1
P -> 0 to 2
Q -> 0 to 2
D -> 0
Seasonality - 12
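The search space defined by these ranges can be enumerated with a few lines of code. This is only a sketch of how the candidate orders might be generated; fitting each candidate and recording its AIC is left out, and the variable names are hypothetical.

```python
from itertools import product

# Ranges taken from the list above: p, q, P, Q in 0..2; d = 1; D = 0; s = 12.
p_range = q_range = P_range = Q_range = range(0, 3)
d, D, s = 1, 0, 12

candidates = [
    ((p, d, q), (P, D, Q, s))
    for p, q, P, Q in product(p_range, q_range, P_range, Q_range)
]
print(len(candidates))
print(candidates[:3])
```

Each candidate pair is a non-seasonal order and a seasonal order; the automated search fits every pair and keeps the one with the lowest AIC.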
The model with parameters p=0, d=1, q=1, P=2, D=0, Q=2 and seasonality=12
has the lowest AIC value, therefore we fit this model to predict the test dataset.
Using fewer parameters tends to give a better result. Therefore we test with the
parameters in row 26: p=0, d=1, q=1, P=2, D=0, Q=2 and seasonality=12
Figure 68 - Automated SARIMA Result
Best combination with the lowest AIC is p=0, d=1, q=1, P=2, D=0, Q=2 and seasonality=12
As we chosen p =0 and q=1 therefore we have 2 parameters for q moving average and 1 sigma
parameter with higher value
0 components of Auto Regression – components are not significant since p value is greater than
alpha 0.05
2 Moving Average components – these are not significant since their p-values are greater than
alpha = 0.05
ma.L1 – p value 1
ma.L2 – p value 1
seasonal components
Figure 69 – Automated SARIMA Plot
6.3.3 RMSE – Automated SARIMA
The Automated SARIMA RMSE is lower than that of the Auto ARIMA, but we have seen much lower
RMSE values from the TES models. Let's plot the prediction on the test data and see how Auto SARIMA performs.
The above graph shows that the Auto SARIMA model performs better than the
Auto ARIMA on the test dataset. The Auto SARIMA forecast is shown in red.
SARIMA performs better than ARIMA.
The RMSE of SARIMA is 26.93 whereas the RMSE of ARIMA is 37.31; from this we conclude that
Automated SARIMA performs better than Automated ARIMA.
7 Build ARIMA/SARIMA models based on the cut-off
points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.
The auto-regressive parameter 'p' in an ARIMA model is the last significant lag before
the PACF plot cuts off; here p = 2.
The moving-average parameter 'q' in an ARIMA model is the last significant lag before the
ACF plot cuts off; here q = 2.
As we have chosen p=2 and q=2, the model has 2 auto regression parameters, 2 moving average
parameters and 1 sigma parameter with a lesser value.
2 Auto Regression components – not significant since their p-values are greater than alpha = 0.05
2 Moving Average components – not significant since their p-values are greater than alpha = 0.05
The sigma coefficient is high, which implies this model will not perform well on the test dataset.
Figure 74 - Manual ARIMA plot
The Manual ARIMA RMSE is lower than that of the Automated ARIMA, but we cannot say this is the best
model, since the Automated SARIMA RMSE is much lower than both the Auto and Manual ARIMA.
7.2.2 Manual ARIMA prediction
The above graph shows that the Manual ARIMA model does not do well on the test
dataset; it fails to predict the test data. The Manual ARIMA forecast is shown in green.
Figure 78 - Mean and std Dev plot with diff 6 + diff
The rolling mean and standard deviation become constant after taking a difference at lag 6
followed by a further first-order difference.
After applying the lag-6 difference and the first-order difference, the p-value becomes approximately 0,
that is, less than alpha = 0.05. Therefore we conclude that the time series is stationary.
7.3.1 ACF and PACF with difference of 6 and diff in train dataset to identify the P and Q
The D value remains 0
Seasonality is taken as 12
7.3.2 Manual SARIMA Model1: (2,1,2) (3, 0, 1, 12)
As we have chosen p=2 and q=2, the model has 2 auto regression parameters, 2 moving average
parameters and 1 sigma parameter with a lesser value. P and Q are 3 and 1 respectively.
ma.L1 – p value 1 – not significant since p value is greater than the alpha 0.05
ma.L2 – p value 0.93 – not significant since p value is greater than the alpha 0.05
P component
ar.S.L12 – p value 0.03 – significant component since p value is lesser than the alpha 0.05
Coefficient of S.L12 has a major impact on the prediction.
Q component
ma.S.L12 – p value 0.71 – not significant since p value is greater than the alpha 0.05
Figure 83 - SARIMA Model 1 Plot
7.3.3 Manual SARIMA Model 2: (2,1,2) (0, 0, 1, 12)
As we have chosen p=2 and q=2, the model has 2 auto regression parameters, 2 moving average
parameters and 1 sigma parameter with a lesser value. P and Q are 0 and 1 respectively.
3 components of Auto Regression – components are not significant since p value is greater than
alpha 0.05
2 components of Moving Average – components are not significant since p value is greater than
alpha 0.05
P has 0 components
Q has 2 components
ma.S.L1 – p value 0.07 – not significant since p value is greater than the alpha 0.05
ma.S.L2 – p value 0.84 – not significant since p value is greater than the alpha 0.05
Figure 85 - SARIMA Model 2 Plot
As we have chosen p=2 and q=2, the model has 2 auto regression parameters, 2 moving average
parameters and 1 sigma parameter with a lesser value. P and Q are 2 and 1 respectively.
ma.L1 – p value 1 – not significant since p value is greater than the alpha 0.05
ma.L2 – p value 1 – not significant since p value is greater than the alpha 0.05
P component
ar.S.L12 – p value 0.00 – significant component since p value is lesser than the alpha 0.05
Coefficient of S.L12 has a major impact on the prediction.
ar.S.L24 – p value 0.00 – significant component since p value is lesser than the alpha 0.05
Coefficient of S.L24 has impact on the prediction
Q component
ma.S.L12 – p value 1 – not significant since p value is greater than the alpha 0.05
7.3.5 RMSE – Manual SARIMA Models
From the above chart we can conclude that Model 1 has the lowest RMSE and MAPE. This model should
perform well in predicting the test data. Parameters used for prediction: (2,1,2) (3, 0, 1, 12)
Figure 90 - SARIMA Prediction Model 2
From the above prediction graphs we can see that all 3 models perform close to each other.
Considering the RMSE values and the graphs, we conclude that Model 1 predicts best.
The parameters used for Model 1 are (2,1,2) (3, 0, 1, 12).
From the ACF and PACF charts with a first-order difference we concluded that the optimum value for both p and q is 2.
The P and Q parameters are identified from the ACF and PACF charts with a difference at lag 6: P and Q are 3
and 1 respectively.
8 Build a table with all the models built along with
their corresponding parameters and the respective
RMSE values on the test data.
Models | Parameters | Test RMSE
RegressionOnTime | - | 15.27
NaiveModel | - | 79.72
SimpleAverageModel | - | 53.46
2pointTrailingMovingAverage | - | 11.53
4pointTrailingMovingAverage | - | 14.45
6pointTrailingMovingAverage | - | 14.57
9pointTrailingMovingAverage | - | 14.73
Simple Exponential Smoothing | Alpha=0.098750 | 36.80
Double Exponential Smoothing | Beta=1.661039e-10 | 15.27
Triple Exponential Smoothing Additive | Gamma=0.003467 | 14.25
Triple Exponential Smoothing Multiplicative | Gamma=0.000072 | 20.16
Triple Exponential Smoothing Additive Damped Trend | Damping_Trend=0.975626 | 26.36
Triple Exponential Smoothing Multiplicative Damped Trend | Damping_Trend=0.990000 | 25.96
Automated ARIMA(0,1,2) | q=2 | 37.31
Automated SARIMA(0, 1, 2)(2, 0, 2, 12) | Seasonality=12 | 26.93
Manual ARIMA(2,1,2) | q=2 | 36.87
Manual SARIMA Model 1:(2,1,2)(3, 0, 1, 12) | Seasonality=12 | 22.69
Manual SARIMA Model 2:(2,1,2)(0, 0, 1, 12) | Seasonality=12 | 33.39
Manual SARIMA Model 3:(2,1,2)(2, 0, 1, 12) | Seasonality=12 | 28.22
Table 1 – All Models
The models built so far are listed in the table above, along with the parameters that
went into each model. Each model is tested using these parameters, and its test RMSE is shown.
9 Based on the model-building exercise, build the most
optimum model(s) on the complete data and predict
12 months into the future with appropriate
confidence intervals/bands.
The 2 point Trailing Moving Average has the lowest RMSE, 11.53, and Triple Exponential Additive Smoothing
has the second lowest RMSE, 14.25, on the test data.
A moving average helps to forecast near-future values but it cannot be used for an actual forecast; therefore
we consider Triple Exponential Additive Smoothing the best model for predicting the next 12 months.
The best model usable for forecasting, as per the RMSE value, is Triple Exponential Additive Smoothing with parameters
Alpha=0.089541, Beta=0.000240 and Gamma=0.003467
Triple Exponential Additive Smoothing is also known as the Holt-Winters method. The method is called "triple"
exponential because it uses three smoothing parameters, each of which is applied to a different
component of the time series: level, trend and seasonality.
These parameters are typically denoted by alpha (α), beta (β), and gamma (γ), respectively.
Alpha (α): determines the weight given to the most recent observation versus the historical average
Beta (β): determines the weight given to the most recent trend versus the historical trend
Gamma (γ): determines the weight given to the most recent seasonality versus the historical seasonality
The alpha, beta and gamma values mentioned above are used on the full data to predict the next 12 months.
The RMSE of full data is 16.13, for test data RMSE was 11.53
We have calculated the upper and lower confidence bands at 95% confidence level
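One simple way to build such bands is sketched below. It assumes normally distributed forecast errors with constant spread, so the band is the forecast plus or minus 1.96 times the standard deviation of the in-sample residuals. This is an illustrative assumption, not necessarily the calculation used in this report; the function name and numbers are hypothetical.

```python
import statistics

def confidence_band(forecast, residuals, z=1.96):
    """Lower and upper bands: forecast -/+ z * std(residuals).

    Assumption: normal errors with constant variance (z = 1.96 for 95%).
    """
    sigma = statistics.stdev(residuals)
    lower = [f - z * sigma for f in forecast]
    upper = [f + z * sigma for f in forecast]
    return lower, upper

# Hypothetical forecast and residuals for illustration only.
point_forecast = [100.0, 110.0]
lower_ci, upper_ci = confidence_band(point_forecast, residuals=[-2.0, 0.0, 2.0])
print(lower_ci, upper_ci)
```

Bands built this way have the same width at every horizon; methods that let uncertainty grow with the horizon give wider bands for later months.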
9.2 12 months prediction
The next 12 months are predicted with appropriate confidence intervals/bands. The forecast is expected
to lie between the lower_CI and upper_ci values. With this confidence interval chart the business
can plan its production to meet changes in demand.
The above graphic shows that the model predicts quite well, and the
confidence interval for the 12 months is clearly visible.
The plot below shows that there is seasonality in Rose wine sales.
A remarkable increase in sales is observed at the end of every year, in November and December,
which may be related to the holiday season.
December is the year's highest sales peak.
April shows the lowest sales across the year.
10.2 Comments on the model build
We have built 20 models to determine which model best fits the Rose wine sales and predicts the
next 12 months.
Below are the top 5 models with respect to RMSE. The 2 point Trailing Moving Average and Triple
Exponential Smoothing Additive are the 2 models that performed best on the
Rose wine sales data.
A moving average helps to forecast near-future values but it cannot be used for an actual forecast; therefore
we consider Triple Exponential Additive Smoothing the best model for predicting the next 12 months.
Models | Parameters | Test RMSE
2pointTrailingMovingAverage | - | 11.53
Triple Exponential Smoothing Additive | Gamma=0.003467 | 14.25
4pointTrailingMovingAverage | - | 14.45
6pointTrailingMovingAverage | - | 14.57
9pointTrailingMovingAverage | - | 14.73
Table 2 – Top 5 Best Models
Sales decrease significantly year over year. Therefore, the exterior of the Rose wine container could be
changed to make it look new and fresh each year.
Introducing deals during the slow sales periods could boost the company's performance and could also
increase sales during the busiest season.
The analysis suggests that wine is frequently consumed during celebrations. Partnering with an event
management company could increase wine sales.