Orbit Forecasting Model
Abstract

ent time series data (Hewamalage et al., 2019), the authors show mixed performances of various deep learning models
arXiv:2004.08492v4 [stat.CO] 22 Jan 2021
State space models are a family of models that can be written in the general form

$$y_t = Z_t^T \alpha_t + \epsilon_t$$
$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t$$

One big advantage is that they are modular, in the sense that independent state components can be combined by concatenating their observation vectors $Z_t$ and arranging the other model matrices as elements in a block diagonal matrix. This provides considerable flexibility for modeling trend, seasonality, regressors, and potentially other state components that may be necessary in practice.

However, such a form is not unique, since the same model can be expressed in multiple ways. In practice, there are various reduced forms introduced by researchers. We will go over some of those widely used in the industry.

Another popular approach is exponential smoothing, written in a reduced form

$$\hat{y}_{t|t-1} = Z^T \alpha_{t-1}$$
$$\epsilon_t = y_t - \hat{y}_{t|t-1}$$
$$\alpha_t = T \alpha_{t-1} + k \epsilon_t$$

This was first described by Box and Jenkins (Box, 1998). Note that $\hat{y}_{t|t-1}$ can be computed recursively. Hyndman et al. 2008 provide a complete review of this form. The literature introduces an ETS form with the notation “A” and “M”, which stand for additive and multiplicative respectively. Well-known methods such as the Holt-Winters model can be viewed as ETS(A, A, A) or ETS(A, M, A). In notation such as $A_d$, the subscript $d$ denotes that a damped factor is introduced. Many of these are implemented in the forecast package written in the statistical programming language R.
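As a minimal sketch of the reduced-form recursion above, the following Python runs the three steps (one-step prediction, error, state update) over a series. The function name `run_reduced_form` and all numeric values are illustrative assumptions; this is not code from the forecast package or from Orbit. With $Z = (1, 1)$, $T = [[1, 1], [0, 1]]$ and a two-element gain $k$, the recursion corresponds to Holt's linear trend method in error-correction form, with the state holding a level and a slope.

```python
from typing import List, Sequence, Tuple


def run_reduced_form(
    y: Sequence[float],
    Z: Sequence[float],
    T: Sequence[Sequence[float]],
    k: Sequence[float],
    alpha0: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Apply y_hat = Z^T a_{t-1}, eps = y - y_hat, a_t = T a_{t-1} + k * eps.

    Returns the one-step-ahead predictions and the final state vector.
    """
    alpha = list(alpha0)
    n = len(alpha)
    predictions = []
    for obs in y:
        # One-step-ahead prediction from the previous state.
        y_hat = sum(z * a for z, a in zip(Z, alpha))
        # Prediction error (innovation).
        eps = obs - y_hat
        # State transition plus error correction.
        alpha = [
            sum(T[i][j] * alpha[j] for j in range(n)) + k[i] * eps
            for i in range(n)
        ]
        predictions.append(y_hat)
    return predictions, alpha


# Holt's linear trend as a special case (gain values chosen arbitrarily).
Z_holt = [1.0, 1.0]
T_holt = [[1.0, 1.0], [0.0, 1.0]]
k_holt = [0.5, 0.1]
preds, final_state = run_reduced_form(
    [5.0, 8.0, 11.0, 14.0], Z_holt, T_holt, k_holt, alpha0=[2.0, 3.0]
)
```

Because the sample series is exactly linear and the initial state matches it, every prediction error is zero and the state simply advances along the line. Setting `Z = [1.0]`, `T = [[1.0]]` and a scalar gain gives simple exponential smoothing.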
where $g_t$ is the piecewise linear or logistic growth curve to model the non-periodic changes in the time series, $s_t$ is the seasonality term, $h_t$ is the holiday effect with irregular schedules, and $\epsilon_t$ is the error term. On a high level, Prophet frames the forecasting problem as a curve-fitting exercise rather than looking explicitly at the time-based dependence of each observation within a time series. As a computational tool, moreover, Prophet allows users to manually supply change points in fitting the trend term and to set the boundaries for saturation growth, which gives great flexibility in business applications.

3. The Orbit Package

3.1. Refined Model I - LGT

Our proposed Local and Global Trend (LGT) model is an additive version based on the multiplicative model proposed in Rlgt (Smyl et al., 2019). In the forecast process, we have

the computation cost by vectorizing the noise generation process.

One limitation of this model is that it assumes $l_t > 0, \forall t$. To ensure that this condition is satisfied during the training period, it imposes the requirement that $y_t > 0, \forall t$.

3.2. Refined Model II - DLT

For use cases with $y_t \in \mathbb{R}$, we provide an alternative, the Damped Local Trend (DLT) model.

Such a model is an extension of ETS(A, $A_d$, A) in Hyndman et al. 2008, where $\theta$ is known as the damped factor. In the forecast process, we have

$$y_t = \mu_t + s_t + r_t + \epsilon_t$$
$$\mu_t = D(t) + l_{t-1} + \theta b_{t-1}$$
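The DLT forecast equations can be illustrated with a small sketch of the one-step mean, that is, $y_t$ with the error term set to zero. The sketch assumes that $l$ and $b$ are the level and slope states in the usual ETS sense, that $s_t$ is a seasonal term and $r_t$ a regression term, and that $D(t)$ is a number supplied by the caller. The function `dlt_forecast_mean` is hypothetical and is not part of the Orbit API; the update equations for the states are not covered here.

```python
def dlt_forecast_mean(
    global_trend: float,
    prev_level: float,
    prev_slope: float,
    theta: float,
    seasonality: float = 0.0,
    regression: float = 0.0,
) -> float:
    """Return mu_t + s_t + r_t, with mu_t = D(t) + l_{t-1} + theta * b_{t-1}.

    All argument names are illustrative assumptions about the symbols.
    """
    # Damped trend: theta scales how much of the previous slope carries over.
    mu = global_trend + prev_level + theta * prev_slope
    return mu + seasonality + regression


# theta = 1 keeps the full slope, theta = 0 removes it entirely.
undamped = dlt_forecast_mean(0.0, 10.0, 2.0, theta=1.0)
half_damped = dlt_forecast_mean(0.0, 10.0, 2.0, theta=0.5)
no_slope = dlt_forecast_mean(0.0, 10.0, 2.0, theta=0.0)
```

Comparing the three calls shows the role of the damped factor: the contribution of the previous slope shrinks linearly as $\theta$ goes from 1 to 0.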
The base Estimator class contains generic logic to handle interaction with the underlying inference engine (e.g., PyStan, Pyro) along with utilities to load and save Orbit models, and the specifics of the model are implemented in each model class. Figure 1 shows the overall workflow of the Orbit package.

• US and Canada rider first-trips with Uber (20 weekly series by city)
• US and Canada driver weekly first-trips with Uber (20 weekly series by city)
$$\text{SMAPE} = \sum_{t=1}^{h} \frac{|F_t - A_t|}{(|F_t| + |A_t|)/2}$$

where $X_t$ represents the value measured at time $t$ and $h$ is the forecast horizon.
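A minimal Python sketch of the SMAPE formula follows, assuming $F_t$ is the forecast and $A_t$ is the actual value at step $t$. The function name `smape`, the `average` flag, and the handling of a zero denominator are assumptions made for illustration; they are not taken from the Orbit package. By default the function returns the sum exactly as the formula is written, and `average=True` divides by $h$, which is a common variant.

```python
from typing import Sequence


def smape(
    forecast: Sequence[float],
    actual: Sequence[float],
    average: bool = False,
) -> float:
    """Symmetric MAPE over a forecast horizon h = len(forecast)."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    total = 0.0
    for f, a in zip(forecast, actual):
        denom = (abs(f) + abs(a)) / 2
        if denom == 0:
            # Assumption: both values are zero, so count the term as no error.
            continue
        total += abs(f - a) / denom
    if average and len(forecast) > 0:
        return total / len(forecast)
    return total


score = smape([110.0, 100.0], [90.0, 100.0])
```

In the sample call the first step contributes $20 / 100$ and the second contributes nothing, since the forecast matches the actual value.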
Here $h$ is the forecast horizon, which can also be considered as the “holdouts” in a backtest process. Following what the competitions suggested, we use forecast horizons of 13 and 18 for the M4 weekly and M3 monthly series, respectively, each with 1 split. For the Uber datasets, we use $h = 13$ with 3 splits and 26 incremental steps for the weekly series, and $h = 28$ with 4 splits and 14 incremental steps for the daily series. With multiple splits, we expect more robust results. The calculation is done with the help of the backtest utilities built into the Orbit package.
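One way to picture the split settings above is an expanding-window scheme, sketched below. This is an assumption about how horizon, number of splits, and incremental steps interact: the last split is taken to end at the final observation, and each earlier split ends one increment sooner. The function `expanding_window_splits` is hypothetical and does not reproduce the backtest utilities in Orbit.

```python
from typing import List, Tuple


def expanding_window_splits(
    n_obs: int, horizon: int, n_splits: int, step: int
) -> List[Tuple[range, range]]:
    """Return (train_indices, test_indices) pairs, earliest split first."""
    splits = []
    for i in range(n_splits):
        # The last split ends at n_obs; earlier ones end `step` points sooner.
        test_end = n_obs - (n_splits - 1 - i) * step
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("series too short for the requested splits")
        # Training always starts at index 0, so the window expands.
        splits.append((range(0, train_end), range(train_end, test_end)))
    return splits


# Weekly setting from the text: h = 13, 3 splits, 26 incremental steps.
# The series length of 100 is an arbitrary illustrative value.
weekly_splits = expanding_window_splits(n_obs=100, horizon=13, n_splits=3, step=26)
```

Each pair holds out `horizon` points immediately after the training window, so a metric such as SMAPE can be computed per split and then aggregated.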
Figure 1. Overall Design of Orbit Package.

4.3. Results

We compared our proposed models, LGT and DLT, to other popular time series models such as SARIMA (Seabold & Perktold, 2010) and Facebook Prophet (Taylor & Letham, 2018). Both Prophet and Orbit models use maximum a posteriori (MAP) estimates, and they are configured as similarly as possible in terms of optimization and seasonality settings. For SARIMA, we fit the $(1, 1, 1) \times (1, 0, 0)_S$ structure by maximum likelihood estimation (MLE), where $S$ represents the choice of seasonality.
Orbit: Probabilistic Forecast with Exponential Smoothing
Within a dataset, the models under consideration were run on each time series separately, and then we report the aggregated metrics. Table 1 gives the average and the standard deviation (within parentheses) of SMAPE across different models and datasets.

It shows that our models consistently deliver better accuracy than other candidate time series models in terms of SMAPE. Orbit is also computationally efficient. For example, the average compute time per series on a subset of the M4 weekly data is about 2.5 minutes for full MCMC sampling and 16 ms for prediction. The run time for the same series in Prophet is about 10 minutes for sampling and 2.4 s for prediction. That is a 4x speed-up in sampling and a difference of about two orders of magnitude (roughly 150x) in prediction.

Code and M3/M4 data used in this benchmark study are available upon request.

5. Conclusion

We have shown that our proposed models consistently outperform the baseline time series models in terms of SMAPE. Furthermore, we also identified compute cost improvements when using the Orbit package. In future work, we will continue to actively maintain the package, incorporate new models, and provide new features and enhancements (such as support for dual seasonality and full Pyro integration).

References

Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep universal probabilistic programming. The Journal of Machine Learning Research, 20(1):973–978, 2019.

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time series analysis: forecasting and control, volume 54, pp. 176–180. 1998.

Box, G. E. and Jenkins, G. M. Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2):91–109, 1968.

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J.,

Gardner Jr, E. S. Exponential smoothing: The state of the art. Journal of Forecasting, 4(1):1–28, 1985.

Gers, F. A., Schmidhuber, J., and Cummins, F. Learning to forget: Continual prediction with LSTM. 1999.

Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 80–89. IEEE, 2018.

Gunning, D. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), nd Web, 2, 2017.

Hewamalage, H., Bergmeir, C., and Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. arXiv preprint arXiv:1909.00590, 2019.

Huang, Z., Xu, W., and Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.

Hyndman, R., Koehler, A. B., Ord, J. K., and Snyder, R. D. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008.

Hyndman, R. J. and Athanasopoulos, G. Forecasting: principles and practice. OTexts, 2018.

Hyndman, R. J., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay, L., O’Hara-Wild, M., Petropoulos, F., Razbash, S., Wang, E., and Yasmeen, F. forecast: Forecasting functions for time series and linear models. 2018.

Makridakis, S., Spiliotis, E., and Assimakopoulos, V. The M4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34(4):802–808, 2018.

Scott, S. L. and Varian, H. R. Predicting the present with Bayesian structural time series. International Journal of Mathematical Modelling and Numerical Optimisation, 5(1-2):4–23, 2014.

Seabold, S. and Perktold, J. statsmodels: Econometric and statistical modeling with Python. In 9th Python in Science Conference, 2010. Package version 0.11.1.