
Chronos: Learning the Language of Time Series

Abdul Fatir Ansari1∗, Lorenzo Stella1∗, Caner Turkmen1, Xiyuan Zhang2†, Pedro Mercado1, Huibin Shen1, Oleksandr Shchur1, Syama Sundar Rangapuram1, Sebastian Pineda Arango3‡, Shubham Kapoor1, Jasper Zschiegner, Danielle C. Maddix1, Michael W. Mahoney4, Kari Torkkola4, Andrew Gordon Wilson1, Michael Bohlke-Schneider1, Yuyang Wang1
{ansarnd,stellalo}@amazon.com
1 Amazon Web Services, 2 UC San Diego, 3 University of Freiburg, 4 Amazon Supply Chain Optimization Technologies
∗ Equal contribution.
† Work done during an internship at Amazon Web Services.

Abstract

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time
series models. Chronos tokenizes time series values using scaling and quantization into
a fixed vocabulary and trains existing transformer-based language model architectures on
these tokenized time series via the cross-entropy loss. We pretrained Chronos models
based on the T5 family (ranging from 20M to 710M parameters) on a large collection of
publicly available datasets, complemented by a synthetic dataset that we generated via
Gaussian processes to improve generalization. In a comprehensive benchmark consisting of
42 datasets, and comprising both classical local models and deep learning methods, we show
that Chronos models: (a) significantly outperform other methods on datasets that were
part of the training corpus; and (b) have comparable and occasionally superior zero-shot
performance on new datasets, relative to methods that were trained specifically on them.
Our results demonstrate that Chronos models can leverage time series data from diverse
domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained
models as a viable tool to greatly simplify forecasting pipelines.

1 Introduction

Time series forecasting is an essential component of decision-making across various domains, including retail,
energy, finance, healthcare, and climate science. Traditionally, forecasting has been dominated by
statistical models such as ARIMA and ETS. These have served as reliable tools, at least until the recent shift
towards deep learning techniques (Hyndman & Athanasopoulos, 2018; Benidis et al., 2022). This shift can be
attributed to the availability of large and diverse time series data sources, and the emergence of operational
forecasting problems (Kolassa & Januschowski, 2019) that play to the strengths of deep forecasters, i.e., the
ability to extract patterns out of a large collection of time series. Despite their impressive performance, deep
forecasters still operate in the standard regime of training and prediction on the same dataset. While there
have been works dedicated to transfer learning (Ye & Dai, 2018) and domain adaptation (Jin et al., 2022)
for forecasting, the field has yet to converge on a unified, general-purpose forecasting model, a goal that
remains a beacon for time series researchers.
The emergence of large language models (LLMs) with zero-shot learning capabilities has ignited interest
in developing “foundation models” for time series. In the context of LLMs, this interest has been pursued
through two main avenues: directly prompting pretrained LLMs in natural language (Gruver et al., 2023;
Xue & Salim, 2023) and fine-tuning LLMs for time series tasks (Zhou et al., 2023a; Jin et al., 2024). However,
these methods face significant limitations, notably the need for prompt engineering or fine-tuning for each
new task, or reliance on large-scale models (GPT-3 (Brown et al., 2020), Llama 2 (Touvron et al., 2023), etc.)
that demand substantial computational resources and time for inference. Recent concurrent work (Dooley
et al., 2023; Das et al., 2023; Rasul et al., 2023; Woo et al., 2024) also explores pretraining transformer-based models with sophisticated time-series-specific designs on a large corpus of real and/or synthetic time series
data.
In this work, we take a step back and ask: what are the fundamental differences between a language model
that predicts the next token, and a time series forecasting model that predicts the next values? Despite the
apparent distinction — tokens from a finite dictionary versus values from an unbounded, usually continuous
domain — both endeavors fundamentally aim to model the sequential structure of the data to predict future
patterns. Shouldn’t good language models “just work” on time series? This naive question prompts us to
challenge the necessity of time-series-specific modifications, and answering it led us to develop Chronos,
a language modeling framework minimally adapted for time series forecasting. Chronos tokenizes time
series into discrete bins through simple scaling and quantization of real values. In this way, we can train
off-the-shelf language models on this “language of time series,” with no changes to the model architecture
(see Figure 1 for a high-level depiction of Chronos). Remarkably, this straightforward approach proves
to be effective and efficient, underscoring the potential for language model architectures to address a broad
range of time series problems with minimal modifications.

[Figure 1 panels: Time Series Tokenization; Training; Inference.]
Figure 1: High-level depiction of Chronos. (Left) The input time series is scaled and quantized to obtain a sequence
of tokens. (Center) The tokens are fed into a language model which may either be an encoder-decoder or a decoder-
only model. The model is trained using the cross-entropy loss. (Right) During inference, we autoregressively sample
tokens from the model and map them back to numerical values. Multiple trajectories are sampled to obtain a
predictive distribution.

For the development of a useful general-purpose time series forecasting model, the scarcity of publicly
available time series datasets, both in quantity and quality, is arguably more critical than the modeling
framework. In addition to the comprehensive collection of public datasets we used to train Chronos, a
central aspect of our approach is the integration of data augmentation strategies, including TSMix and
KernelSynth. TSMix randomly samples a set of base time series from different training datasets, and
generates new time series based on a convex combination of them; KernelSynth uses Gaussian processes
to generate synthetic time series by randomly composing kernel functions. These techniques address the
inherent limitations of small training datasets in time series forecasting, enhancing model robustness and
generalization.
Our comprehensive evaluation across 42 datasets establishes Chronos as a benchmark for both in-domain
and zero-shot forecasting, surpassing both traditional models and task-specific deep learning approaches.
Notably, Chronos achieves impressive zero-shot forecasting performance out of the box, without necessitating task-specific adjustments. Its accuracy, coupled with its relatively modest model size, positions it as a preferable alternative to larger, more computationally demanding models for zero-shot forecasting applications. By its very nature as a language model operating over a fixed vocabulary, Chronos can seamlessly
integrate with future advancements in LLMs, making it an ideal candidate for further development as a
generalist time series model.
The rest of the paper is organized as follows. Section 2 introduces the background on time series forecasting
and language models, and discusses related work. In Section 3, we describe Chronos, our proposed language
modeling framework for time series. Section 4 discusses our data augmentation technique and synthetic time
series generation process. In Section 5, we present our main results and a rigorous analysis of different design
choices. We discuss future directions in Section 6, and conclude the paper in Section 7. Additional material
is presented in the appendices.

2 Background and Related Work

Time series forecasting concerns using historical data from a quantity of interest (typically real-valued)
to predict its future values. Formally, given a uniformly-spaced time series x1:C = [x1, . . . , xC], we are
interested in predicting the joint distribution of the next H steps, p(xC+1:C+H |x1:C ). In this work, we focus
on univariate forecasting, where the observations are scalars, i.e., xi ∈ R for all i.
Time series forecasting can be addressed with a variety of different methods which can be broadly categorized
into classical forecasting methods and deep learning methods. Classical forecasting methods such as ETS,
ARIMA (Hyndman et al., 2008), and Theta (Assimakopoulos & Nikolopoulos, 2000) fit a separate model to each
time series independently (hence referred to as local models). In contrast, deep learning forecasting models
learn across time series in a given dataset (and are called global models). These methods leverage advances
in deep learning, such as RNNs which are used by DeepState (Rangapuram et al., 2018), DeepAR (Salinas
et al., 2020), TimeGrad (Rasul et al., 2021), and transformers which are used by TFT (Lim et al., 2021)
and PatchTST (Nie et al., 2023). Apart from the choice of architecture, these approaches differ in the
way they model the target, with some modeling the density function while others directly predicting a set
of quantiles (Wen et al., 2017; Gasthaus et al., 2019). Nevertheless, not all models produce probabilistic
forecasts: notably, models such as Informer (Zhou et al., 2021) and DLinear (Zeng et al., 2023) only produce
point forecasts.
Large language models (LLMs) have demonstrated impressive performance on various natural language
processing tasks (Brown et al., 2020; Chung et al., 2022; Touvron et al., 2023). Given a sequence of input tokens, w1:k = [w1, . . . , wk], language models aim to predict the next token, wk+1, by modeling the conditional
distribution, p(wk+1 |w1:k ). The tokens belong to a vocabulary, V, and may be characters, subwords (Sennrich
et al., 2015), or words, depending on the tokenization scheme used.
Most modern LLMs (Brown et al., 2020; Chung et al., 2022; Touvron et al., 2023) are based on the transformer
architecture (Vaswani et al., 2017). The original transformer architecture is an encoder-decoder model
designed for machine translation. The encoder maps an input sentence of some language to a continuous
representation, and the decoder generates the translation token-by-token using the input representation
and previously decoded tokens. Many popular language models, such as BART (Lewis et al., 2019) and
T5 (Raffel et al., 2020; Chung et al., 2022), belong to this family. Another popular architecture for LLMs is
decoder-only, used in GPT-3 (Brown et al., 2020) and Llama 2 (Touvron et al., 2023), where the model only
attends to tokens up to the current token. LLMs are typically trained on a very large corpus of text with
their number of parameters ranging from millions (Raffel et al., 2020) to hundreds of billions (Chowdhery
et al., 2023). We refer the reader to Zhao et al. (2023) for a recent survey on this area of research.

LLM-based forecasters. Inspired by the success of pretrained LLMs, recent work has shown that LLMs
are general pattern recognizers (Mirchandani et al., 2023) and several methods adapting LLMs to the time
series domain have been developed. One line of work treats numerical time series data as raw text and directly
uses the pretrained LLMs with minimal or no fine tuning to forecast unseen time series. PromptCast (Xue &
Salim, 2023) leverages pretrained LLMs for forecasting by transforming the time series data into text-based
input and output pairs and reformulating the forecasting problem as a question answering task. However,
PromptCast requires dataset-specific templates for converting numerical data to text prompts. Perhaps the
most straightforward LLM-based forecasting model is LLMTime (Gruver et al., 2023), which shows clear
evidence for zero-shot forecasting ability of pretrained LLMs on a variety of benchmark time series datasets.

LLMTime proposes a new tokenization scheme that encodes real-valued data as a string of digits after fixing
the numerical precision and scaling the data appropriately. Once encoded as strings, forecasts are obtained
in a zero-shot setting from pretrained LLMs such as GPT-3 (Brown et al., 2020) and Llama 2 (Touvron
et al., 2023). Nevertheless, the use of such compute-hungry models hampers the scalability and practical
utility of LLMTime.
Zhou et al. (2023a) propose a unified one-fits-all model (GPT4TS) for different time series analysis tasks by using a pretrained GPT-2 model (Radford et al., 2019) as a backbone and fine-tuning only the positional embeddings and the layer normalization parameters for each individual task. Instead of using tokenized
input, they directly feed the model with patch embeddings, similar to PatchTST (Nie et al., 2023). Recent
concurrent work, Time-LLM (Jin et al., 2024), repurposes LLMs for time series forecasting by aligning
embeddings of time series patches with text prototypes, and prompting the (frozen) LLM with these aligned
embeddings and a natural language prefix describing the task. Unlike Chronos, both GPT4TS and Time-
LLM require in-domain training or fine-tuning, i.e., they are fine-tuned and tested on each dataset separately.
Furthermore, the aforementioned methods are based on prompting or fine-tuning pretrained LLMs. In
contrast, Chronos trains language models from scratch on a large collection of time series, tokenized via
scaling and quantization.

Zero-shot forecasting. Zero-shot forecasting is the ability of models to generate forecasts for time series
from unseen datasets. Some early work (Orozco & Roberts, 2020; Oreshkin et al., 2021; Jin et al., 2022)
in zero-shot forecasting considers training on a single time series dataset and testing on a different dataset.
ForecastPFN (Dooley et al., 2023) tackles the problem of zero-shot forecasting by training a transformer-based model purely on synthetic data generated according to predefined trend and seasonality patterns (daily, monthly, yearly). The trained transformer model is then used to forecast real-world time series in a zero-shot setting.
In this work, we also propose a method to generate synthetic time series data from Gaussian processes
(Section 4.2); however, we use the synthetic data in combination with real data to train Chronos models,
which improves the overall zero-shot performance. Furthermore, Chronos models are probabilistic, whereas
ForecastPFN can only generate point forecasts.
Recent concurrent works (Rasul et al., 2023; Goswami et al., 2024; Das et al., 2023; Woo et al., 2024)
also develop zero-shot forecasting models by pretraining transformer-based architectures on a large corpus
of time series data. These works operate on the real values of the time series and include time-series-
specific designs such as time features, lags, patching, and real-valued distribution heads, among others. In
contrast, Chronos follows a minimalist approach by tokenizing time series values into a fixed vocabulary
and training existing language model architectures on these tokens without any time-series-specific design or
features. That is, Chronos uses a categorical distribution to model the observations, performing regression
via classification.

Other time series tasks. Similar to Zhou et al. (2023a), recent works have studied general-purpose models applicable across time series tasks including imputation, forecasting, classification, and anomaly detection. Wu et al. (2023) develop a task-generic backbone based on the Inception model (Szegedy et al., 2015). In order to use the CNN-based Inception model, a one-dimensional time series is transformed into a two-dimensional image-like representation by essentially segmenting the time series based on the periodicity
and stacking the segments. SimMTM (Dong et al., 2023) is a masked pretraining framework for time series
which learns general time series representations that are then used for forecasting and classification via
fine-tuning. Although we focus on univariate time series forecasting in this work, based on its excellent
performance on unseen time series datasets, we hypothesize that Chronos learns general representations
that can potentially be deployed for tasks beyond forecasting.

3 Chronos: A Language Modeling Framework for Time Series

In this section we introduce Chronos, a framework adapting existing language model architectures and
training procedures to probabilistic time series forecasting. While both language and time series are sequen-
tial in nature, they differ in terms of their representation — natural language consists of words from a finite
vocabulary, while time series are real-valued. This distinction necessitates specific modifications to existing language modeling frameworks, especially concerning tokenization, to make them applicable to time series
data. Nevertheless, since existing transformer models have excelled on language tasks, our design philosophy
involves making minimal changes to the model architectures and training procedure.

3.1 Time Series Tokenization

Consider a time series x1:C+H = [x1 , . . . , xC+H ], where the first C time steps constitute the historical
context, and the remaining H represent the forecast horizon. Language models operate on tokens from a
finite vocabulary, so using them for time series data requires mapping the observations xi ∈ R to a finite set
of tokens. To this end, we first scale and then quantize observations into a fixed number of bins.

Scaling. The scale of time series can differ significantly even within a single dataset. This poses optimiza-
tion challenges for deep learning models. Therefore, individual time series are normalized to facilitate better
optimization. In the case of Chronos, the goal of normalization is to map the time series values into a
suitable range for quantization. A common normalization technique involves applying an affine transforma-
tion to the time series, i.e., x̃i = (xi − m)/s. Several popular normalization schemes, such as mean scaling,
standard scaling and min-max scaling, can be obtained by appropriately choosing m and s. We opt for mean
scaling, a method that has proven effective in deep learning models commonly used for practical time series
applications (Salinas et al., 2020), but other approaches are viable and only require minimal changes. Mean
scaling normalizes individual entries of the time series by the mean of the absolute values in the historical context. Specifically, this involves setting m = 0 and s = (1/C) ∑_{i=1}^{C} |xi|.
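To make the scaling step concrete, the following minimal sketch implements mean scaling in NumPy; the guard against an all-zero context is an assumption added for the example, not something prescribed above.

```python
import numpy as np

def mean_scale(context: np.ndarray) -> tuple[np.ndarray, float]:
    # Mean scaling: x_tilde_i = (x_i - m) / s with m = 0 and
    # s = (1/C) * sum_{i=1}^{C} |x_i| computed over the historical context.
    s = np.abs(context).mean()
    if s == 0:
        s = 1.0  # assumption: avoid division by zero for an all-zero context
    return context / s, s

# The same scale s is kept around to unscale the forecasts later.
scaled_context, s = mean_scale(np.array([120.0, 135.0, 128.0, 150.0]))
```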

Quantization. The scaled time series x̃1:C+H = [x̃1, . . . , x̃C, . . . , x̃C+H] is still real-valued and cannot be processed directly by language models. To convert these real values into discrete tokens, we employ quantization. Formally, we select B bin centers c1 < . . . < cB on the real line, and B − 1 edges bi separating them, ci < bi < ci+1, for i ∈ {1, . . . , B − 1}. The quantization function q : R → {1, 2, . . . , B} and the dequantization function d : {1, 2, . . . , B} → R are then defined as

q(x) =
\begin{cases}
1 & \text{if } -\infty \le x < b_1, \\
2 & \text{if } b_1 \le x < b_2, \\
\vdots & \\
B & \text{if } b_{B-1} \le x < \infty,
\end{cases}
\qquad \text{and} \qquad d(j) = c_j,    (1)
respectively. The positioning of bin centers and edges can either be data-dependent or uniform (Rabanser
et al., 2020). Quantile binning, a type of data-dependent binning, exploits the cumulative distribution function (CDF) of the training datapoints to construct bins such that an approximately equal number of datapoints is assigned to each bin. In contrast, uniform binning selects bin centers uniformly within some interval
[l, r]. Since the distribution of values for unseen downstream datasets can differ significantly from the train-
ing distribution, we opt for uniform binning in our experiments, but other quantization techniques can be
used. We refer the reader to Rabanser et al. (2020) for a detailed discussion on quantization schemes for
time series. A potential limitation of this approach is that the prediction range is restricted to the interval [c1, cB],
making it theoretically infeasible to model time series with a strong trend. We explore this further in a
practical setting in Section 5.7.
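To illustrate uniform binning, the sketch below constructs B bin centers on an interval [l, r] and implements q and d from Eq. (1); the specific values of B, l, and r are placeholder assumptions, not the settings used to train Chronos.

```python
import numpy as np

B, l, r = 4096, -15.0, 15.0                 # illustrative choices only
centers = np.linspace(l, r, B)              # bin centers c_1 < ... < c_B
edges = (centers[:-1] + centers[1:]) / 2    # B - 1 edges with c_i < b_i < c_{i+1}

def quantize(x_scaled: np.ndarray) -> np.ndarray:
    # q: R -> {1, ..., B}; values below b_1 map to 1, values at or above b_{B-1} map to B.
    return np.digitize(x_scaled, edges) + 1

def dequantize(tokens: np.ndarray) -> np.ndarray:
    # d: {1, ..., B} -> R; each token id is mapped back to its bin center.
    return centers[tokens - 1]
```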
Apart from the time series tokens {1, 2, . . . , B}, we include two special tokens, commonly used in language
models, into the time series vocabulary, Vts : PAD and EOS. The PAD token is used to pad time series of different
lengths to a fixed length for batch construction and to replace missing values. The EOS token is appended
to the quantized and padded time series to denote the end of the sequence. While the use of an EOS token
is not strictly necessary in the case of time series, it makes training and inference using popular language
modeling libraries convenient. The sequences of tokens from Vts can readily be processed by language models
(both encoder-decoder and decoder-only models) to train them as usual. A common approach in time series
modeling is to incorporate time and frequency information, through features such as day-of-week, week-
of-year, and so on. Perhaps counter-intuitively, in Chronos, we ignore time and frequency information,
treating the “time series” simply as a sequence.
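As an illustration of how PAD and EOS enter the token sequences, consider the sketch below; the particular id layout (PAD = 0, bins at 1..B, EOS = B + 1) is an assumption made for the example rather than a documented choice.

```python
import numpy as np

B = 4096                 # number of quantization bins (same illustrative choice as above)
PAD, EOS = 0, B + 1      # hypothetical ids for the special tokens

def pad_and_terminate(token_seqs, length):
    # Append EOS to each quantized series and left-pad with PAD so that series of
    # different lengths form a fixed-length batch (assumes len(seq) + 1 <= length).
    batch = np.full((len(token_seqs), length), PAD, dtype=np.int64)
    for row, seq in enumerate(token_seqs):
        seq = list(seq) + [EOS]
        batch[row, -len(seq):] = seq
    return batch

batch = pad_and_terminate([[5, 7, 9], [3, 4, 4, 6, 8]], length=8)
```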

We primarily focus on the variants of the encoder-decoder T5 model (Raffel et al., 2020). Additionally, we
conduct an experiment with the GPT-2 (Radford et al., 2019) model to demonstrate that our approach
can be straightforwardly extended to decoder-only models. No modifications are required to the language
model architecture, except adjusting the vocabulary size to |Vts |, which depends on the number of bins used
for quantization and may be different from the vocabulary size of the original language model. Concretely,
adjusting the vocabulary size entails truncating (or extending) the input and output embedding layers of
the language model.
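With the Hugging Face transformers library, for example, this adjustment amounts to resizing the embedding layers of a pretrained checkpoint; the checkpoint name and bin count below are illustrative assumptions.

```python
from transformers import T5ForConditionalGeneration

B = 4096                  # number of quantization bins (illustrative)
vocab_size = B + 2        # |Vts|: bins plus the PAD and EOS special tokens

model = T5ForConditionalGeneration.from_pretrained("google/t5-efficient-small")
# Truncate (or extend) the input and output embedding layers to |Vts|.
model.resize_token_embeddings(vocab_size)
```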

3.2 Objective Function

As typical in language models, we use the categorical distribution over the elements of Vts as the output
distribution, p(zC+h+1 |z1:C+h ) where z1:C+h is the tokenized time series. Chronos is trained to minimize
the cross entropy between the distribution of the quantized ground truth label and the predicted distribution.
Formally, the loss function for a single tokenized time series (also accounting for EOS tokens) is given by,

\ell(\theta) = -\sum_{h=1}^{H+1} \sum_{i=1}^{|V_{ts}|} \mathbf{1}_{(z_{C+h+1} = i)} \log p_\theta(z_{C+h+1} = i \mid z_{1:C+h}),    (2)

where pθ (zC+h+1 = i|z1:C+h ) denotes the categorical distribution predicted by the model parameterized by
θ. In practice, the loss is averaged over a batch of time series during training.
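In code, Eq. (2) is the standard categorical cross entropy provided by deep learning libraries; the sketch below assumes the model outputs one vector of logits over Vts per predicted position (masking of PAD positions is omitted for brevity).

```python
import torch
import torch.nn.functional as F

def chronos_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits:  (batch, H + 1, |Vts|) next-token predictions for the forecast horizon plus EOS
    # targets: (batch, H + 1)        quantized ground-truth token ids
    # F.cross_entropy applies log-softmax and averages over all positions and the batch.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```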
Note that the categorical cross entropy loss (Eq. 2) is not a distance-aware objective function, i.e., it does not
explicitly recognize that bin i is closer to bin i + 1 than to i + 2. Instead, the model is expected to associate
nearby bins together, based on the distribution of bin indices in the training dataset. In other words,
Chronos performs regression via classification (Torgo & Gama, 1997). This is unlike typical probabilistic
time series forecasting models, which either use parametric continuous distributions such as Gaussian and
Student’s-t (Salinas et al., 2020) or perform quantile regression (Wen et al., 2017; Lim et al., 2021).
Opting for a categorical output distribution offers two key advantages. Firstly, it requires no modification
to the language model architecture or training objective, enabling the use of popular language modeling
libraries and the utilities they provide out of the box (Wolf et al., 2020). Secondly, it imposes no restrictions
on the structure of the output distribution, allowing the model to learn arbitrary distributions including
multimodal ones. This flexibility proves especially valuable for a pretrained model, as time series datasets
from diverse domains may follow distinct output distribution patterns.

3.3 Forecasting

Chronos models are probabilistic by design and multiple realizations of the future can be obtained by
autoregressively sampling from the predicted distribution, pθ (zC+h+1 |z1:C+h ), for h ∈ {1, 2, . . . , H}. These
sample paths come in the form of token IDs that need to be mapped back to real values and then unscaled
to obtain the actual forecast. The dequantization function d from Eq. (1) maps the predicted tokens to real
values: these are then unscaled by applying the inverse scaling transformation, which in the case of mean
scaling involves multiplying the values by the scale s.
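The inference loop can be sketched as follows; model.sample_next_token is a hypothetical interface standing in for autoregressive sampling from p_θ, and centers are the bin centers from the quantization sketch above.

```python
import numpy as np

def forecast(model, context_tokens, s, centers, horizon, num_samples=20):
    # Draw multiple sample paths to form a predictive distribution.
    paths = []
    for _ in range(num_samples):
        tokens = list(context_tokens)
        for _ in range(horizon):
            # Hypothetical call: sample the next token id from p_theta(. | tokens).
            tokens.append(model.sample_next_token(tokens))
        values = centers[np.array(tokens[-horizon:]) - 1]  # dequantize: ids -> bin centers
        paths.append(values * s)                           # unscale: invert mean scaling
    return np.stack(paths)                                 # shape (num_samples, horizon)
```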

4 Data Augmentation

The quality and quantity of public time series data pale in comparison to the natural language processing
(NLP) domain, which benefits from ample high-quality text datasets such as WikiText-103 (Merity et al.,
2016), C4 (Raffel et al., 2020), and The Pile (Gao et al., 2020). This poses challenges for training models
intended for zero-shot forecasting, which rely on large-scale time series data with diverse patterns. To address
this issue, we propose enhancing the diversity of training data by generating mixup augmentations from real
datasets and supplementing training with synthetic data.

4.1 TSMix: Time Series Mixup

Mixup (Zhang et al., 2017) is a data augmentation scheme proposed in the context of image classification. It generates convex combinations of random image pairs and their labels from the training dataset, which alleviates issues such as memorization and overfitting in deep learning models. Existing works (Carmona et al., 2021; Zhou et al., 2023b) have extended Mixup to the time series domain. Building upon these works, we propose TSMix, which generalizes the idea of Mixup to more than two datapoints. Concretely, TSMix randomly samples k ∼ U{1, K} base time series from the training datasets and generates a new time series as a convex combination of them.

[Figure: an original time series alongside example TSMix augmentations for different values of k and convex mixing weights.]
BpOmwgsGk6zCSwaTssIrBpO4wmsGW8ePW1tl1UT9FKXPxCXWCCVtiXVCSVpig1CtrOfehv4mYyzA8z0B0sNLxzDwNhe9PgS+wuUoEt5dNo4HCGELPKG/90AvOS489QZQFmMg9dYM3OfoLfWXufaL+S26IqlTbBNK4hQ7hJI2xS6hJE3xhlBSptgjlIQpOoSSLsU+oSRLcUAoqVIcEkqiFF1CSZPiiFCSpDgmlBQpTgglQYpTQkmP4oxQkqM4J5TUKC4IJTGKS0JJi+KKUJKiuCa0fuSSou8D/RHCNw9bKhMI5AA6rvlX9nCVWijVNWqhRNephdLcoBZudpvUQhFtUQvFs00tFM0OtVAsu9RCkbyhFopjj1ooig61UAz71EIRHFALF/+QWrjoXWrhYh9RCxf5mFq4uCfUwkU9pRYu5hm1cBHPqYWLd0EtXLRLauFiXVELF+maXa9yWpXLUksGfMmkcVy4paj81S/1YRIbdNG7i+QoG0sP7Y53hyUth8I1RECOyHFD1eVlrQHdccYIgrZL0PBLoA0TNBwTaMsEDc8E2jRBwzWBtk3Q8E2gjRM0nBNo6wQN7wTaPEHDPYG2T9DwT6ANFDQcFGgLBQ0PBdpEQcNFgbZR0PBRoI0UNJwUaCsFDS8F2kxBw02BtlPQ8FOgDRU0HBVoSwUNTwXaVEHDVYG2VdDwVaCNFTScFWhrBQ1vBdpcQcNdgbZX0PBXoA0WMIeFnxSw5MhiDN44HUARP6iXlga+9L0QUiiw2qh2JFDp/bEqPa5sc9V1OsnfT3pFMtENXfhUVEjyqIiw5Dnn128g9h90udOvgaiLYH1sxLZviIx8iZ/U3Us4Pbu8Z3faNpgkG0D81I3oDvWdmNbMdapO3ac65TKKB1D17OlGPfr6DNwmZBaMfKHev/XHMtOfoKBwRth4DTY3feoxVqfMDmAATj/TbOlXIIGbju1nmqiFQD9DczvHfh77AUzrN2o6FTD1nnvVsTu97vmb07ribTYH0hHspZ1Okz2e8tre2DWTnM1xg4yLqX3u5hIFhNP6SVqTCiTdo2pFwwiKZmiRDWXi31NPCzT7YbHI9EtM5iHbbJQ8Hqu7/6ieBLjsXmfK32Da68ws4Llf0AhUoxlf4h+/wLUvMtbzZGYB1rOSaNVQAr3I4mHhJ+qB1OguK9CgCv9BeM863758pt7e0e+uj1PzLqvIcf2FfnvsWQ/imPWxD0Sfe2tYADHlU/XrAfMdEvUWm3LBJijrrV5EzcahrpnaFEcSFnV4kXmDDFS4u+gmymEQ+UuNF5mzIonVw/vppPPt8rSFzFJQ3EobJ+/0eS/buFwxeQujtdD5thelQ/nQTB3ziiquctdXyXICuNcKPwT1+mqaVYZewv2Stz7KhJqeTBnAYORt4GffFL4UXj/Lbpbmncc5h7nanbPiD6jxItQDwL+9RXX0VEe1T5qOeNQeUqsVu+nfj/Q4RUGdqjf8YpA9v495Fmd3/QL8m/n3ny+sNP/fxOzB+cullT8uvT56vfDNWvV/Kj6b+93c7+e+mluZ+9PcN3M7c925s7lg7nbur3N/m/v7yj9W/rXy75X/mK6fflKd85s552flv/8DcTVqTg==</latexit>

2 =
0.4
time series of a specific length, l ∼ U{lmin , lmax }, from the
k=3
<latexit sha1_base64="oomw4dJbq3fET7MQJEcRMzy2mjw=">AAAyL3icjVvbbtzIEdVubhvltpsAeckLEdnY3UAWLNtI8hJgdZeskTTSjK6W1+Bwaji0eDO7hyN5MPsHeU2+I18T5CXIa/4i1d1sVjVFaSPAEuuc7mJfTnXVcOhBHkdCPn/+r08+/cEPf/Tjn3z208Wf/fwXv/zV51/8+kxkkyKA0yCLs+Ji4AuIoxROZSRjuMgL8JNBDOeDmw3Fn5dQiChL+/Iuh7eJH6bRKAp8iVDv5i8v332+9Hzluf7x7l+sVhdLC9VP990Xq7+9HmbBJIFUBrEvxJvV57l8O/MLGQUxzBevJwJyP7jxQ3iDl6mfgHg702Ode08RGXqjrMB/qfQ0ynvM/ESIu2SALRNfjkWTU2Ab92YiR39+O4vSfCIhDcyNRpPYk5mnJu4NowICGd/hhR8UEY7VC8Z+4QcSl2dxcfGp+vEOt869g7X+rre5tb13uNffOzrseZpabBvJMv5V8xDLg2SOPrwDv7jxBN4IV1d42cgL/NxcqykXMIKiiNJQjWoYlZGwzUZROCkAZ5TCNMiSxE+Hs2sEYxjJ+Wx2DYn3VQevv57P77UJcCOgsK02tNXWrojCce3sRBltrWSW2zb9LG9rMcikzBLbaF1b99pV8/ZtM/+hFgPbYvBQi8C2CB5qMbQthqoFbsMuzi5WM/R8D9urXYcRhsjQw7VJXB94rcD5m9W36GUw8pZWlRP0sq03xewaagqWvTibQvEswIBbWbxGl3pZYbS0OjMb+N01WjPtoK07DjeSfrzibaMYhMSIUXsv1I4hbzxuW4/bTY+altPM3nPpRXVX4dlGHs6oMl7YHh8m/pC6LL1cenWv23Ldx1695K5e6en0jKofXQ5Uvhl8FQLOerQ4sAtievds715L7xPbS0f0NKujbKVeGHN3oVemjsEHlqbpcFwANF0yf0sv73ukVWO+X9737ace4Caozi1LBh/MnLc+rHz3lXX99eNOotTLfaTkGEQkmJ8cHX2lPP2fjiZ5DoWnRmOcbNWDMS2cHVjzCn9Ku9dw9uzZM7/MoqE3EeqAi0ZengkRYSIyrvPYxwCs/D+4sb46lHOMx5aVUozpXrV5WB8PTLJytFE72vheRzjnNAR9kpu21XJruB4RCs7S1tWzZw+KDUfnx2GGSWictMwTOTO6utGjE2Wu7s10zbpaa3Flw8beDydR+3r8SOk7nda+t9O9RUX1ymrmTH0KNcNVV49tiunfVG+37t91+9uZ1jfAUavrBwdcCQ6iWIk1VheYFrCBuqr8jeIsKzStrwyvL6sGSA2S2WozZ8kCA2E+u1b1Q+DHs81mg9KPoyFv8M5cF8nMUPN7LkHI9g6amdczglyoTJmLKM7SKsudoIss8Uq/iHyMVqtvkP5Meb6VaVYk6PXJNUJP5nY5iwbtEzNwmQExgcsExAxdZkgMuAwQM3KZETGhy4TEjF1mTEzkMhEx713mPTE3LnNDTOwy8VzLuEi8SGDEYok+vFOHndnBZe/9REhvmKVfSk/VyyjHO3XyOBvjJZXv1PWd0l0zl8mIyV0mJ+aDy3wgpnCZghjhMoIY6TKSmInLTIgpXaYkZuoyU2JuXeaWmDuXuSPmo8t8nJti0QYA5vesPt7LKkhmJpQGIxY29bgx/+oosS20zXjGcXhAMIuNMiCYBUY5JJhFRQkEs5AoRwSzeChDglkwlGOCWSSUE4JZGJTvCWYxUN4QzAKgjAmOGZwQnDCYLTRf4YxgJuYyJ5gpufxAMJNxWRDMNFwKggXfVIJl+5pw6ZYEM92WU4KZaMtbgpliyzuCmVzLjwRbrW7FoD5368+MRYtuwYiu9VwGo7zWkxmM/FrPZjAabD2dwQix9XwGo8bWExqMJFvPaDC6bD2lkXvwnAaj0NaTGoxMW89qMFptntaWS1wu4dyDJzEY6baexWD023oagxFx63kMRsmtJzIYObeeyWA03XoqgxF267kMRt2tJzMYibeezWB03no6gxF76/kMRvEPn9AYC0UU1BVKskbxsUZhk6wTvM7gDYI3GLxJ8CaDtwjeYvA2wdsM3iF4h8G7BO8yeI/gPQa/Jvg1g/cJ3mdwh+AOgw8IPmDwIcGHDD4i+IjBXYK7DD4m+JjBJwSfMLhHcI/BfYL7DD4l+JTBZwSfMfic4HMGXxB8weBLgi8ZfEXw1cPHqys6MKpjGl1j+tXSY9w65zZcboNzmy63ybktl9vi3LbLbXNux+V2OLfrcruc23O5Pc69drnXnNt3uX3OdVyuw7kDlzvg3KHLHXLuyOWOONd1uS7njl3umHMnLnfCuZ7L9TjXd7k+505d7pRzZy53xrlzlzvn3IXLXXDu0uUuOXflclb2Z7yEKD+C/hyBn12f133LLIWZ/TxrsWRioOuEkkZdEyvcrYfLCmbIwCBUh+gqBBGqPnTtgQjVHGU1Eqo0dJ2BCNUXurpAhKoKXVMgQrWEriQQoQpC1w+IUN2gqwZEqFrQtQIiMVsHg1BloOsCRFK2fgahKkDXAIhQ7teZHxHK+DrfI0J5Xmd5RARbcINQTi+rbWGbUhqE8rfO3ohQ1tY5GxHK1TpTI0IZWudnRNqqUbcMLf04H6v91n9rBZaDShxaFxakj1r0ZKKiYj8ZDFUPc0FElkCocP2XYC1JJUcLoENE8DdBIgoT1VX/JdgKtxJtPZHZjI9/psRqLRRrQBYKdcgmNVMCtRYKdEQWijMkC4U5JguHy8aKgnxPForxhq3NTImwnvlMCdBauJhsFVF8GVuSmRKdtVB0H8hCwRVspWZKaPUCzZTIrIULzZYZBVaSheKakoXCuiULRXVHFgrq47z65gzz7K3BdY5FnVFu1ZkVEcqoOp8iQnlUZ1FEKHvq3IkI5UydMRGhTKnzJCKUH3V2RISyos6JiFAu1JkQEcqAOv8hQnlPZz1EKNvpXIcI5Tid4RChzKbzGiKUz3Q2Q4SymM5hiFDu0pkLEcpYOl8hQnlKZylEKDvp3IQI5SSdkRChTKTzECKUf3T2QYSyjs45iFCu0ZkGkSu2g5QXBjwtJN1xdRBf4xVbPRv6iulU4V9ProphxfVMHGsV9SEV6gvlTQhivwAU1XhNnUB4R1PsiVGkHpVCGmTDKA3RmT+JFSJG9XUynwn1lLcH8iEHgywefp+bwe0cg7D5pDYV+ptGkzcrf/opdTU1aerLVDD1y3WLkf7lhsUoAuSmxSgG5JbFKArktsUoDuSOxSgS5K7FKBbknsUoGuRri1E8yH2LUUTIjsUoJuSBxSgq5KHFKC7kkcUoMmTXYhQb8thiFB3yxGIUH7JnMYoQ2bcYxYg8tRhFiTyzGMWJPLcYRYq8sBjFiry0GEWLvLKYqchQyDuFn48NG9rPuYHzcSNcZzDpItxgMEkj3GQwqSPcYjAJJNxmMGkk3GEwySTcZTApJdxjMIklfM1g0ku4z2CSTNhhMKkmPGAwCSc8ZDBpJzxiMMkn7DKYFBQeM5hEFJ4wmHQU9hhMUgr7DCY1hacMJkGFZwwmTYX
nDCZZhRcMJmWFlwwmcYVXDLYVPx5tVakm6qcoAyYusU4oaUtsEErSEpuEamU99Tb1NxkTAZ7vCZAe3jqGobe17A0g8BUux5HwptkkHiKEFnhCf++BteSk8NQbQFmMjtRbM3CbY22pv8y1X8xv0x1JnWKHUBKn2CWUtCn2CCVpiteEkjLFPqEkTNEhlHQpDgglWYpDQkmV4ohQEqXoEkqaFMeEkiTFCaGkSNEjlAQp+oSSHsUpoSRHcUYoqVGcE0piFBeEkhbFJaEkRXFFaP3IJcW6D/RHCN88bKmKQKAKoOMW/6o8XCMLpbpOFkp0gyyU5iZZeNhtkYUi2iYLxbNDFopmlywUyx5ZKJLXZKE49slCUXTIQjEckIUiOCQLN/+ILNz0Llm42cdk4SafkIWb2yMLN7VPFm7mKVm4iWdk4eadk4WbdkEWbtYlWbhJV+x+VaVVVVlqy4BvmTQVFx4pKn71S30YxAZd9qaRHGcT6WG5400xpeVQuAURUEXkVEPV7WWtAd3wXiEIulyCRr0EumCCRsUEumSCRs0EumiCRtUEumyCRt0EunCCRuUEunSCRu0EuniCRvUEunyCRv0EuoCCRgUFuoSCRg0FuoiCRhUFuoyCRh0FupCCRiUFupSCRi0FupiCRjUFupyCRj0FuqCCRkUFuqSCRk0FuqiCRlUFuqyCRl0FurCCRmUFurSCRm0FuriCRnUFuryCRn0FusACVmHhJwVMObKYgDdJh1DEd+qlpaEvfS+EFArMNsqOBCp9MFGpx5VtrprOZ/m72XWRzLShE5/yCkkeFRGmPKd//Qbi4E6nO/0aiLoJ5seGb/uGyNiX+EndvYXTsstbdudtg0myIcSPTUQ3qGdirHv3qRp1H2uUyygeQtXyWhv16OseeEzILBj7Qr1/609kpj9BQeGMsPEabG7a1GOsutwfwBCcdsZsaVcggYeObWdM1EKgn6G5jWM/j/0A5vUbNZ0KmHtPveraXV63/9a8znhbzYF0BHtpp9NkT+Y8tzdOzSRna9wg42Jun7u5RAHhvH6S1qQCSXNUVjSKoGi6FtlIJv4ttbRAsx0mi0y/xGQest33kscTNfuP6kmAy+535vwNpv3OvQ088wsagTKa/iX+8Qvc+yJjLXv3NmAjK4lWhhLoeRaPCj9RD6TG06zAAlX4d8J70vn2xRP19o5+d32SmndZRY77L/TbY0+uIY5ZG/tA9Km3jgkQQz5Vv+4w3iFRb7GpKtg4Za3Vi6jZJNQ5UxfFkYRl7V5k3jAD5W4a3UQ5DCN/pfEic1YksXp4P591vn0+byGzFBS32sbJqe73oo3LFZO3MFoLnW+vo3Qk75qhY15RxV3u+ipYeoBnrfBDUK+vpllV0Eu4XfE2xplQy5OpAjAYe5v42TeFL4U3yLKblUXncc5Rrk7nrPgDarwI9QDw7/WyunqsoTonTUO8anep1YrN9O8HWvRRUH31hl8M8tofYJzF2XRQgH+z+O7zpdXm/5u4f3H2YmX1jyuvjl8tfbNe/Z+KzxZ+t/D7ha8WVhf+tPDNwu5Cd+F0IVgIF/668LeFv6/+Y/Wfq/9e/Y9p+uknVZ/fLDg/q//9H8cTZqg=</latexit>

training datasets, scales them, and takes their convex combi- <latexit sha1_base64="8+XFuh9fvoeSyvfde59TOZyBrH8=">AAAyOXicjVvLbtzIFdVMXhPlNZMA2WRDRDZmJpAFyTaSbAKM3pLVklrvh9tjsNm32bT4Equaktzo+Y1sk+/Il2SZXZBtfiC3qli8t9iUJgIssc4pXharzq17mk338zgScnn5n598+oMf/ujHP/nsp/M/+/kvfvmrz7/49bnIxkUAZ0EWZ8Vl3xcQRymcyUjGcJkX4Cf9GC76N+uKvyihEFGWnsqHHN4lfphGwyjwJULvejF2HfjvX/1leenV+88XlpeW9Y83e7BSHSzMVT/d91+s/LY3yIJxAqkMYl+ItyvLuXw38QsZBTFM53tjAbkf3PghvMXD1E9AvJvoUU+954gMvGFW4L9UehrlZ0z8RIiHpI89E1+ORJNTYBv3diyHf343idJ8LCENzIWG49iTmaemwBtEBQQyfsADPygiHKsXjPzCDyRO1Pz8/HP14x1sXnj7q6c73sbm1u7B7unu4cGJp6n5tpEs4l91H2Kxn0wxhrfvFzeewAvhPAsvG3qBn5tjdcsFDKEoojRUoxpEZSRst2EUjgvAO0rhLsiSxE8Hkx6CMQzldDLpQeJ91cHjr6fTmT4BLgQUtte6brX1K6JwVAc7Vo22XjLLbZ/TLG/r0c+kzBLbaU23ZvpV9+3bbv5jPfq2R/+xHoHtETzWY2B7DFQPXIYdvLtY3aHne9hfrToMMVkGHs5N4sbAYwVO3668wyj9obewooJglC29KGbVUFOw6MXZHRQvAky9pfkehtTTCsOFlYlZwO962JroAG2n43Aj6cdL3haKQUjMGLX2Qq0Y8ibilo241YyoaXmX2WsuvKyuKjzbycM7qhov7Rm3Y39Apyy8Wng9c9pifY49esVDvda3c2JU/eR0oPLN4KsUcOajJYCdEHP2iT37pOXsY3uWzui7rM6ypXpizNWFnpk6Bx+ZmmbAUQHQDMniLbyajUizxmK/mo3tpx7gIqiTW6YMbs09b94uffeVDf3100Gi1Mt9pOQIRCRYnBwDfaUi/Z+BxnkOhadGY4Js1oMxPZwVWPUK/45WrxHsxYsXfplFA28s1AYXDb08EyLCkmRC57GPCVjFf3RhfbUp55iPLTOlGHN61edxfTxyk1Wg9TrQ+vcGwntOQ9A7uelbTbeG6xGh4CxtQ7148ajYcHR+HGZYhEZJy30iZ0ZXd3ryRlmomTtdtaFWW0LZtLHXw5uoYz29pZw6J61+70kzk4rqldWdM/Up1AxXHT21KOb8pnq79fld93x7p/UFcNTq+NEBV4KDKFZijdUBlgXsoI6qeMM4ywpN6yPD68OqA1L9ZLLSrFmywESYTnrKPwR+PNlodij9OBrwDu/NcZFMDDWdCQlCtp+gmWl9R5ALVSlzEcVZWlW5YwyRJV7pF5GP2Wr1DdKfqMj3Ms2KBKM+6yH0bGqns2jQPjF9l+kTE7hMQMzAZQbEgMsAMUOXGRITukxIzMhlRsRELhMR88FlPhBz4zI3xMQuE0+1jIvEiwRmLJr1wYPa7MwKLnofxkJ6gyz9UnrKL6McH9TO4yyMl1SxUzd2SlfNXCYjJneZnJhbl7klpnCZghjhMoIY6TKSmLHLjIkpXaYk5s5l7oi5d5l7Yh5c5oGYjy7zcWrMok0ArO9Zvb2XVZJMTCr1hyxt6nFj/dVZYnvoNuMZx+E+wSw3yoBglhjlgGCWFSUQzFKiHBLM8qEMCWbJUI4IZplQjglmaVB+IJjlQHlDMEuAMiY4ZnBCcMJgNtF8hjOCmZjLnGCm5PKWYCbjsiCYabgUBAu+qATL9jnh0i0JZrot7whmoi3vCWaKLR8IZnItPxJstboZg/rcrT8zFi26BSO61n0ZjPJad2Yw8mvdm8FosHV3BiPE1v0ZjBpbd2gwkmzdo8HosnWXRu7RfRqMQlt3ajAybd2rwWi1uVtbLnG5hHOP7sRgpNu6F4PRb+tuDEbErfsxGCW37shg5Ny6J4PRdOuuDEbYrfsyGHW37sxgJN66N4PReevuDEbsrfszGMU/vkNjLhRRUDuUZJXyY5XSJlkjeI3B6wSvM3iD4A0GbxK8yeAtgrcYvE3wNoN3CN5h8C7Buwx+Q/AbBu8RvMfgDsEdBu8TvM/gA4IPGHxI8CGDuwR3GXxE8BGDjwk+ZvAJwScMPiX4lMFnBJ8x+JzgcwZfEHzB4EuCLxl8RfAVg68Jvn58e3VFB0Z1TKOrTL9aeoxb49y6y61zbsPlNji36XKbnNtyuS3ObbvcNud2XG6Hc7sut8u5Ny73hnN7LrfHuY7LdTi373L7nDtwuQPOHbrcIee6Ltfl3JHLHXHu2OWOOXficiecO3W5U86dudwZ585d7pxzFy53wblLl7vk3JXLXXHu2uWs7M+5hSg/gv4cgZ9dl+tzyyyFif08a7FkbKBeQkWj9sQKd/1wWcEM6RuEfIh2IYiQ+9DeAxHyHGU1EnIa2mcgQv5CuwtEyFVoT4EIeQntJBAhB6H9AyLkG7RrQITcgvYKiMRsHgxCzkD7AkRSNn8GIRegPQAiVPt15UeEKr6u94hQnddVHhHBJtwgVNPLalnYopQGofqtqzciVLV1zUaEarWu1IhQhdb1GZE2N+ra0NKP85Fab/23VmDZr8ShdWFB+qhFTyYqynxThYw5ICJLIFS4/kuwlqSSowUwICL4myARhYk6Vf8l2Aq3Em19I5MJH/9EidW2UKwBtVCoA3ZTEyVQ20KBDqmF4gyphcIcUQuHy8aKgvxALRTjDZubiRJhfecTJUDbwslks4jiy9iUTJTobAtFd0stFFzBZmqihFZP0ESJzLZwotk0o8BKaqG47qiFwrqnForqgVooqI/T6pszrLP3Btc1FnVGtVVXVkSooup6igjVUV1FEaHqqWsnIlQzdcVEhCqlrpOIUH3U1RERqoq6JiJCtVBXQkSoAur6hwjVPV31EKFqp2sdIlTjdIVDhCqbrmuIUD3T1QwRqmK6hiFCtUtXLkSoYul6hQjVKV2lEKHqpGsTIlSTdEVChCqRrkOIUP3R1QcRqjq65iBCtUZXGkSu2QpSXejzspB0R9VG3MMjNns29RXTqdK/vrkqhxV3YvJYq+gUUqG+UN6AIPYLQFGNVtUOhFc0Zk8MI/WoFNIgG0RpiMH8cawQMayPk+lEqKe8JyAfC9DP4sH3henfTzEJm09qU6G/aTR1s4qnn1JXtyaNv0wFU79csxjpX65bjDJAbliMckBuWoyyQG5ZjPJAbluMMkHuWIxyQe5ajLJBvrEY5YPcsxhlhOxYjHJC7luMskIeWIzyQh5ajDJDdi1GuSGPLEbZIY8tRvkhTyxGGSJPLUY5Is8sRlkizy1GeSIvLEaZIi8tRrkiryxG2SKvLWYcGQp5u/DzkWFD+zk3cD5uhGsMJl2E6wwmaYQbDCZ1hJsMJoGEWwwmjYTbDCaZhDsMJqWEuwwmsYRvGEx6CfcYT
JIJOwwm1YT7DCbhhAcMJu2Ehwwm+YRdBpOCwiMGk4jCYwaTjsITBpOUwlMGk5rCMwaToMJzBpOmwgsGk6zCSwaTssIrBpO4wmsGW8ePW1tl1UT9FKXPxCXWCCVtiXVCSVpig1CtrOfehv4mYyzA8z0B0sNLxzDwNhe9PgS+wuUoEt5dNo4HCGELPKG/90AvOS489QZQFmMg9dYM3OfoLfWXufaL+S26IqlTbBNK4hQ7hJI2xS6hJE3xhlBSptgjlIQpOoSSLsU+oSRLcUAoqVIcEkqiFF1CSZPiiFCSpDgmlBQpTgglQYpTQkmP4oxQkqM4J5TUKC4IJTGKS0JJi+KKUJKiuCa0fuSSou8D/RHCNw9bKhMI5AA6rvlX9nCVWijVNWqhRNephdLcoBZudpvUQhFtUQvFs00tFM0OtVAsu9RCkbyhFopjj1ooig61UAz71EIRHFALF/+QWrjoXWrhYh9RCxf5mFq4uCfUwkU9pRYu5hm1cBHPqYWLd0EtXLRLauFiXVELF+maXa9yWpXLUksGfMmkcVy4paj81S/1YRIbdNG7i+QoG0sP7Y53hyUth8I1RECOyHFD1eVlrQHdccYIgrZL0PBLoA0TNBwTaMsEDc8E2jRBwzWBtk3Q8E2gjRM0nBNo6wQN7wTaPEHDPYG2T9DwT6ANFDQcFGgLBQ0PBdpEQcNFgbZR0PBRoI0UNJwUaCsFDS8F2kxBw02BtlPQ8FOgDRU0HBVoSwUNTwXaVEHDVYG2VdDwVaCNFTScFWhrBQ1vBdpcQcNdgbZX0PBXoA0WMIeFnxSw5MhiDN44HUARP6iXlga+9L0QUiiw2qh2JFDp/bEqPa5sc9V1OsnfT3pFMtENXfhUVEjyqIiw5Dnn128g9h90udOvgaiLYH1sxLZviIx8iZ/U3Us4Pbu8Z3faNpgkG0D81I3oDvWdmNbMdapO3ac65TKKB1D17OlGPfr6DNwmZBaMfKHev/XHMtOfoKBwRth4DTY3feoxVqfMDmAATj/TbOlXIIGbju1nmqiFQD9DczvHfh77AUzrN2o6FTD1nnvVsTu97vmb07ribTYH0hHspZ1Okz2e8tre2DWTnM1xg4yLqX3u5hIFhNP6SVqTCiTdo2pFwwiKZmiRDWXi31NPCzT7YbHI9EtM5iHbbJQ8Hqu7/6ieBLjsXmfK32Da68ws4Llf0AhUoxlf4h+/wLUvMtbzZGYB1rOSaNVQAr3I4mHhJ+qB1OguK9CgCv9BeM863758pt7e0e+uj1PzLqvIcf2FfnvsWQ/imPWxD0Sfe2tYADHlU/XrAfMdEvUWm3LBJijrrV5EzcahrpnaFEcSFnV4kXmDDFS4u+gmymEQ+UuNF5mzIonVw/vppPPt8rSFzFJQ3EobJ+/0eS/buFwxeQujtdD5thelQ/nQTB3ziiquctdXyXICuNcKPwT1+mqaVYZewv2Stz7KhJqeTBnAYORt4GffFL4UXj/Lbpbmncc5h7nanbPiD6jxItQDwL+9RXX0VEe1T5qOeNQeUqsVu+nfj/Q4RUGdqjf8YpA9v495Fmd3/QL8m/n3ny+sNP/fxOzB+cullT8uvT56vfDNWvV/Kj6b+93c7+e+mluZ+9PcN3M7c925s7lg7nbur3N/m/v7yj9W/rXy75X/mK6fflKd85s552flv/8DcTlqTg==</latexit>

= 0.3
nation,
3

k
(i)
X
x̃TSMix
1:l = λi x̃1:l , (3)
i=1 Figure 2: An illustration of TSMix augmenta-
tion for k = {1, 2, 3}. TSMix improves pat-
(i)
where x̃1:l denotes the i-th scaled time series. The combination tern diversity by taking weighted combinations
weights, [λ1 , . . . , λk ], are sampled from a symmetric Dirichlet of randomly-sampled time series from different
distribution, Dir(α). The complete pseudocode of TSMix can datasets.
be found in Algorithm 1 in Appendix A. Intuitively, TSMix enhances the diversity of data by combining pat-
terns from different time series. Figure 2 shows example augmentations generated by TSMix and illustrates
how different patterns are mixed.
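As a concrete illustration of Eq. (3), the following is a minimal sketch of the TSMix sampling step; the
window length, K, α, and the toy input series are illustrative choices, not the exact values of our pipeline.

    import numpy as np

    def tsmix(series_pool, l=128, K=3, alpha=1.5, rng=np.random.default_rng(0)):
        """Draw one TSMix augmentation: a convex combination of k mean-scaled windows."""
        k = rng.integers(1, K + 1)
        windows = []
        for _ in range(k):
            x = series_pool[rng.integers(len(series_pool))]
            start = rng.integers(0, len(x) - l + 1)
            w = x[start:start + l]
            windows.append(w / np.mean(np.abs(w)))       # mean scaling
        lam = rng.dirichlet(alpha * np.ones(k))          # symmetric Dirichlet weights
        return np.sum(lam[:, None] * np.stack(windows), axis=0)

    # Toy usage with two synthetic source series:
    pool = [np.sin(np.linspace(0, 20, 500)) + 1.5, np.linspace(0, 5, 400) + 2.0]
    augmented = tsmix(pool)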

4.2 KernelSynth: Synthetic Data Generation using Gaussian Processes

While TSMix improves pattern diversity, it may still prove insufficient for training a generalist time series
model, especially when real data is limited. To further supplement the training dataset, we propose Kernel-
Synth, a method to generate synthetic time series using Gaussian processes (GPs). KernelSynth is inspired
by the Automatic Statistician (Duvenaud et al., 2013), where a compositional search over a space of GP
kernels is performed to explain the structure of a time series. We use the inverse of this process — randomly
compose GP kernels to generate new time series.
GPs are distributions over functions defined by the mean function, m(t), and the positive definite kernel,
κ(t, t′ ), where t ∈ R is the domain. The kernel specifies a covariance function which defines the joint
variability of the function values at an arbitrary pair of points, (t, t′ ), in the input domain. Diverse patterns
can be generated by appropriately selecting the kernel. We constructed a kernel bank, K, of basis kernels
defining fundamental time series patterns. These include linear kernels for trend, RBF kernels for smooth
local variation, and periodic kernels for seasonalities found in typical time series frequencies. The final kernel,
κ̃(t, t′ ), is constructed by sampling j ∼ U{1, J} kernels from K with replacement and combining these kernels
via random binary operations, + or ×. A synthetic time series is generated by drawing a sample of length
lsyn from the GP prior, GP(m(t) = 0, κ̃(t, t′ )); see Algorithm 2 in Appendix A for details. Figure 3 depicts
this generative process used in KernelSynth, illustrating how time series with intricate patterns can arise
from the composition of simple basis kernels.
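The sketch below illustrates this generative recipe using scikit-learn kernels; the specific kernel bank, the
value of J, and the series length are assumptions for illustration and do not reproduce the exact configuration
of Algorithm 2.

    import numpy as np
    from sklearn.gaussian_process.kernels import RBF, DotProduct, ExpSineSquared

    KERNEL_BANK = [
        DotProduct(),                        # linear kernel for trend
        RBF(length_scale=0.1),               # RBF kernel for smooth local variation
        ExpSineSquared(periodicity=0.1),     # periodic kernel for seasonality
    ]

    def kernel_synth(length=256, max_kernels=5, rng=np.random.default_rng(0)):
        """Draw one synthetic series from a zero-mean GP prior with a randomly composed kernel."""
        t = np.linspace(0.0, 1.0, length).reshape(-1, 1)
        j = rng.integers(1, max_kernels + 1)             # j ~ U{1, J}
        kernel = KERNEL_BANK[rng.integers(len(KERNEL_BANK))]
        for _ in range(j - 1):
            other = KERNEL_BANK[rng.integers(len(KERNEL_BANK))]
            kernel = kernel + other if rng.random() < 0.5 else kernel * other
        cov = kernel(t) + 1e-6 * np.eye(length)          # jitter for numerical stability
        return rng.multivariate_normal(np.zeros(length), cov)

    sample = kernel_synth()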

5 Experiments

In this section, we present empirical results on commonly used benchmark datasets. First, we give an
overview of the datasets, training strategy, baselines, and evaluation metrics (Section 5.1-5.4). Table 1
provides a high-level summary of the datasets and baselines used in our experiments. We then (a) evaluate
the performance of Chronos models in the in-domain and zero-shot settings against local models and
task-specific deep learning models (Section 5.5); (b) analyze the effect of various design choices such as
model size, initialization, synthetic data proportion, context length, and vocabulary size on the performance
of Chronos models (Section 5.6); and (c) analyze the qualitative performance of Chronos models and


Figure 3: (a) An illustration of KernelSynth, a Gaussian process (GP)-based synthetic time series generation method.
Kernels are sampled from a kernel bank and then randomly combined using a binary operator (× or +). The resultant
kernel is used in a GP prior to generate synthetic time series. Random samples from kernels at each step are shown
in red and blue colors. (b) Example synthetic time series generated by KernelSynth.

highlight their limitations (Section 5.7). We discuss our key findings in this section and relegate specific
experiment details to the appendices.

Table 1: A high-level summary of the datasets and baselines used in our experiments.

Data Subset         # Datasets   # Series    Usage                                   Baselines

Pretraining-only    13           795,936     pretraining                             –
Benchmark I         15           97,272      pretraining and in-domain evaluation    Naive, SeasonalNaive, AutoETS, AutoARIMA,
                                                                                     AutoTheta, DeepAR, TFT, PatchTST, DLinear,
                                                                                     WaveNet, N-BEATS, N-HiTS, GPT4TS
Benchmark II        27           103,047     zero-shot evaluation                    All the above and ForecastPFN

5.1 Datasets

To train and evaluate Chronos models, we collected a wide variety of publicly available datasets spanning
various application domains, including energy, transport, healthcare, retail, web, weather, and finance, with
sampling frequencies ranging from 5 minutes up to yearly. The complete list of datasets, together with their
respective sources and additional details, is given in Appendix B. In total, our dataset collection comprises
55 datasets from multiple sources, including the Monash Time Series Forecasting Repository (Godahewa
et al., 2021), the M-competitions (Makridakis et al., 1979; Makridakis & Hibon, 2000; Makridakis et al.,
2020; 2022), and public domain datasets from Kaggle.
We categorize this collection into three subsets, based on how we use them for training and evaluating
Chronos models: (a) datasets exclusively used for training (13 datasets); (b) Benchmark I datasets, em-
ployed for both training and evaluation, representing an in-domain evaluation (15 datasets); and (c) Bench-
mark II datasets, used solely for evaluation, constituting a zero-shot evaluation (27 datasets). In categorizing
the datasets in this way, we aimed to keep as many of the most commonly used datasets as possible for the
zero-shot evaluation of Chronos models, while still retaining enough variety of domains and sampling
frequencies in the training data. Overall, we used 28
datasets for training Chronos models, consisting of about 890K univariate time series with approximately
84B observations (tokens) in total. For both in-domain (I) and zero-shot (II) benchmark datasets, we used
the last H ∈ N+ observations of each time series as a held-out test set: all models are judged by the accuracy
of their forecasts on this held-out set, which no model had access to during training. The prediction
length H is task-specific (see Table 2 in Appendix B), where we define a task as a dataset and prediction
length pair. Tasks in both benchmarks exhibit diverse properties, in terms of the dataset size, frequency,
history length, and prediction length, making them rich benchmarks reflective of real world scenarios.

5.2 Training Corpus and Protocols

We selected T5 (Raffel et al., 2020) as the main architecture for Chronos in our experiments, since it is
available in a variety of sizes, ranging from 16M (Tiny) to 11B (XXL) parameters (Tay et al., 2021). We
also conducted experiments with the decoder-only GPT-2 model to demonstrate the applicability of the
Chronos framework to decoder-only models. In the following, we discuss the training configurations used
for our main results (Section 5.5) and explore alternatives for some of the hyperparameters in Section 5.6.
We trained T5 models of 4 sizes,1 namely, Mini (20M), Small (46M), Base (200M) and Large (710M), and the
GPT-2 base model (90M), on 10M TSMix augmentations (see Section 4.1) generated from the 28 training
datasets, with K = 3 in Algorithm 1, and 1M synthetic time series generated using Gaussian processes
(see Section 4.2). Note that with this setup, original time series are adequately represented since they are
included in the TSMix augmentations with probability 1/3. We sampled time series from the augmentations
and synthetic data in the ratio 9:1 during training. Each model is trained with an effective batch size of
256 sequences, using distributed data parallelism and gradient accumulation, whenever necessary. These
sequences are constructed by slicing random windows from the time series, and then scaling and quantizing
them into equal-sized bins within the interval [l = −15, r = 15], as described in Section 3.1. The context
length of the sequences was set to 512, the default for T5 models, and the prediction length is set to 64, a
value greater than the prediction lengths of all tasks we consider in our evaluation.
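For concreteness, the following rough sketch shows the mean scaling and uniform binning applied to each
window (cf. Section 3.1); the bin count of 4094 matches the value used in our experiments, while the helper
names and the handling of out-of-range values are simplifications.

    import numpy as np

    def tokenize(context, low=-15.0, high=15.0, n_bins=4094):
        """Map a raw context window to integer token IDs via mean scaling and binning."""
        scale = np.mean(np.abs(context))
        scale = scale if scale > 0 else 1.0
        scaled = np.clip(context / scale, low, high)
        edges = np.linspace(low, high, n_bins + 1)       # equal-sized bins in [low, high]
        tokens = np.clip(np.digitize(scaled, edges) - 1, 0, n_bins - 1)
        return tokens, scale

    def detokenize(tokens, scale, low=-15.0, high=15.0, n_bins=4094):
        """Map token IDs back to approximate real values (bin centers)."""
        width = (high - low) / n_bins
        centers = low + (tokens + 0.5) * width
        return centers * scale

    tokens, s = tokenize(np.array([2.0, 3.5, 4.0, 2.5]))
    approx = detokenize(tokens, s)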
The models were optimized for 200K steps using the AdamW optimizer with a weight decay of 0.01. The
learning rate was annealed linearly from its initial value of 0.001 to 0 over the training steps. The other model
and training hyperparameters were set to their defaults used in the transformers library (Wolf et al., 2020).
We used an AWS EC2 instance with 8 A100 (40GB) GPUs to train all Chronos models, and we employed
faster floating point formats (TF32) and model compilation to speed up training. Table 5 in Appendix E
reports the training time and the approximate cost of training Chronos models of different sizes.
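A sketch of the corresponding optimizer and learning-rate schedule with PyTorch and the transformers
library is shown below; the model configuration is a small stand-in, not the actual Chronos architecture
definition.

    import torch
    from transformers import T5Config, T5ForConditionalGeneration, get_scheduler

    model = T5ForConditionalGeneration(T5Config(vocab_size=4096, d_model=256))  # illustrative config
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
    scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0,
                              num_training_steps=200_000)   # anneal 0.001 -> 0 linearly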

5.3 Baselines

We assessed the performance of Chronos models against a variety of time series forecasting baselines.
From statistical forecasting literature (Hyndman & Athanasopoulos, 2018), we included Naive, Seasonal
Naive, AutoETS, AutoARIMA (Hyndman et al., 2008) and AutoTheta (Assimakopoulos & Nikolopoulos,
2000). Additionally, we compared against several neural forecasting baselines, including WaveNet (Oord
et al., 2016), DeepAR (Salinas et al., 2020), N-BEATS (Oreshkin et al., 2020), TFT (Lim et al., 2021),
DLinear (Zeng et al., 2023), PatchTST (Nie et al., 2023), N-HiTS (Challu et al., 2023), and GPT4TS (Zhou
et al., 2023a). On Benchmark II (i.e., zero-shot datasets for Chronos models), we also evaluated against
ForecastPFN (Dooley et al., 2023) which is a pretrained transformer model trained only on synthetic time
series data.
We categorize Chronos models and the baselines into three groups: local models that estimate parameters
for each time series individually; task-specific models trained or fine-tuned for each task separately; and
pretrained models which do not perform task-specific training, instead using a single model across all tasks.
Further details on the implementation and training of these baselines can be found in Appendix C.

5.4 Evaluation Metrics

Whenever possible,2 we evaluated models both in terms of their probabilistic and point forecast performance.
We used the weighted quantile loss (WQL) to assess the quality of the probabilistic forecasts: the WQL is
related to the continuous ranked probability score (CRPS, Gneiting & Raftery (2007))3 and is commonly
1 Our inference code and model checkpoints are available at https://github.com/amazon-science/chronos-forecasting.
2 Some models (GPT4TS and ForecastPFN) only generate point forecasts and we only evaluate those.
3 Many existing works (Ansari et al., 2021; Rasul et al., 2023; Kollovieh et al., 2023) use CRPS and WQL synonymously.

used to evaluate probabilistic forecasts (Gasthaus et al., 2019; Shchur et al., 2023). The WQL measures
the compatibility between the predictive distribution and the ground-truth observation at a uniformly-
spaced grid of quantile levels; we compute the WQL on 9 uniformly-spaced quantile levels {0.1, 0.2, . . . , 0.9}.
Quantile forecasters such as TFT were directly trained on these quantile levels. For methods requiring
sampling, we estimated the quantiles using 20 sample forecast paths. We used the mean absolute scaled
error (MASE, Hyndman & Koehler (2006)) to evaluate the point forecast performance. The MASE is defined
as the absolute error of the forecast scaled by the historical seasonal error of the time series, and was selected
due to its favorable properties over other point forecasting metrics (Hyndman & Koehler, 2006). We used the
median forecast (0.5-quantile) for computing the MASE for the probabilistic forecasters. See Appendix D
for a detailed discussion on the evaluation metrics.
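The sketch below spells out the two metrics under the definitions assumed here (per-level weighted quantile
loss averaged over the nine levels, and MASE scaled by the in-sample seasonal naive error); normalization
conventions vary slightly across libraries.

    import numpy as np

    def wql(y_true, quantile_forecasts, levels=np.arange(0.1, 1.0, 0.1)):
        """quantile_forecasts: array of shape (len(levels), horizon)."""
        denom = np.sum(np.abs(y_true))
        per_level = []
        for q, f in zip(levels, quantile_forecasts):
            e = y_true - f
            per_level.append(2 * np.sum(np.maximum(q * e, (q - 1) * e)) / denom)
        return float(np.mean(per_level))

    def mase(y_true, y_pred, y_history, season=1):
        """Absolute forecast error scaled by the historical seasonal naive error."""
        scale = np.mean(np.abs(y_history[season:] - y_history[:-season]))
        return float(np.mean(np.abs(y_true - y_pred)) / scale)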
Since the magnitude of the evaluation metrics can vary across datasets, we adopt a different approach to
aggregate scores than naive averaging. For each dataset, we compute the relative score of each model as
the model’s score divided by the score of a baseline model (here, Seasonal Naive). The relative scores are
aggregated across all datasets using the geometric mean. The choice of the geometric mean is deliberate —
Fleming & Wallace (1986) show that the arithmetic mean can yield misleading conclusions in this context,
and the geometric mean is provably the only meaningful way to aggregate such relative scores. Furthermore,
the geometric mean is also not sensitive to the choice of the baseline, and the model ordering stays intact
if another baseline is selected instead. We used Seasonal Naive due to its simplicity and popularity as a
forecasting baseline. For models that failed or could not finish evaluation within the allotted time on certain
datasets, we use a relative score of 1, i.e., the baseline relative score, when aggregating the results. We
assign equal weights to all tasks during aggregation, reflecting real-world scenarios where datasets may have
different numbers of time series, frequencies, history and prediction lengths.
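The aggregation procedure can be summarized by the short sketch below; the dataset names and scores are
placeholders.

    import numpy as np

    def aggregate_relative_score(model_scores, baseline_scores):
        """Geometric mean of per-dataset scores relative to the baseline (Seasonal Naive)."""
        relative = [
            model_scores.get(name, base) / base        # missing/failed -> relative score of 1
            for name, base in baseline_scores.items()
        ]
        return float(np.exp(np.mean(np.log(relative))))

    agg = aggregate_relative_score(
        model_scores={"dataset_a": 0.030, "dataset_b": 0.250},
        baseline_scores={"dataset_a": 0.050, "dataset_b": 0.300},
    )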

5.5 Main Results

In this section, we present our main results on 42 datasets, which comprise Benchmark I (15 datasets)
and Benchmark II (27 datasets). Chronos models surpass both classical statistical baselines and task-
specific deep learning models on the in-domain datasets (Benchmark I; see Section 5.5.1). On the zero-shot
datasets (Benchmark II; Section 5.5.2), Chronos models comfortably outperform statistical baselines, while
performing on par with the best deep learning models trained on these tasks. With an inexpensive fine-tuning
regimen, our Chronos-T5 (Small) model achieves the top spot on Benchmark II, significantly outperforming
all baselines.

5.5.1 Benchmark I: In-domain Results

Benchmark I comprises 15 datasets that were also part of the training data of Chronos models, i.e., this
benchmark evaluates the in-domain performance of Chronos models (see Table 2). Figure 4 summarizes
the probabilistic and point forecasting performance for all models on the held-out test windows, in terms
of their aggregated relative scores, computed as described in Section 5.4. The bigger Chronos-T5 models
(Base and Large) significantly outperform baseline models, obtaining the best aggregated relative scores and
average ranks (Figure 18 in Appendix E). These models not only perform better than local models (e.g.,
AutoETS and AutoARIMA), but they also perform better than task-specific deep learning models trained
or fine-tuned for each dataset (e.g., PatchTST and DeepAR).
The smaller Chronos-T5 models (Mini and Small) and Chronos-GPT2 also perform better than the
majority of baselines, with the exception of PatchTST. Task-specific deep learning models, trained across
multiple time series for a specific task, perform better than local statistical models that fit parameters for
each time series. Interestingly, the Seasonal Naive baseline performs competitively against other local models
on this benchmark, suggesting that the datasets in this benchmark exhibit strong seasonal patterns. This
is unsurprising since a majority of these datasets belong to domains such as energy and transport that tend
to be highly seasonal in nature. The raw WQL and MASE values for individual datasets summarized in
Figure 4 can be found in Tables 6 and 7 in Appendix E.
These results demonstrate the benefit of using models that are trained only once across multiple datasets, over
task-specific models trained individually for each task. Such models could streamline production forecasting


Figure 4: Performance of different models on Benchmark I, comprising 15 datasets also included in the training
data of Chronos models. This benchmark showcases the in-domain performance of Chronos models against local
statistical models, which fit parameters individually for each time series, and task-specific models that train a separate
model for each task. The probabilistic (WQL) and point (MASE) forecasting metrics are normalized using the scores
of the Seasonal Naive baseline and aggregated through a geometric mean to obtain the aggregated relative WQL
and MASE, respectively. Results for Chronos and task-specific models (except GPT4TS) have been averaged over
3 random seeds. Models producing point-forecasts (GPT4TS) are only compared based on MASE.

systems, where forecasts from different time series tasks are required, by obviating the need for training
separate models for each task.

5.5.2 Benchmark II: Zero-shot Results

Benchmark II consists of 27 datasets that were not used during Chronos models’ training (see Table 2
in appendix B), i.e., this benchmark evaluates the zero-shot performance of these models. These datasets
belong to diverse domains and frequencies, some of which are not even part of the training data, making
this a challenging benchmark for Chronos.4 Figure 5 summarizes the results on Benchmark II in terms of
the aggregated relative scores. This benchmark is clearly more challenging than Benchmark I (Figure 4), as
the best models tend to offer lower improvements relative to the baseline.
Nevertheless, despite never having seen these datasets during training, Chronos models significantly out-
perform local statistical models. On probabilistic forecasting (aggregate relative WQL), Chronos models
achieve the 2nd and 3rd spots, performing better than most task-specific models that have been trained on
these tasks. Chronos-T5 (Large) places 3rd in terms of the point forecasting performance, narrowly losing
the 2nd spot to N-HiTS. Chronos models also significantly outperform ForecastPFN, a recently proposed
zero-shot forecaster, and even GPT4TS, which fine-tunes a pretrained GPT-2 model on each dataset. The
raw WQL and MASE values for individual datasets summarized in Figure 5 can be found in Tables 8 and 9
in Appendix E.
The results on this benchmark highlight the promise of Chronos as a generalist time series forecaster —
it performs significantly better than local models that are commonly used in a zero-shot setting, and it
performs on par with the best task-specific deep learning models.

Fine tuning. Motivated by the remarkable zero-shot performance of Chronos models, we conducted a
preliminary investigation into fine-tuning Chronos models individually on datasets from Benchmark II.

4 From a rigorous standpoint, to prevent information leakage, the start time of any dataset within this category must be
after the time stamp of the last observation from the pretraining dataset and Benchmark I. Nevertheless, we consider the risk
to be minimal given that the datasets bear no overlap beyond high-level conceptual categorization.


Figure 5: Performance of different models on Benchmark II, comprising 27 datasets not seen by Chronos models
during training. This benchmark provides insights into the zero-shot performance of Chronos models against local
statistical models, which fit parameters individually for each time series, task-specific models trained on each task, and
the pretrained ForecastPFN model. The probabilistic (WQL) and point (MASE) forecasting metrics were normalized
using the scores of the Seasonal Naive baseline and aggregated through a geometric mean to obtain the aggregated
relative WQL and MASE, respectively. Results for Chronos and task-specific models (except GPT4TS) have been
averaged over 3 random seeds. Models producing point-forecasts (GPT4TS and ForecastPFN) are only compared
based on MASE.

We selected the Chronos-T5 (Small) model for this experiment due to its good zero-shot performance with
a relatively low training cost. We fine-tuned the model in a dataset-agnostic fashion with an initial learning
rate of 0.001, annealed linearly to 0 over 1000 steps. Figure 6 shows that fine-tuning significantly improves
the aggregate performance of the model on Benchmark II. The fine-tuned Chronos-T5 (Small) model now
takes the top spot on Benchmark II overall, overtaking both larger (zero shot) Chronos models and the best
task-specific models. Notably, Chronos-T5 (Small) is not even the most accurate variant of Chronos on
Benchmark II in the zero shot setting, suggesting that further improvements may be obtained by fine-tuning
larger Chronos-T5 variants.

Figure 6: When fine-tuned on individual datasets from Benchmark II, Chronos-T5 (Small) significantly
improves over the zero-shot performance and becomes the best performing model on average (see Figure 5).
5.6 Analysis of Hyperparameters

Here, we explore the effect of different design choices on the downstream model performance, beginning
with a comparison of different model sizes and initializations. We then analyze the effect of training steps,
synthetic data proportion, context length, and vocabulary size, on the performance of Chronos-T5 (Small).
We only vary the parameter of interest, keeping everything else fixed to the value used in the main results.

Model size. We experimented with four model sizes ranging from 20M to 710M parameters.5 Unsurpris-
ingly, the training loss improves with the model capacity, as shown in Figure 7a. We also observe this trend
in the downstream model performance — it improves with the model size for both in-domain and zero-shot
benchmarks, as shown in Figure 7b. These trends suggest that even larger models may improve performance
further. However, we did not explore larger models due to slow inference times which would render them
impractical for real-world applications.

5 These numbers differ from the original sizes of the T5 models in Tay et al. (2021) due to the change in the vocabulary size.


Figure 7: Model size. (a) Training loss curves of Chronos models of different sizes. (b) In-domain and zero-shot
performance of Chronos models varying over model size.



Figure 9: Initialization. Comparison of training loss of randomly-initialized Chronos models of different sizes against
those initialized with language model weights.

Initialization. We investigated whether initializing Chronos models to the corresponding T5 language
models pretrained by Tay et al. (2021) on the C4 dataset (Raffel et al., 2020) has any impact on the training
dynamics or the downstream performance. Figure 9 shows the training loss curve for models initialized
randomly and those initialized with language model weights. Notably, models initialized randomly tend to
converge to a lower training loss compared to their counterparts initialized with language model weights.
For the larger models (Base and Large), models initialized with language model weights initially exhibit a
faster decrease in training loss, but they ultimately converge to a higher final loss.

Overall, these observations suggest that language model weights are not particularly remarkable in the
context of time series forecasting and offer no improvement over random initialization. These conclusions
are further reinforced by Figure 8, which shows the downstream performance of models initialized with
language model weights against three randomly-initialized models of each size. Across all model sizes, the
performance of models initialized with language model weights either overlaps with or slightly underperforms
compared to randomly initialized models. These results suggest that LLM initialization offers relatively little
advantage in the context of time series forecasting, and instead random initialization may be the preferable
choice.

Figure 8: Comparison of the in-domain and zero-shot performance of models initialized with language model
weights (marked as star) and three randomly initialized models (marked as circles) across different model
sizes.


Figure 10: (a) Comparison of in-domain and zero-shot performance of Chronos-T5 (Small) models trained with
and without TSMix augmentations. (b) In-domain and zero-shot performance of Chronos-T5 (Small) models with
varying proportion of KernelSynth data in the training corpus.

TSMix augmentations. As described in Section 5.2, we trained Chronos models on TSMix augmen-
tations rather than directly on the original time series. In this experiment, we investigate whether using
TSMix augmentations is advantageous for downstream performance. Figure 10a compares the performance
of Chronos-T5 (Small, 46M) models trained with and without TSMix augmentations. The model trained
on TSMix augmentations obtains similar in-domain performance to the model trained without augmenta-
tions. However, the zero-shot performance improves when using TSMix augmentations. This suggests that
TSMix enhances the diversity of the training data, which leads to improved performance on unseen datasets.
Figure 10a also shows that the zero-shot performance obtains an additional boost with the inclusion of
synthetic data. We investigate this further in the next experiment.

Synthetic data proportion. We systematically explored the impact of KernelSynth on downstream
model performance. We trained Chronos-T5 (Small, 46M) models with time series sampled from TSMix
augmentations and KernelSynth data in different ratios, ranging from 0% (i.e., trained solely on TSMix
augmentations) to 100% synthetic data.
Figure 10b shows the performance of models trained with different proportions of synthetic data. Both
in-domain and zero-shot metrics improve with the incorporation of synthetic data in training. The most
consistent improvement is observed around the 10% synthetic data proportion. Further increasing the
proportion of synthetic data tends to worsen performance. This is unsurprising since the synthetic data
generated using Gaussian processes is not representative of all real-world time series.
While the model trained only on synthetic data performs worse relative to models with real data in their
training corpus, it performs reasonably well in terms of its absolute performance. Figure 20 (Appendix E)
shows that it performs significantly better than ForecastPFN (Dooley et al., 2023), another model that is
trained solely on synthetic data (generated differently from KernelSynth). Surprisingly, it also outperforms
several other baselines in our benchmarks,6 despite never having seen real data during training. These results
attest to the quality of our synthetic data, and they open up directions for future work to close the performance
gap further.

Training steps. We trained a Chronos-T5 (Small, 46M) for 1M training steps to study the effect of
longer training on model performance. Figure 11a shows that the downstream model performance improves
over the course of training, both on in-domain and zero-shot benchmarks. This suggests that the performance
of the larger models (Base and Large) can potentially be improved by training them for longer.

Context length. We studied the effect of the context length on downstream performance by training
Chronos-T5 (Small, 46M) models with four distinct context lengths. Figure 11b shows how the performance
varies with increasing context length. We observe improvements on both in-domain and zero-shot metrics

6 All benchmarks are zero-shot for this model, since it was only trained on synthetic data.


Figure 11: In-domain and zero-shot performance of a Chronos-T5 (Small) models varying over (a) the number of
training steps, (b) the training context length, and (c) the vocabulary size.

as context length increases, showing that a longer context helps the models to forecast better. However,
this analysis may be limited due to our zero-shot evaluation setup, wherein the majority of datasets in
the benchmark have low frequencies and time series shorter than 1000 steps. Hence, further evaluation is
required to conclusively study the impact of longer context lengths. We posit that high-frequency datasets
may benefit from a longer context, which may be necessary to correctly capture the long-term seasonal
patterns.

Vocabulary size. The vocabulary size governs the precision with which the model can process the scaled
time series. To explore its impact on performance, we trained Chronos-T5 (Small, 46M) models with varying
vocabulary sizes. Figure 11c shows consistent improvements in the point forecasting metric (MASE) as the
vocabulary size increases. In contrast, the WQL initially improves but deteriorates for larger vocabulary
sizes. We hypothesize that this behavior is an artifact of the chosen metrics. The MASE, which is invariant
to the scale of individual series, is closely aligned to our training loss, which is also invariant to scale.
Hence, MASE exhibits an improvement with increased precision, just as one expects for the training loss.
Conversely, WQL, a scale-dependent metric, does not correlate closely with the training loss and behaves
less predictably as precision increases. See Appendix D for a discussion on the properties of these metrics.

5.7 Qualitative Analysis and Limitations

In this section, we analyze forecasts generated by Chronos models qualitatively, and we also highlight some
limitations of our tokenization technique. We primarily focus on synthetically generated time series for a
controlled analysis of different types of time series patterns. For example forecasts from real datasets, see
Figures 22 to 24 in Appendix E.

I.I.D. Noise. We generated time series comprised purely of Gaussian observations, N (0, 1) and N (100, 10),
and used Chronos-T5 (Base) to forecast these. Figure 12a shows that Chronos generates plausible fore-
casts for such time series and the predicted 80% interval coincides with the ground truth 80% interval shown
by the dashed blue lines.

Trend and seasonality. We generated time series following linear and exponential trends: Chronos-T5
(Base) predicts the linear trend accurately but struggles with the exponential trend, as shown in Figure 12b.
This may be due to a limited representation of exponential trends in the training data. A potential res-
olution for generating better forecasts for time series with exponential trends is to perform logarithmic
scaling before feeding the time series into Chronos models. We also observed that Chronos models
tend to underestimate the trend when the context is not sufficiently long. This phenomenon is depicted
in Figure 13 where the model forecasts the pattern correctly but underpredicts the trend when a short
context is provided. However, with a longer context, the model picks up the correct pattern and trend.


Figure 12: Forecasts generated by Chronos-T5 (Base) on synthetically generated patterns. (a) Noise: Chronos
generates reasonable forecasts for Gaussian noise with the 80% prediction interval matching the interval of the
underlying distribution (shown by the horizontal dashed blue line). (b) Trend: Chronos forecasts a linear trend
(top) correctly but struggles with an exponential trend (bottom). (c) Seasonality: Chronos accurately models
seasonal patterns of varying degrees of complexity (single seasonality at the top and three seasonalities at the bottom).
(d) Combined Patterns: Chronos forecasts time series generated by the additive (top) or multiplicative (bottom)
combination of trend and seasonal patterns accurately.

In our analysis, we observed that Chronos models recognize seasonal patterns in time series particularly
well. We generated purely seasonal time series using sinusoids with different frequencies. As shown in
Figure 12c, Chronos-T5 (Base) precisely forecasts both time series. When fundamental patterns such as
trend and seasonality are combined, either additively or multiplicatively, Chronos forecasts them accurately.
This is demonstrated in Figure 12d on time series generated via addition and multiplication of a linear
function with a sinusoid.

Figure 13: When the context is not sufficiently long, Chronos-T5 (Base) tends to underestimate trend, as
shown in this example with the classic Air Passengers data (monthly) and a forecast horizon of 24. Top:
with only 120 observations as context, the median prediction plateaus compared to the previous trend.
Bottom: with the full context of 144 observations, the prediction picks up the trend more closely.

Autoregressive processes. An autoregressive (AR) process of order p is defined as

    X_t = Σ_{i=1}^{p} φ_i X_{t−i} + ε_t,

where ε_t ∼ N(0, 1) and φ_1, . . . , φ_p are the parameters of the
model. We generated time series from stationary AR processes of different orders ranging from 1 to 4, and
we compared the forecasts generated by Chronos-T5 (Base) against those generated by three models: (a)
the ground truth AR model that was used to generate the time series; (b) an AR model with the correct
order (p) fitted to the time series; and (c) an AutoARIMA model fitted to the time series. Figure 14 shows
the results for the AR(1) and AR(4) processes, and Figure 21 (Appendix E) shows the results for AR(2) and


Figure 14: Forecasts generated by Chronos-T5 (Base) for time series generated from AR(1) and AR(4) processes
compared against forecasts generated by the ground truth AR model, a fitted AR model of the correct order, and
an AutoARIMA model. Chronos-T5 (Base) generates plausible forecasts and prediction intervals in both cases. All
AR models fit the simpler AR(1) process correctly and obtain better MSE than Chronos-T5 (Base); however, with
the increased complexity in the AR(4) process, Chronos-T5 (Base) performs second best after the ground truth AR
model.

AR(3). We observe that Chronos-T5 (Base) generates plausible forecasts across all four AR processes. The
simpler AR(1) and AR(2) processes are easier for the correctly-specified AR model and AutoARIMA model
to fit, resulting in a better MSE than Chronos-T5 (Base). However, with increasing complexity in AR(3)
and AR(4) processes, Chronos-T5 (Base) not only outperforms the AutoARIMA model (which belongs to the
same family as the ground truth model) but also performs slightly better than the fitted AR model with the
correct order. These results highlight that Chronos models can recognize fundamental patterns present in
time series data.
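The AR(p) series used in this analysis can be simulated as in the sketch below; the coefficients shown are
illustrative and not the ones used for Figures 14 and 21.

    import numpy as np

    def simulate_ar(phi, n=500, burn_in=100, rng=np.random.default_rng(0)):
        """Simulate X_t = sum_i phi_i X_{t-i} + eps_t with eps_t ~ N(0, 1)."""
        p = len(phi)
        x = np.zeros(n + burn_in + p)
        for t in range(p, len(x)):
            lags = x[t - p:t][::-1]                      # X_{t-1}, ..., X_{t-p}
            x[t] = np.dot(phi, lags) + rng.standard_normal()
        return x[burn_in + p:]                           # drop the burn-in period

    ar1 = simulate_ar([0.7])                             # AR(1)
    ar4 = simulate_ar([0.4, 0.2, 0.1, 0.1])              # AR(4), stationary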

Flexible predictive distributions. Using a categorical distribution to encode predictions gives Chronos
flexibility in producing predictive distributions of different shapes. This is shown in Figure 15, illustrating
kernel density estimate (KDE) plots of token IDs sampled from a Chronos model, for the first five time
steps in the forecast horizon, across three datasets. Despite the fact that cross-entropy is not distance-
aware, Chronos outputs predictive distributions over a contiguous set of tokens, and with different shapes,
including multi-modal ones.

Overflow and loss of precision. One limitation of Chronos comes from the proposed tokenization
approach (see Section 3.1). Specifically, the tokens we select represent values in the range [−15s, 15s], where
s is the scale of the data (mean absolute value). If s is very small compared to the range of values in the
series, then some observations will fall out of the representable range. An example of this behaviour occurs
with sparse series, as shown in Figure 16a. On the other hand, very large values of s compared to the variance
result in loss of precision: in the original space, tokens are spaced 30s/(B − 1) from each other, where B
is the number of bins (we used B = 4094 in our experiments); values closer than that to each other may
be mapped to the same token, with an apparent loss of precision. An example of this behaviour is given
in Figure 16b. Improving the tokenization to overcome these edge cases is a subject for future work, but the


Figure 15: Forecast distributions from a Chronos model on series from the NN5 (Daily), Traffic, and Hospital datasets
respectively. Each plot shows the predictive distribution for five prediction steps (h = 1, . . . , 5): the densities were
obtained via kernel density estimation from sample forecasts. Even though the cross entropy is not distance-aware,
the model learns to estimate distributions over neighboring tokens, and of diverse shapes, including multimodal ones.


Figure 16: Loss of precision due to scaling and quantization. In (a), data consists of unit spikes every n = 10, 20, 50
observations (top to bottom): the scale here is 1/n, hence the maximum representable value is 15/n. When 1 > 15/n
then the model cannot possibly capture the spikes appropriately (all but the top case), since their value is not
represented accurately by tokens. In (b), data is a sine wave shifted up by µ = 1, 10, 50: the scale here is µ, and as
the variance of the signal becomes smaller relative to µ, the tokens' precision decreases.

results from Section 5.5 suggest that Chronos models perform well on real-world data despite these
limitations.
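A quick numeric check of the overflow case discussed above, using the B = 4094 bins from our setup (the
loop values mirror the spike spacings in Figure 16a):

    B = 4094
    for n in (10, 20, 50):                   # one unit spike every n observations
        s = 1.0 / n                          # mean absolute value of the series
        max_value = 15 * s                   # largest representable value
        spacing = 30 * s / (B - 1)           # token spacing in the original space
        print(n, max_value >= 1.0, spacing)  # spikes of height 1 overflow once 15/n < 1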

6 Discussion

Chronos represents one of the first endeavours toward practical pretrained time series forecasting models, with
remarkable zero-shot performance on a comprehensive collection of test datasets. This work opens up various
research avenues, some of which we discuss below.

6.1 Beyond Zero-shot Univariate Forecasting

In our experiments, we evaluated Chronos in a zero-shot manner for most datasets. Such a setup
highlights the competitiveness of zero-shot Chronos models against task-specific baselines. We expect that
both in-domain and zero-shot results could be enhanced further through fine-tuning, an avenue we briefly
explored in Section 5.5.2. This can be done using any parameter-efficient fine-tuning methods such as those
based on low-rank adapters (LoRA) (Hu et al., 2021; Zhang et al., 2023). Alternatively, Chronos can be
calibrated for a specific task with conformal methods (Romano et al., 2019; Stankeviciute et al., 2021; Xu
& Xie, 2021). Chronos is especially attractive in the context of conformal prediction since it requires no
training set, so all available data can be used for calibration.
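As a concrete illustration of this point, the following is a minimal split-conformal sketch; forecast_median stands in for any zero-shot point forecaster (e.g., the median of Chronos sample paths) and is an assumption rather than an actual Chronos API.

import numpy as np

def conformalize(forecast_median, calibration_series, horizon, alpha=0.1):
    # Conformity scores: absolute errors of the zero-shot forecaster on held-out windows.
    scores = []
    for y in calibration_series:
        context, target = y[:-horizon], y[-horizon:]
        scores.append(np.abs(forecast_median(context) - target))
    scores = np.concatenate(scores)
    n = len(scores)
    # Conformal quantile with the usual finite-sample correction.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

    def predict(context):
        med = forecast_median(context)
        return med - q, med, med + q   # band with approximately 1 - alpha marginal coverage
    return predict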
In this work, we have focused on univariate time series forecasting since it constitutes the most common
real-world time series use case. Nevertheless, practical forecasting tasks often involve additional information
that must be taken into account. One example involves covariates, which can be either time-independent (e.g.,
color of the product) or time-varying (e.g., on which days the product is on sale). Another closely related
problem is multivariate forecasting, where historical values of one time series (e.g., interest rates) can influence
the forecast for another time series (e.g., housing prices). The number of covariates or multivariate dimensions
can vary greatly across tasks, which makes it challenging to train a single model that can handle all possible
combinations. A possible solution may involve training task-specific adaptors that inject the covariates into
the pretrained forecasting model (Rahman et al., 2020). As another option, we can build stacking ensembles
(Ting & Witten, 1997) of Chronos and other light-weight models that excel at handling covariates such as
LightGBM (Ke et al., 2017).
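One simple instance of this idea, sketched below under our own assumptions about the feature layout, feeds the zero-shot point forecast together with the known covariates into a LightGBM regressor fitted on past horizons.

import numpy as np
from lightgbm import LGBMRegressor

def fit_stacking_model(point_forecasts, covariates, targets):
    # point_forecasts: (n,) zero-shot forecasts for past time steps (e.g., from Chronos)
    # covariates:      (n, d) known covariates (e.g., promotions, calendar features)
    # targets:         (n,) realized values
    features = np.column_stack([point_forecasts, covariates])
    model = LGBMRegressor(n_estimators=200, learning_rate=0.05)
    model.fit(features, targets)
    return model

# At prediction time, assemble the same features from the new forecast and the
# known future covariates: model.predict(np.column_stack([new_forecast, new_covariates]))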
Thus far, our exploration has centered on the problem of time series forecasting. However, several other
time series analysis tasks, such as classification, clustering, and anomaly detection (Dau et al., 2018; Wu &
Keogh, 2021; Ismail Fawaz et al., 2019; Goswami et al., 2024), could potentially benefit from a pretrained
model like Chronos. We hypothesize that the representations learned by the encoders of Chronos-T5
models are universal and can be used for these tasks. An exploration of Chronos-T5 representations for
various downstream tasks would constitute interesting future work.

6.2 Inference

A potential limitation of the larger Chronos models is their inference speed compared to task-specific
deep learning models. Figure 17 illustrates the inference time of generating forecasts for a single time series,
averaged across datasets. The inference speed of the larger Chronos models is comparable to some statistical
local models. Moreover, while Chronos models are slower than task-specific models, they are not too large
to be prohibitively slow. Furthermore, task-specific models need to be trained for each task individually,
which requires additional time and compute. In contrast, Chronos models can be deployed for datasets with
diverse history lengths, frequencies, prediction horizons, and context lengths. This makes model deployment
significantly easier and drastically simplifies forecasting pipelines, obviating the need for task-specific training.

[Figure 17: models on the vertical axis, grouped into local models (Naive, Seasonal Naive, AutoETS, AutoTheta, AutoARIMA), task-specific models (DLinear, PatchTST, TFT, DeepAR, WaveNet, N-HiTS, N-BEATS, GPT4TS, ForecastPFN), and pretrained models evaluated zero-shot (Chronos-T5 Mini, Small, Base, Large, and Chronos-GPT2), with CPU or GPU compute indicated; average inference time in milliseconds on the horizontal axis.]
Figure 17: Inference time of different models for forecasting a single time series, averaged across datasets. The compute
requirements of individual models have been highlighted.
By leveraging a language modeling framework for time series, we make developments in the NLP commu-
nity immediately transferable to Chronos models. For instance, inference speed can be improved by using
CUDA kernels optimized for modern Ampere GPUs, quantization (Dettmers et al., 2022), and faster de-
coding techniques, including speculative (Leviathan et al., 2023) and lookahead (Fu et al., 2023) decoding.
Developments in long-context language models (Sun et al., 2022; Dao, 2023) may help improve Chronos
models’ applicability to high-frequency datasets that require longer contexts to capture seasonal patterns.
Other techniques popularly used for text language models, such as temperature tuning, beam search (Freitag
& Al-Onaizan, 2017), Top-K sampling (Fan et al., 2018), and nucleus sampling (Holtzman et al., 2019), could
enhance the quality of forecasts. These may be particularly helpful in improving the speed and quality of
point forecasts, which currently require aggregation over multiple samples.
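For illustration, the sketch below shows how such decoding strategies can be passed to the generate method of a Hugging Face T5 model; the vocabulary size, the randomly initialized model, and the fake token ids are placeholders for a real Chronos checkpoint and tokenizer.

import torch
from transformers import T5Config, T5ForConditionalGeneration

config = T5Config(vocab_size=4096, d_model=64, d_ff=128, num_layers=2, num_heads=4)
model = T5ForConditionalGeneration(config)                   # stand-in for a pretrained checkpoint
input_ids = torch.randint(low=2, high=4096, size=(1, 128))   # stand-in for quantized context tokens

samples = model.generate(
    input_ids,
    max_new_tokens=24,        # prediction horizon, in tokens
    do_sample=True,           # ancestral sampling
    temperature=0.8,          # sharpen or flatten the predictive distribution
    top_k=50,                 # Top-K sampling (Fan et al., 2018)
    top_p=0.9,                # nucleus sampling (Holtzman et al., 2019)
    num_return_sequences=20,  # sample paths, later mapped back to values and quantiles
)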

6.3 Data

Our findings underscore that training larger models on a large corpus of time series data yields excellent
in-domain and zero-shot performance. Nevertheless, in contrast to NLP, high-quality public time series
data remains limited. This poses a dilemma when training models on a large corpus of diverse datasets —
selecting more datasets for training leaves fewer for zero-shot evaluation. The time series community would
benefit greatly from the availability of larger time series datasets that could be used to develop and improve
pretrained models such as Chronos. There have been some recent efforts on building large-scale time series
datasets (Emami et al., 2023; Liu et al., 2023) for specific domains, although further investment is needed.
Another direction to address the problem of limited data involves developing better methods for generating
synthetic time series. Our work has made significant strides in this direction by clearly demonstrating the
utility of synthetic data generated using Gaussian processes, improving model performance when incorpo-
rated into the training data. Even models trained solely on synthetic data exhibit reasonable forecasting
performance. Future research could delve into the failure modes of these models, proposing enhancements
to bridge the gap between real and synthetic data.

7 Conclusion

In this work, we approach the problem of developing generalist pretrained forecasting models through the
lens of a minimalist. We adapt existing language model architectures and training procedures for time
series forecasting, challenging the notion that time-series-specific features or architectures are necessary for
forecasting. This results in Chronos, a language modeling framework for time series that is, paradoxically,
agnostic to time. The defining characteristic of Chronos is its compatibility with any language model
architecture, only requiring minimal modifications — tokenization through scaling and quantization. Our
pretrained models significantly outperform existing local models and task-specific deep learning baselines in
terms of their in-domain performance. More remarkably, Chronos models obtain excellent results on unseen
datasets (zero-shot performance), performing competitively with the best deep-learning baselines trained on
these datasets, while showing promising evidence of further improvements through fine-tuning.
Our contributions are significant in two key aspects. First, we show that existing language model architec-
tures are capable of performing forecasting without time-series-specific customizations. This paves the way
for accelerated progress by leveraging developments in the area of LLMs and through better data strategies.
Second, on a practical level, the strong performance of Chronos models suggests that large (by forecasting
standards) pretrained language models can greatly simplify forecasting pipelines without sacrificing accuracy,
offering an inference-only alternative to the conventional approach involving training and tuning a model on
individual tasks.

Acknowledgements

We are grateful to our fellow researchers who have contributed to this work with insightful discussions
and valuable feedback, including but not limited to Laurent Callot, Baris Kurt, Valentin Flunkert, David
Salinas, Boran Han, Xiaoyong Jin, Luke Huan, Youngsuk Park, Gaurav Gupta, Karthick Gopalswamy, Tim
Januschowski, Jan Gasthaus, Bing Xiang, Kashif Rasul, Mononito Goswami and Gerald Woo.

References
Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus,
Tim Januschowski, Danielle C Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, et al. GluonTS:
Probabilistic and Neural Time Series Modeling in Python. The Journal of Machine Learning Research,
21(1):4629–4634, 2020. 31

Abdul Fatir Ansari, Konstantinos Benidis, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alexander J
Smola, Bernie Wang, and Tim Januschowski. Deep Explicit Duration Switching Models for Time Series.
Advances in Neural Information Processing Systems, 34, 2021. 9

V. Assimakopoulos and K. Nikolopoulos. The theta model: a decomposition approach to forecasting. Inter-
national Journal of Forecasting, 16(4):521–530, 2000. 3, 9, 31
George Athanasopoulos, Rob J. Hyndman, Haiyan Song, and Doris C. Wu. The tourism forecasting compe-
tition. International Journal of Forecasting, 27(3):822–844, 2011. 30
Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix,
Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, François-Xavier
Aubet, Laurent Callot, and Tim Januschowski. Deep learning for time series forecasting: Tutorial and
literature survey. ACM Comput. Surv., 55(6), 2022. 1
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind
Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss,
Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens
Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack
Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language
models are few-shot learners. In Advances in Neural Information Processing Systems, 2020. 1, 3, 4
Chris U Carmona, François-Xavier Aubet, Valentin Flunkert, and Jan Gasthaus. Neural Contextual Anomaly
Detection for Time Series. arXiv:2107.07702, 2021. 7
Cristian Challu, Kin G Olivares, Boris N Oreshkin, Federico Garza Ramirez, Max Mergenthaler Canseco, and
Artur Dubrawski. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting. In Proceedings
of the AAAI Conference on Artificial Intelligence, volume 37, 2023. 9, 31
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts,
Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling Language
Modeling with Pathways. Journal of Machine Learning Research, 24(240):1–113, 2023. 3
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi
Wang, Mostafa Dehghani, Siddhartha Brahma, et al. Scaling Instruction-Finetuned Language Models.
arXiv:2210.11416, 2022. 3
Tri Dao. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning.
arXiv:2307.08691, 2023. 19
Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-
series forecasting. arXiv:2310.10688, 2023. 1, 4
Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi,
Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen,
Gustavo Batista, and Hexagon-ML. The UCR Time Series Classification Archive, October 2018.
https://www.cs.ucr.edu/~eamonn/time_series_data_2018/. 19
Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. LLM.int8(): 8-bit Matrix Multiplication
for Transformers at Scale. arXiv:2208.07339, 2022. 19
Jiaxiang Dong, Haixu Wu, Haoran Zhang, Li Zhang, Jianmin Wang, and Mingsheng Long. SimMTM: A
Simple Pre-Training Framework for Masked Time-Series Modeling. arXiv:2302.00861, 2023. 4
Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, and Colin White. Fore-
castPFN: Synthetically-Trained Zero-Shot Forecasting. In Advances in Neural Information Processing
Systems, 2023. 1, 4, 9, 14, 31
David Duvenaud, James Lloyd, Roger Grosse, Joshua Tenenbaum, and Ghahramani Zoubin. Structure
Discovery in Nonparametric Regression through Compositional Kernel Search. In International Conference
on Machine Learning, pp. 1166–1174. PMLR, 2013. 7
Patrick Emami, Abhijeet Sahu, and Peter Graf. BuildingsBench: A Large-Scale Dataset of 900K Buildings
and Benchmark for Short-Term Load Forecasting. arXiv:2307.00142, 2023. 20

Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical Neural Story Generation. arXiv:1805.04833, 2018.
19
Philip J Fleming and John J Wallace. How not to lie with statistics: the correct way to summarize benchmark
results. Communications of the ACM, 29(3):218–221, 1986. 10
Markus Freitag and Yaser Al-Onaizan. Beam Search Strategies for Neural Machine Translation.
arXiv:1702.01806, 2017. 19
Yichao Fu, Peter Bailis, Ion Stoica, and Hao Zhang. Breaking the Sequential Dependency
of LLM Inference Using Lookahead Decoding, November 2023. URL https://lmsys.org/blog/
2023-11-21-lookahead-decoding/. 19
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace
He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB Dataset of Diverse Text for Language
Modeling. arXiv:2101.00027, 2020. 6
Federico Garza, Max Mergenthaler Canseco, Cristian Challú, and Kin G. Olivares. StatsForecast: Lightning
fast forecasting with statistical and econometric models. PyCon Salt Lake City, Utah, US 2022, 2022.
URL https://github.com/Nixtla/statsforecast. 31
Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin
Flunkert, and Tim Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In Pro-
ceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89
of Proceedings of Machine Learning Research, pp. 1901–1910. PMLR, 2019. 3, 10, 32
Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal
of the American statistical Association, 102(477):359–378, 2007. 9, 32
Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso.
Monash Time Series Forecasting Archive. In Neural Information Processing Systems Track on Datasets
and Benchmarks, 2021. 8, 27, 29, 30
Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, and Artur Dubrawski. Moment: A
family of open time-series foundation models. arXiv preprint arXiv:2402.03885, 2024. 4, 19
Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large Language Models Are Zero-Shot
Time Series Forecasters. In Advances in Neural Information Processing Systems, 2023. 1, 3
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degener-
ation. arXiv:1904.09751, 2019. 19
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and
Weizhu Chen. LoRA: Low-rank adaptation of large language models. arXiv:2106.09685, 2021. 18
Rob Hyndman, Anne B Koehler, J Keith Ord, and Ralph D Snyder. Forecasting with exponential smoothing:
the state space approach. Springer Science & Business Media, 2008. 3, 9
Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2018. 1, 9
Rob J Hyndman and Anne B Koehler. Another look at measures of forecast accuracy. International journal
of forecasting, 22(4):679–688, 2006. 10, 32
Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller.
Deep learning for time series classification: a review. Data mining and knowledge discovery, 33(4):917–963,
2019. 19
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan
Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Time-LLM: Time series forecasting by reprogram-
ming large language models. In The Twelfth International Conference on Learning Representations, 2024.
1, 4

Xiaoyong Jin, Youngsuk Park, Danielle Maddix, Hao Wang, and Yuyang Wang. Domain adaptation for time
series forecasting via attention sharing. In International Conference on Machine Learning, pp. 10280–
10297. PMLR, 2022. 1, 4

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu.
LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in neural information processing
systems, 30, 2017. 19

Roger Koenker and Kevin F Hallock. Quantile regression. Journal of economic perspectives, 15(4):143–156,
2001. 32

Stephan Kolassa and Tim Januschowski. A classification of business forecasting problems. Foresight, 52,
2019. 1

Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang
Wang. Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Fore-
casting. In Advances in Neural Information Processing Systems, volume 36, pp. 28341–28364. Curran
Associates, Inc., 2023. 9

Yaniv Leviathan, Matan Kalman, and Yossi Matias. Fast inference from transformers via speculative decod-
ing. In International Conference on Machine Learning, pp. 19274–19286. PMLR, 2023. 19

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves
Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Lan-
guage Generation, Translation, and Comprehension. arXiv:1910.13461, 2019. 3

Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable
multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748–1764, 2021. 3, 6, 9,
31

Xu Liu, Yutong Xia, Yuxuan Liang, Junfeng Hu, Yiwei Wang, Lei Bai, Chao Huang, Zhenguang Liu,
Bryan Hooi, and Roger Zimmermann. Largest: A benchmark dataset for large-scale traffic forecasting.
arXiv:2306.08259, 2023. 20

Spyros Makridakis and Michele Hibon. The M3-Competition: results, conclusions and implications. Inter-
national journal of forecasting, 16(4):451–476, 2000. 8, 30

Spyros Makridakis, Michele Hibon, and Claus Moser. Accuracy of forecasting: An empirical investigation.
Journal of the Royal Statistical Society. Series A (General), 142(2):97–145, 1979. 8, 30

Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 Competition: 100,000 time
series and 61 forecasting methods. International Journal of Forecasting, 36(1):54–74, 2020. 8, 30

Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. M5 accuracy competition: Results,
findings, and conclusions. International Journal of Forecasting, 38(4):1346–1364, 2022. 8, 30

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models.
arXiv:1609.07843, 2016. 6

Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kan-
ishka Rao, Dorsa Sadigh, and Andy Zeng. Large language models as general pattern machines. In
Proceedings of The 7th Conference on Robot Learning, volume 229 of Proceedings of Machine Learning
Research, pp. 2498–2518. PMLR, 2023. 3

Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words:
Long-term forecasting with transformers. In International Conference on Learning Representations, 2023.
3, 4, 9, 31

Kin G. Olivares, Cristian Challú, Federico Garza, Max Mergenthaler Canseco, and Artur Dubrawski. Neu-
ralForecast: User friendly state-of-the-art neural forecasting models. PyCon Salt Lake City, Utah, US
2022, 2022. URL https://github.com/Nixtla/neuralforecast. 31

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal
Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio.
arXiv:1609.03499, 2016. 9, 31

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion
analysis for interpretable time series forecasting. In International Conference on Learning Representations,
2020. 9, 31

Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. Meta-learning framework with
applications to zero-shot time-series forecasting. In Proceedings of the AAAI Conference on Artificial
Intelligence (AAAI), 2021. 4

Bernardo Pérez Orozco and Stephen J. Roberts. Zero-shot and few-shot time series forecasting with or-
dinal regression recurrent neural networks. In 28th European Symposium on Artificial Neural Networks,
Computational Intelligence and Machine Learning, pp. 503–508, 2020. 4

Stephan Rabanser, Tim Januschowski, Valentin Flunkert, David Salinas, and Jan Gasthaus. The effectiveness
of discretization in forecasting: An empirical study on neural time series models. arXiv:2005.10111, 2020.
5

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models
are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019. 4, 6

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou,
Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer.
The Journal of Machine Learning Research, 21(1):5485–5551, 2020. 3, 6, 9, 13

Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency,
and Ehsan Hoque. Integrating multimodal information in large pretrained transformers. In Proceedings of
the conference. Association for Computational Linguistics. Meeting, volume 2020, pp. 2359. NIH Public
Access, 2020. 19

Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim
Januschowski. Deep state space models for time series forecasting. Advances in neural information pro-
cessing systems, 31, 2018. 3

Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion
models for multivariate probabilistic time series forecasting. In International Conference on Machine
Learning, pp. 8857–8868. PMLR, 2021. 3

Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhag-
watkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre
Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-llama: Towards foundation models for
time series forecasting, 2023. 1, 4, 9

Yaniv Romano, Evan Patterson, and Emmanuel Candes. Conformalized quantile regression. Advances in
neural information processing systems, 32, 2019. 18

David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting
with autoregressive recurrent networks. International Journal of Forecasting, 36(3):1181–1191, 2020. 3, 5,
6, 9, 31

Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword
units. arXiv:1508.07909, 2015. 3

Oleksandr Shchur, Ali Caner Turkmen, Nick Erickson, Huibin Shen, Alexander Shirkov, Tony Hu, and Bernie
Wang. Autogluon–timeseries: Automl for probabilistic time series forecasting. In International Conference
on Automated Machine Learning, pp. 9–1. PMLR, 2023. 10
Kamile Stankeviciute, Ahmed M Alaa, and Mihaela van der Schaar. Conformal time-series forecasting.
Advances in neural information processing systems, 34:6216–6228, 2021. 18
Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia
Song, and Furu Wei. A length-extrapolatable transformer. arXiv:2212.10554, 2022. 19
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the
inception architecture for computer vision, 2015. 4
Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang,
Dani Yogatama, Ashish Vaswani, and Donald Metzler. Scale efficiently: Insights from pre-training and
fine-tuning transformers. arXiv:2109.10686, 2021. 9, 12, 13
Kai Ming Ting and Ian H Witten. Stacking bagged and dagged models. In Proceedings of the Fourteenth
International Conference on Machine Learning, 1997. 19
Luis Torgo and Joao Gama. Regression using Classification Algorithms. Intelligent Data Analysis, 1(4):
275–292, 1997. 6
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bash-
lykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Fer-
rer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller,
Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan
Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh
Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao,
Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy
Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subra-
manian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng
Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez,
Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open Foundation and Fine-Tuned Chat
Models, 2023. 1, 3, 4
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser,
and Illia Polosukhin. Attention Is All You Need. In Advances in Neural Information Processing Systems,
2017. 3
Ruofeng Wen, Kari Torkkola, Balakrishnan Narayanaswamy, and Dhruv Madeka. A Multi-Horizon Quantile
Recurrent Forecaster. arXiv:1711.11053, 2017. 3, 6
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pier-
ric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen,
Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame,
Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art natural language processing.
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System
Demonstrations, pp. 38–45. Association for Computational Linguistics, 2020. 6, 9
Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified
training of universal time series forecasting transformers. arXiv:2402.02592, 2024. 1, 4
Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. TimesNet: Tempo-
ral 2D-Variation Modeling for General Time Series Analysis. In International Conference on Learning
Representations, 2023. 4
Renjie Wu and Eamonn Keogh. Current Time Series Anomaly Detection Benchmarks are Flawed and are
Creating the Illusion of Progress. IEEE Transactions on Knowledge and Data Engineering, 2021. 19

Chen Xu and Yao Xie. Conformal Prediction Interval for Dynamic Time-Series. In International Conference
on Machine Learning, pp. 11559–11569. PMLR, 2021. 18
Hao Xue and Flora D. Salim. PromptCast: A New Prompt-based Learning Paradigm for Time Series
Forecasting. arXiv:2210.08964, 2023. 1, 3
Rui Ye and Qun Dai. A novel transfer learning framework for time series forecasting. Knowledge-Based
Systems, 156:74–99, 2018. 1

Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are Transformers Effective for Time Series Forecasting?
In Proceedings of the AAAI conference on artificial intelligence, volume 37, 2023. 3, 9, 31
Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond Empirical Risk
Minimization. arXiv:1710.09412, 2017. 7

Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao.
Adaptive budget allocation for parameter-efficient fine-tuning. arXiv:2303.10512, 2023. 18
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen
Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv:2303.18223, 2023. 3

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer:
Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In The Thirty-Fifth AAAI
Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pp. 11106–11115. AAAI
Press, 2021. 3, 27
Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, and Rong Jin. One Fits All: Power general time series
analysis by pretrained LM. In Advances in Neural Information Processing Systems, 2023a. 1, 4, 9, 31
Yun Zhou, Liwen You, Wenzhen Zhu, and Panpan Xu. Improving time series forecasting with mixup data
augmentation. In ECML PKDD 2023 International Workshop on Machine Learning for Irregular Time
Series, 2023b. 7

A Algorithms
Algorithms 1 and 2 present the pseudocode for TSMix and KernelSynth, respectively.

Algorithm 1 TSMix: Time Series Mixup


Input: Time series datasets {X1 , . . . , XNd }, maximum time series to be mixed K = 3, Dirichlet concentration
parameter α = 1.5, and (minimum, maximum) length of the augmented time series (lmin = 128, lmax =
2048).
Output: An augmented time series.
1: k ∼ U{1, K} ▷ number of time series to mix
2: l ∼ U{lmin , lmax } ▷ length of the augmented time series
3: for i ← 1, k do
4: n ∼ U{1, Nd } ▷ sample a dataset index
5: x^(i)_{1:l} ∼ Xn ▷ sample a time series of length l from dataset n
6: x̃^(i)_{1:l} ← x^(i)_{1:l} / ( (1/l) ∑_{j=1}^{l} |x^(i)_j| ) ▷ apply mean scaling to the time series
7: end for
8: [λ1 , . . . , λk ] ∼ Dir(α) ▷ sample mixing weights
9: return ∑_{i=1}^{k} λi x̃^(i)_{1:l} ▷ take weighted combination of time series
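For reference, a compact NumPy sketch of Algorithm 1 follows; the data layout (a list of datasets, each a list of 1-D arrays) and the assumption that every series is at least lmax observations long are ours.

import numpy as np

def tsmix(datasets, k_max=3, alpha=1.5, l_min=128, l_max=2048, rng=np.random):
    k = rng.randint(1, k_max + 1)                         # number of series to mix
    length = rng.randint(l_min, l_max + 1)                # length of the augmented series
    windows = []
    for _ in range(k):
        dataset = datasets[rng.randint(len(datasets))]    # sample a dataset index
        series = dataset[rng.randint(len(dataset))]       # sample a series from it
        start = rng.randint(len(series) - length + 1)     # random window of length l
        window = series[start:start + length]
        windows.append(window / np.mean(np.abs(window)))  # mean scaling
    weights = rng.dirichlet(alpha * np.ones(k))           # Dirichlet mixing weights
    return np.sum(weights[:, None] * np.stack(windows), axis=0)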

Algorithm 2 KernelSynth: Synthetic Data Generation using Gaussian Processes


Input: Kernel bank K, maximum kernels per time series J = 5, and length of the time series lsyn = 1024.
Output: A synthetic time series x1:lsyn .
1: j ∼ U{1, J} ▷ sample the number of kernels
2: {κ1 (t, t′ ), . . . , κj (t, t′ )} ∼ K i.i.d. ▷ sample j kernels from K
3: κ∗ (t, t′ ) ← κ1 (t, t′ )
4: for i ← 2, j do
5: ⋆ ∼ {+, ×} ▷ sample a random binary operator
6: κ∗ (t, t′ ) ← κ∗ (t, t′ ) ⋆ κi (t, t′ ) ▷ compose kernels
7: end for
8: x1:lsyn ∼ GP(0, κ∗ (t, t′ )) ▷ sample from the GP prior
9: return x1:lsyn
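A corresponding sketch of Algorithm 2 using scikit-learn kernels is given below; the specific kernel bank and its hyperparameters are illustrative assumptions and may differ from the one used to generate our synthetic corpus.

import numpy as np
from sklearn.gaussian_process.kernels import (
    RBF, ConstantKernel, DotProduct, ExpSineSquared, RationalQuadratic, WhiteKernel)

KERNEL_BANK = [
    RBF(length_scale=0.1), RBF(length_scale=1.0),
    ExpSineSquared(periodicity=24 / 1024), ExpSineSquared(periodicity=7 / 1024),
    DotProduct(sigma_0=0.0), RationalQuadratic(alpha=1.0),
    ConstantKernel(), WhiteKernel(noise_level=0.1),
]

def kernel_synth(length=1024, max_kernels=5, rng=np.random):
    t = np.linspace(0, 1, length)[:, None]
    j = rng.randint(1, max_kernels + 1)                   # number of kernels to compose
    kernel = KERNEL_BANK[rng.randint(len(KERNEL_BANK))]
    for _ in range(j - 1):
        other = KERNEL_BANK[rng.randint(len(KERNEL_BANK))]
        kernel = kernel + other if rng.rand() < 0.5 else kernel * other
    cov = kernel(t) + 1e-6 * np.eye(length)               # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(length), cov) # one sample path from the GP prior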

B Datasets
The complete list of datasets used for our empirical evaluation is provided in Table 2. The table is divided
into three sections, representing how the datasets were used for Chronos models: in total, 55 datasets
were used for experiments, 13 of which for pretraining only, 15 for in-domain evaluation, and 27 for zero-
shot evaluation (see also Section 5). In the following, we provide a brief description of each dataset, organized
by its domain.

B.1 Energy

Australian Electricity (Godahewa et al., 2021) contains electricity demand data from 5 states in Australia.
Electricity (15 Min., Hourly, Weekly) contains electricity consumption (in kW) for 370 households.
Original data has 15 minutes frequency and was obtained from https://archive.ics.uci.edu/dataset/
321/electricityloaddiagrams20112014; hourly and weekly aggregations are from Godahewa et al. (2021).
ERCOT Load contains hourly energy load in 8 US regions between 2004 and 2021.
ETT (15 Min., Hourly) (Zhou et al., 2021) contains oil temperatures and other covariates of electrical
transformers from two stations in China, measured at 15 minutes granularity.

Table 2: All datasets that are used for experiments. The datasets are partitioned according to how they are used
for training and evaluation of Chronos models: pretraining-only data is only used for Chronos training; in-domain
evalution data is used for training Chronos models and other task-specific baselines, except for the H observations
that are held out for in-domain testing only; zero-shot evaluation data is not used in training Chronos models,
but only for evaluation (final H observations), as well as for training task-specific baselines (excluding the final H
observations).

Dataset   Domain   Freq.   Num. Series   Series Length (min / avg / max)   Prediction Length (H)
Pretraining-only
Brazilian Cities Temperature nature M 12 492 757 1320 -
Mexico City Bikes transport 1H 494 780 78313 104449 -
Solar (5 Min.) energy 5min 5166 105120 105120 105120 -
Solar (Hourly) energy 1H 5166 8760 8760 8760 -
Spanish Energy and Weather energy 1H 66 35064 35064 35064 -
Taxi (Hourly) transport 1H 2428 734 739 744 -
USHCN nature 1D 6090 5906 38653 59283 -
Weatherbench (Daily) nature 1D 225280 14609 14609 14610 -
Weatherbench (Hourly) nature 1H 225280 350633 350639 350640 -
Weatherbench (Weekly) nature 1W 225280 2087 2087 2087 -
Wiki Daily (100k) web 1D 100000 2741 2741 2741 -
Wind Farms (Daily) energy 1D 337 71 354 366 -
Wind Farms (Hourly) energy 1H 337 1715 8514 8784 -
In-domain evaluation
Electricity (15 Min.) energy 15min 370 16032 113341 140256 24
Electricity (Hourly) energy 1H 321 26304 26304 26304 24
Electricity (Weekly) energy 1W 321 156 156 156 8
KDD Cup 2018 nature 1H 270 9504 10897 10920 48
London Smart Meters energy 30min 5560 288 29951 39648 48
M4 (Daily) various 1D 4227 107 2371 9933 14
M4 (Hourly) various 1H 414 748 901 1008 48
M4 (Monthly) various 1M 48000 60 234 2812 18
M4 (Weekly) various 1W 359 93 1035 2610 13
Pedestrian Counts transport 1H 66 576 47459 96424 48
Rideshare transport 1H 2340 541 541 541 24
Taxi (30 Min.) transport 30min 2428 1469 1478 1488 48
Temperature-Rain nature 1D 32072 725 725 725 30
Uber TLC (Daily) transport 1D 262 181 181 181 7
Uber TLC (Hourly) transport 1H 262 4344 4344 4344 24
Zero-shot evaluation
Australian Electricity energy 30min 5 230736 231052 232272 48
CIF 2016 banking 1M 72 28 98 120 12
Car Parts retail 1M 2674 51 51 51 12
Covid Deaths healthcare 1D 266 212 212 212 30
Dominick retail 1D 100014 201 296 399 8
ERCOT Load energy 1H 8 154854 154854 154854 24
ETT (15 Min.) energy 15min 14 69680 69680 69680 24
ETT (Hourly) energy 1H 14 17420 17420 17420 24
Exchange Rate finance 1B 8 7588 7588 7588 30
FRED-MD economics 1M 107 728 728 728 12
Hospital healthcare 1M 767 84 84 84 12
M1 (Monthly) various 1M 617 48 90 150 18
M1 (Quarterly) various 3M 203 18 48 114 8
M1 (Yearly) various 1Y 181 15 24 58 6
M3 (Monthly) various 1M 1428 66 117 144 18
M3 (Quarterly) various 3M 756 24 48 72 8
M3 (Yearly) various 1Y 645 20 28 47 6
M4 (Quarterly) various 3M 24000 24 100 874 8
M4 (Yearly) various 1Y 23000 19 37 841 6
M5 retail 1D 30490 124 1562 1969 28
NN5 (Daily) finance 1D 111 791 791 791 56
NN5 (Weekly) finance 1W 111 113 113 113 8
Tourism (Monthly) various 1M 366 91 298 333 24
Tourism (Quarterly) various 1Q 427 30 99 130 8
Tourism (Yearly) various 1Y 518 11 24 47 4
Traffic transport 1H 862 17544 17544 17544 24
Weather nature 1D 3010 1332 14296 65981 30

London Smart Meters contains half-hourly energy consumption of 5561 households in the
UK between 2011 and 2014. Data was obtained from https://data.london.gov.uk/dataset/
smartmeter-energy-use-data-in-london-households.
Solar (5 Min., Hourly) contains data about solar power generation in the US in 2006. The original data
has 5 minute frequency and was obtained from https://www.nrel.gov/grid/solar-power-data.html; the
hourly version was obtained via mean aggregation.
Spanish Energy and Weather contains 4 years of electricity consumption, generation, pricing, and
weather data for Spain. Electricity data is for all of Spain, weather data is provided for each of 5
major Spanish cities. The data was obtained from https://www.kaggle.com/datasets/nicholasjhana/
energy-consumption-generation-prices-and-weather.
Wind Farms (Hourly, Daily) (Godahewa et al., 2021) contains energy production data from wind farms
in Australia. Original data was collected at 1 minute frequency, which we aggregated to hourly and daily
using the mean.

B.2 Finance and economics

CIF 2016 (Godahewa et al., 2021) contains banking data that was used in the CIF 2016 forecasting com-
petition. Of all time series included, 24 are real data while the other 48 are artificially generated.
Exchange Rate contains daily exchange rates for the currencies of eight countries (Australia, the United Kingdom,
Canada, Switzerland, China, Japan, New Zealand and Singapore) between 1990 and 2016.
FRED-MD (Godahewa et al., 2021) contains monthly macro-economic indicators from the Federal Reserve
Bank. Data was extracted from the FRED-MD database, and the series were differenced and log-transformed.
NN5 (Daily, Weekly) (Godahewa et al., 2021) contains cash withdrawal data from ATMs.

B.3 Healthcare

Covid Deaths (Godahewa et al., 2021) contains daily count data of COVID-19 deaths in a set of countries
and states, between January and August, 2020.
Hospital (Godahewa et al., 2021) contains monthly time series that represent the patient counts related to
medical products from January 2000 to December 2006.

B.4 Nature

Brazilian Cities Temperature contains monthly time series representing the weather at 12 different cities
in Brazil. Data is originally from NOAA, and we used the post-processed version from https://www.kaggle.
com/datasets/volpatto/temperature-timeseries-for-some-brazilian-cities.
KDD Cup 2018 (Godahewa et al., 2021) contains various air quality indicators (including PM2.5, PM10,
NO2, CO, O3 and SO2), measured in 59 stations in Beijing and London, between January 1, 2017 and March
31, 2018.
Temperature-Rain (Godahewa et al., 2021) contains daily temperature observations and rain forecasts
from 422 stations in Australia, between 2015 and 2017.
USHCN contains daily measurements of five climate indicators (precipitation, snow, snow depth, minimum
temperature, maximum temperature) from climate stations located in 48 states in the USA. Data was
obtained from https://cdiac.ess-dive.lbl.gov/ftp/ushcn_daily/.
Weather (Godahewa et al., 2021) contains daily time series of four weather variables (rain, mintemp,
maxtemp and solar radiation) measured at weather stations in Australia.
Weatherbench (Hourly, Daily, Weekly) contains WeatherBench data at the spatial resolution of 5.625°
(32×64 grid points). WeatherBench is a comprehensive benchmark dataset for weather prediction research
and contains hourly values of many weather-related variables over 40 years from 1979 to 2018 (including
temperature, humidity, wind, precipitations). The original data has hourly frequency and was obtained from
https://github.com/pangeo-data/WeatherBench; we aggregated it to daily and weekly using mean, except
for “total precipitation” which was aggregated by sum.

B.5 Retail

Car Parts (Godahewa et al., 2021) contains monthly sales data for various car parts, measured between
January 1998 and March 2002.
Dominick (Godahewa et al., 2021) contains weekly time series representing the profit of individual stock
keeping units from a retailer. Original data is from https://www.chicagobooth.edu/research/kilts/
datasets/dominicks.

B.6 Mobility and transport

Mexico City Bikes contains hourly usage statistics for 494 bike stations in Mexico City from 2010 to 2022.
Each value in the time series corresponds to the number of bikes returned at the given station at the given
hour of the day. Data was obtained from https://ecobici.cdmx.gob.mx/en/open-data. Time series that
contain less than 50 non-zero observations were removed.
Pedestrian Counts (Godahewa et al., 2021) contains data from 66 sensors in Melbourne, counting pedes-
trians between 2009 and 2020.
Rideshare contains various hourly statistics of Uber and Lyft services in New York, between November 26,
2018 and December 18, 2018.
Taxi (30 Min., Hourly) contains spatio-temporal traffic time series of New York taxi rides taken at 1214
locations every 30 minutes in the months of January 2015 and January 2016. Original data has 30 minutes
frequency; the hourly version was obtained by aggregation with sum.
Tourism (Monthly to Yearly) (Athanasopoulos et al., 2011; Godahewa et al., 2021) Tourism dataset
used for the Kaggle Tourism Forecasting competition.
Traffic (Godahewa et al., 2021) contains hourly road occupancy readings from sensors in the San Francisco
Bay area.
Uber TLC (Hourly, Daily) contains the number of Uber pick-ups from various locations in New
York, between January and June 2015. Data was obtained from https://github.com/fivethirtyeight/
uber-tlc-foil-response and aggregated hourly and daily.

B.7 Various

M1 (Monthly to Yearly) (Makridakis et al., 1979; Godahewa et al., 2021) contains the time series
used in the M1 forecasting competition. Data spans micro-/macroeconomics, industry, and demographics.
M3 (Monthly to Yearly) (Makridakis & Hibon, 2000; Godahewa et al., 2021) contains the time
series used in the M3 forecasting competition. Data spans micro-/macroeconomics, industry, finance and
demographics.
M4 (Hourly to Yearly) (Makridakis et al., 2020; Godahewa et al., 2021) contains data from various
domains, at different sampling periods, used for the M4 forecasting competition. Domains include micro-
/macroeconomics, demographic, industry, and finance.
M5 (Makridakis et al., 2022) contains products sales data, used for the M5 forecasting competition. The
data includes sales up to the end of the validation set (end of public leaderboard), but not values for the
test set (private leaderboard).

B.8 Web

Wiki Daily (100k) contains daily page views on the top-100k English Wikipedia articles between 2007 and
2022, ranked by number of observations (non-missing). Data was obtained from https://dumps.wikimedia.
org/other/pageviews/.

C Baselines

We consider a total of 14 baseline methods for benchmarking Chronos. Local statistical baselines were
AutoETS, AutoARIMA, Naive, Seasonal Naive, and AutoTheta (Assimakopoulos & Nikolopoulos, 2000); for
these, we rely on implementations in the StatsForecast library (Garza et al., 2022). For task-specific deep
learning architectures, DeepAR (Salinas et al., 2020), PatchTST (Nie et al., 2023), TFT (Lim et al., 2021),
DLinear (Zeng et al., 2023), and WaveNet (Oord et al., 2016), we based evaluations on the implementations
in GluonTS (Alexandrov et al., 2020). For N-BEATS (Oreshkin et al., 2020) and N-HiTS (Challu
et al., 2023), experiments are based on implementations in the NeuralForecast (Olivares et al., 2022) library.
Finally, we use reference implementations of ForecastPFN7 (Dooley et al., 2023) and GPT4TS8 (One-Fits-
All) (Zhou et al., 2023a).
Both training and inference for WaveNet, GPT4TS, and ForecastPFN were conducted on NVIDIA V100
GPUs with 16GB memory on AWS EC2 p3 instances. All other baselines were trained on CPUs
on Intel-based EC2 instances. Task-specific deep learning baselines not based on large language models
(DeepAR, PatchTST, TFT, DLinear, WaveNet, N-BEATS, and N-HiTS) are trained and evaluated three
times and their performance averaged in order to account for the high variance inherent in their optimization.
Statistical baselines (AutoETS, AutoARIMA, AutoTheta and SeasonalNaive) are used with their default
hyperparameters in StatsForecast, but with season lengths implied by their frequencies. For example, daily
frequency data had season length set to 7, hourly data 24, etc. For this heuristic, we use the helper function
get_seasonality from GluonTS.
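For illustration, the snippet below shows one way this heuristic can be wired up (a sketch under the assumption of an hourly dataset, not our exact evaluation harness):

from gluonts.time_feature import get_seasonality
from statsforecast.models import AutoARIMA, AutoETS, AutoTheta, SeasonalNaive

freq = "H"                                   # example: hourly data
season_length = get_seasonality(freq)        # 24 for hourly frequency
models = [
    AutoETS(season_length=season_length),
    AutoARIMA(season_length=season_length),
    AutoTheta(season_length=season_length),
    SeasonalNaive(season_length=season_length),
]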
In PatchTST and DLinear, we experiment with two loss functions: original losses aimed at point forecasting
(L1 or L2 loss) as well as default probabilistic forecasting heads used in their GluonTS implementations,
where loss is set to the negative Student’s t log likelihood of the forecast horizon. Due to the consistently
superior performance, our final results include probabilistic versions of PatchTST and DLinear only.
Default hyperparameter configurations provided in baseline implementations are kept as is, and no dataset-
specific or global hyperparameter tuning is performed. GluonTS-based implementations are optimized with
a batch size of 128, for a time limit of 4 hours and early stopping patience of 200 epochs. For GPT4TS,
we set the context length equal to a multiple of the prediction length, with the multiplier depending on the
frequency of the dataset (Table 3). We use the MASE loss function for fine-tuning in GPT4TS due to its
superior performance. A summary of the baseline models used along with details of hyperparameter values
is provided in Table 4.

Table 3: The multiplier used to set the context length in GPT4TS for each frequency. The context length is set equal
to the multiplier times the prediction length, rounded to the nearest whole number.

Frequency Multiplier
15min 20
30min 10
1H 10
1D or 1B 10
1W 10
1M 1.5
3M or 1Q 1.5
1Y 1.5

7 https://github.com/abacusai/ForecastPFN
8 https://github.com/DAMO-DI-ML/NeurIPS2023-One-Fits-All

Table 4: Baseline models and hyperparameter choices. Hyperparameters not specified are set to defaults in their
respective implementations. C stands for context length, dh for hidden layer dimension, nL for number of layers, nH
for number of heads, and η for learning rate.

Model Model Type Implementation Probabilistic Hyperparameters


SeasonalNaive Local StatsForecast Yes N/A
AutoETS Local StatsForecast Yes C = 2500
AutoARIMA Local StatsForecast Yes C = 1000
AutoTheta Local StatsForecast Yes C = 2500
DeepAR Task-specific GluonTS Yes dh = 40, nL = 2
TFT Task-specific GluonTS Yes dh = 32, nH = 4
PatchTST Task-specific GluonTS Yes Patch length: 16, Stride: 8, dh = 32, nL = 2, nH = 4
DLinear Task-specific GluonTS Yes Kernel size: 25, dh = 20
WaveNet Task-specific GluonTS Yes Residual channels: 24, Skip channels: 3
N-BEATS Task-specific NeuralForecast No Input size multiplier: 5
N-HiTS Task-specific NeuralForecast No Input size multiplier: 5
GPT4TS Task-specific Reference No Fine-tuning epochs: 100, cos: 1, tmax: 10, nL = 6, η = 10−3 ,
with pretrained GPT-2 weights
ForecastPFN Zero-shot Reference No C = 100 (as in the released pretrained model)

D Evaluation Metrics
In what follows, we consider a dataset of N time series {xi = [xi,1 , . . . , xi,C+H ]}_{i=1}^{N}, each spanning both the
context length C and prediction horizon H. We are interested in evaluating the accuracy of predictions for
xi,C+1:C+H , for all i ∈ {1, . . . , N }, which can be either point forecasts or probabilistic ones.
A point forecast for xi is denoted as x̂i = [x̂i,C+1 , . . . , x̂i,C+H ]. To evaluate point forecasts, we use the
mean absolute scaled error (MASE, Hyndman & Koehler (2006)). For each series, this is simply the mean
absolute error (MAE) divided by the empirical error of a seasonal naïve model:

\mathrm{MASE}(\hat{x}_i, x_i) = \frac{C - S}{H} \cdot \frac{\sum_{t=C+1}^{C+H} |\hat{x}_{i,t} - x_{i,t}|}{\sum_{t=1}^{C-S} |x_{i,t} - x_{i,t+S}|},

where S is a seasonality parameter. Since the denominator scales proportionally to xi , this error metric is
independent of the scale of the data. To aggregate MASE over the entire dataset, we average over all i.
Probabilistic forecasts are given in terms of predicted quantiles q_i^{(α)} = [q_{i,C+1}^{(α)}, . . . , q_{i,C+H}^{(α)}] at levels α ∈ (0, 1).
To evaluate the quality of such predicted quantiles, we use the weighted quantile loss (WQL): this is an
aggregation of the quantile loss (Koenker & Hallock, 2001), which is defined for the predicted α-quantile q
of a real observation x, as

\mathrm{QL}_\alpha(q, x) = \begin{cases} \alpha (x - q), & \text{if } x > q, \\ (1 - \alpha)(q - x), & \text{otherwise.} \end{cases} \qquad (4)

To aggregate Eq. (4) over multiple series and prediction instants, we consider the weighted average

\mathrm{WQL}_\alpha = \frac{2 \sum_{i,t} \mathrm{QL}_\alpha(q_{i,t}^{(\alpha)}, x_{i,t})}{\sum_{i,t} |x_{i,t}|}.

We average the above over a finite set of levels {α1 , . . . , αK } to obtain

\mathrm{WQL} = \frac{1}{K} \sum_{j=1}^{K} \mathrm{WQL}_{\alpha_j}.

In all experiments, we use quantiles at level α ∈ {0.1, 0.2, . . . , 0.9} to compute WQL, so that K = 9. Note
that, being a weighted average of the quantile loss at different levels, WQL approximates (a weighted average
of) the continuous ranked probability score (CRPS), a commonly used metric for evaluating probabilistic
predictions (Gneiting & Raftery, 2007; Gasthaus et al., 2019). Unlike for MASE, where errors are scaled by
a term proportional to the scale of each series, WQL aggregates absolute errors: as such, its value is affected
by the relative scale of all series in the dataset.
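The two metrics can be computed directly from the definitions above; the NumPy sketch below assumes forecasts and targets stored as arrays with the indicated shapes.

import numpy as np

def mase(forecast, target, context, seasonality):
    # MAE of the forecast divided by the in-sample MAE of the seasonal naive forecaster.
    mae = np.mean(np.abs(forecast - target))
    naive_mae = np.mean(np.abs(context[seasonality:] - context[:-seasonality]))
    return mae / naive_mae

def wql(quantile_forecasts, targets, levels=np.arange(0.1, 1.0, 0.1)):
    # quantile_forecasts: (num_levels, num_series, horizon); targets: (num_series, horizon)
    losses = []
    for alpha, q in zip(levels, quantile_forecasts):
        ql = np.where(targets > q, alpha * (targets - q), (1 - alpha) * (q - targets))
        losses.append(2 * ql.sum() / np.abs(targets).sum())
    return np.mean(losses)      # average WQL over the quantile levels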

E Additional Results

This section complements Section 5.5 by providing additional details on the experimental results. Table 5
reports the training time and cost of Chronos-T5 models on a p4d.24xlarge EC2 instance. Tables 6 and 7
report the raw WQL and MASE scores together with the aggregate relative score and average rank obtained
by all models on the datasets in Benchmark I. Similarly, Tables 8 and 9 report these scores on Benchmark II.
Figures 18 and 19 show the average ranks obtained by different models on Benchmark I and II, respectively.
Figure 20 illustrates the zero-shot performance of Chronos-T5-Synth (Small), a model trained solely on
synthetic data generated using KernelSynth, against various baselines.

Table 5: Training time and the cost of training Chronos models on a single p4d.24xlarge instance. On-demand
EC2 pricing of $32.773/hr was used to compute the cost (rounded to the nearest dollar).

Model Training Time (hrs) Cost (USD)


Chronos-T5 (Mini) 7.68 252
Chronos-T5 (Small) 7.73 253
Chronos-T5 (Base) 17.96 588
Chronos-T5 (Large) 63.05 2066

Table 6: WQL scores of different models for datasets in Benchmark I, comprising 15 datasets also included in the
training data of Chronos models. Models achieving the first, second, and third best scores have been highlighted.
Scores for Chronos and task-specific models have been averaged over 3 random seeds. The aggregated relative score
was computed as described in Section 5.4.

Pretrained Models (In Domain) Task Specific Models Local Models


[Per-model column headers omitted: the columns cover the pretrained Chronos models (T5 variants and GPT2), the task-specific deep learning baselines, and the local statistical baselines described in Appendix C.]

Electricity (15 Min.) 0.076 0.076 0.081 0.081 0.077 0.082 0.090 0.091 0.189 0.079 0.081 0.084 - 0.229 - 0.117 0.279
Electricity (Hourly) 0.102 0.115 0.107 0.092 0.121 0.089 0.106 0.109 0.125 0.095 0.128 0.127 0.129 0.198 0.126 0.147 0.363
Electricity (Weekly) 0.064 0.067 0.077 0.078 0.069 0.069 0.116 0.105 0.106 0.146 0.098 0.097 0.151 0.146 0.138 0.198 0.198
KDD Cup 2018 0.273 0.272 0.294 0.271 0.359 0.252 0.330 0.280 0.571 0.312 0.302 0.315 2.266 0.521 0.528 0.556 -
London Smart Meters 0.423 0.426 0.430 0.433 0.431 0.346 0.405 0.374 0.365 0.369 0.358 0.357 - 0.660 - 0.541 0.731
M4 (Daily) 0.021 0.021 0.021 0.021 0.020 0.023 0.023 0.023 0.023 0.024 0.022 0.022 0.027 0.024 0.023 0.028 0.028
M4 (Hourly) 0.025 0.028 0.026 0.025 0.038 0.027 0.038 0.046 0.033 0.038 0.040 0.045 0.066 0.041 - 0.048 0.166
M4 (Monthly) 0.102 0.103 0.104 0.104 0.110 0.095 0.101 0.107 0.097 0.111 0.094 0.093 0.100 0.098 - 0.146 0.140
M4 (Weekly) 0.038 0.039 0.041 0.042 0.040 0.039 0.046 0.045 0.051 0.044 0.039 0.040 0.052 0.053 0.050 0.063 0.063
Pedestrian Counts 0.198 0.203 0.234 0.241 0.175 0.257 0.229 0.248 0.261 0.247 0.254 0.241 0.619 1.818 0.340 0.319 0.814
Rideshare 0.141 0.140 0.141 0.135 0.141 0.135 0.130 0.184 0.134 0.159 0.152 0.172 0.154 0.138 0.157 0.186 -
Taxi (30 Min.) 0.267 0.273 0.312 0.311 0.335 0.363 0.395 0.347 0.382 0.335 0.306 0.305 - 0.456 - 0.471 0.741
Temperature-Rain 0.663 0.670 0.686 0.704 0.669 0.804 0.718 0.708 0.670 0.848 0.780 0.798 1.182 1.060 0.869 1.424 -
Uber TLC (Daily) 0.096 0.097 0.099 0.106 0.099 0.100 0.110 0.126 0.111 0.106 0.116 0.108 0.167 0.190 0.151 0.231 0.231
Uber TLC (Hourly) 0.155 0.154 0.156 0.161 0.161 0.167 0.176 0.168 0.179 0.234 0.166 0.161 0.462 0.433 0.311 0.299 0.625
Agg. Relative Score 0.574 0.589 0.610 0.607 0.624 0.601 0.676 0.689 0.734 0.697 0.656 0.664 1.076 1.083 0.876 1.000 1.433
Avg. Rank 3.000 4.267 5.800 5.667 6.133 5.667 8.333 9.533 8.800 9.133 7.200 7.333 14.367 12.800 13.967 14.800 16.200

Table 7: MASE scores of different models for datasets in Benchmark I, comprising 15 datasets also included in the
training data of Chronos models. Models achieving the first, second, and third best scores have been highlighted.
Scores for Chronos and task-specific models have been averaged over 3 random seeds. The aggregated relative score
was computed as described in Section 5.4.

Pretrained Models (In Domain) Task Specific Models Local Models


[Per-model column headers omitted: the columns cover the pretrained Chronos models (T5 variants and GPT2), the task-specific deep learning baselines, and the local statistical baselines described in Appendix C.]

Electricity (15 Min.) 0.403 0.409 0.430 0.453 0.406 0.450 0.515 0.637 1.108 0.452 0.579 0.567 0.508 - 0.583 - 0.498 1.270
Electricity (Hourly) 1.457 1.602 1.500 1.370 1.653 1.349 1.528 1.537 1.789 1.369 1.880 1.848 1.487 1.774 2.151 1.715 1.840 4.159
Electricity (Weekly) 1.829 1.888 2.007 2.044 1.879 1.631 2.517 1.929 2.800 2.613 1.975 2.035 1.880 3.086 3.078 3.009 3.037 3.037
KDD Cup 2018 0.663 0.673 0.688 0.656 0.771 0.616 0.779 0.671 1.022 0.695 0.674 0.731 0.737 1.014 1.138 1.023 0.994 -
London Smart Meters 0.843 0.852 0.859 0.868 0.856 0.733 0.832 0.824 0.788 0.799 0.777 0.781 0.794 - 0.966 - 0.966 1.297
M4 (Daily) 3.099 3.109 3.112 3.108 3.058 3.450 3.305 3.306 3.292 3.461 3.143 3.155 5.109 3.270 3.335 3.257 3.278 3.278
M4 (Hourly) 0.927 0.894 0.881 0.879 0.977 0.967 1.215 1.613 1.833 1.867 3.231 3.457 1.511 1.604 2.458 - 1.193 11.608
M4 (Monthly) 0.976 0.985 1.000 1.006 1.051 0.962 1.040 1.101 1.009 1.022 0.994 0.942 0.979 0.970 0.966 - 1.260 1.205
M4 (Weekly) 2.201 2.221 2.270 2.309 2.377 1.996 2.346 2.523 2.745 2.429 2.094 1.976 3.040 2.548 2.657 2.373 2.777 2.777
Pedestrian Counts 0.289 0.286 0.304 0.309 0.277 0.339 0.311 0.334 0.364 0.327 0.324 0.315 0.393 0.487 1.275 0.383 0.369 0.842
Rideshare 0.892 0.892 0.880 0.857 0.897 0.827 0.996 0.983 1.067 1.448 0.933 0.919 1.088 0.910 0.970 1.028 1.250 -
Taxi (30 Min.) 0.828 0.849 0.938 0.939 1.032 1.077 1.158 1.070 1.113 1.018 0.950 0.934 1.113 - 1.193 - 1.160 1.768
Temperature-Rain 0.982 0.991 1.013 1.033 0.984 1.250 1.015 1.076 0.994 1.370 1.232 1.343 1.226 1.968 1.945 1.524 2.243 -
Uber TLC (Daily) 0.819 0.836 0.871 0.906 0.846 0.813 0.905 0.938 0.916 0.855 0.877 0.879 0.838 1.228 1.312 1.114 1.378 1.378
Uber TLC (Hourly) 0.727 0.729 0.727 0.743 0.740 0.696 0.703 0.776 0.746 0.778 0.716 0.751 0.754 1.009 1.036 0.982 0.931 1.390
Agg. Relative Score 0.726 0.736 0.751 0.752 0.763 0.740 0.821 0.842 0.939 0.864 0.854 0.861 0.871 0.983 1.129 0.941 1.000 1.484
Avg. Rank 3.400 4.733 5.733 6.267 6.467 4.600 9.267 10.267 11.533 10.133 7.933 8.067 9.933 13.500 14.333 14.433 13.867 16.533

Table 8: WQL scores of different models for datasets in Benchmark II, comprising 27 datasets not seen by Chronos
models during training. Models achieving the first, second, and third best scores have been highlighted. Scores
for Chronos and task-specific models have been averaged over 3 random seeds. The aggregated relative score was
computed as described in Section 5.4.

Pretrained Models (Zero Shot) Task Specific Models Local Models

[Per-model column headers omitted: the columns cover the pretrained models evaluated zero-shot (the Chronos variants), the task-specific deep learning baselines, and the local statistical baselines described in Appendix C.]
Australian Electricity 0.068 0.079 0.077 0.069 0.082 0.037 0.087 0.052 0.036 0.066 0.034 0.038 0.125 0.055 0.073 0.084 0.159
Car Parts 1.067 1.058 1.033 1.029 1.031 0.998 0.967 0.941 0.871 1.119 0.880 0.877 1.309 1.337 - 1.600 -
CIF 2016 0.013 0.013 0.014 0.015 0.014 0.140 0.136 0.086 0.011 0.033 0.032 0.039 0.039 0.027 0.017 0.015 0.009
Covid Deaths 0.045 0.046 0.060 0.078 0.089 0.065 0.108 0.918 0.034 0.077 0.038 0.056 0.064 0.094 0.029 0.133 0.133
Dominick 0.337 0.338 0.344 0.351 0.341 0.345 0.364 0.327 0.320 0.435 0.313 0.312 0.483 0.485 - 0.453 0.453
ERCOT Load 0.016 0.015 0.017 0.018 0.016 0.017 0.032 0.024 0.023 0.023 0.020 0.020 0.122 0.041 0.052 0.037 0.181
ETT (15 Min.) 0.064 0.065 0.060 0.065 0.068 0.054 0.069 0.113 0.075 0.071 0.051 0.053 0.095 0.079 0.073 0.141 0.121
ETT (Hourly) 0.074 0.075 0.077 0.083 0.076 0.071 0.081 0.142 0.082 0.076 0.081 0.074 0.132 0.133 0.105 0.122 0.202
Exchange Rate 0.017 0.019 0.020 0.017 0.017 0.010 0.009 0.016 0.011 0.008 0.010 0.011 0.010 0.010 0.011 0.013 0.015
FRED-MD 0.026 0.024 0.017 0.019 0.029 0.042 0.043 0.058 0.112 0.069 0.057 0.061 0.055 0.057 0.056 0.122 0.064
Hospital 0.057 0.057 0.058 0.059 0.059 0.070 0.056 0.064 0.053 0.089 0.052 0.050 0.053 0.055 0.058 0.073 0.087
M1 (Monthly) 0.129 0.126 0.138 0.141 0.135 0.165 0.150 0.150 0.175 0.189 0.189 0.187 0.162 0.159 0.146 0.191 0.258
M1 (Quarterly) 0.105 0.099 0.103 0.101 0.115 0.078 0.089 0.094 0.122 0.079 0.111 0.085 0.083 0.082 0.091 0.150 0.130
M1 (Yearly) 0.176 0.183 0.169 0.179 0.208 0.165 0.139 0.168 0.124 0.245 0.198 0.182 0.142 0.137 0.160 0.209 0.209
M3 (Monthly) 0.096 0.097 0.099 0.099 0.104 0.113 0.099 0.100 0.096 0.121 0.097 0.101 0.093 0.095 0.102 0.149 0.158
M3 (Quarterly) 0.073 0.075 0.078 0.080 0.078 0.074 0.073 0.072 0.071 0.086 0.076 0.080 0.069 0.070 0.079 0.101 0.103
M3 (Yearly) 0.150 0.151 0.153 0.158 0.149 0.133 0.122 0.130 0.130 0.143 0.182 0.181 0.127 0.128 0.162 0.167 0.167
M4 (Quarterly) 0.082 0.083 0.083 0.085 0.086 0.074 0.080 0.079 0.080 0.085 0.073 0.073 0.080 0.079 0.082 0.119 0.110
M4 (Yearly) 0.133 0.135 0.135 0.139 0.147 0.106 0.111 0.109 0.110 0.115 - - 0.118 0.115 0.130 0.161 0.161
M5 0.588 0.587 0.591 0.596 0.600 0.597 0.657 0.594 0.560 0.687 0.563 0.560 0.628 0.636 0.624 1.024 1.024
NN5 (Daily) 0.155 0.160 0.170 0.172 0.165 0.149 0.155 0.154 0.145 0.159 0.149 0.147 0.264 0.294 0.312 0.425 0.425
NN5 (Weekly) 0.089 0.090 0.090 0.089 0.094 0.081 0.087 0.098 0.086 0.090 0.098 0.114 0.088 0.090 0.090 0.123 0.123
Tourism (Monthly) 0.099 0.099 0.112 0.108 0.096 0.092 0.092 0.104 0.096 0.101 0.092 0.084 0.090 0.091 0.093 0.104 0.297
Tourism (Quarterly) 0.062 0.068 0.070 0.077 0.068 0.074 0.072 0.082 0.074 0.080 0.077 0.063 0.070 0.061 0.098 0.119 0.166
Tourism (Yearly) 0.185 0.199 0.197 0.217 0.187 0.136 0.127 0.179 0.102 0.165 0.139 0.154 0.159 0.176 0.156 0.209 0.209
Traffic 0.254 0.261 0.261 0.262 0.252 0.246 0.233 0.234 0.264 0.250 0.263 0.270 0.557 0.905 - 0.362 0.643
Weather 0.139 0.140 0.143 0.149 0.145 0.143 0.147 0.152 0.151 0.174 0.143 0.144 0.214 0.217 0.185 0.217 0.217
Agg. Relative Score 0.649 0.661 0.672 0.690 0.700 0.684 0.733 0.842 0.639 0.757 0.672 0.681 0.838 0.793 0.761 1.000 1.152
Avg. Rank 6.407 7.519 8.519 9.741 9.370 5.963 7.185 9.148 6.000 10.519 7.463 7.204 8.741 8.481 10.574 14.889 15.278
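
Table 8 above reports WQL (weighted quantile loss) and Table 9 below reports MASE (mean absolute scaled error). As a point of reference, the following is a minimal sketch of these two standard metrics under commonly used conventions; the quantile levels, the seasonal lag, and the aggregation over quantile levels are assumptions on our part and may differ in detail from the exact definitions in Section 5.4.

# Hedged sketch of WQL and MASE under common conventions; quantile levels,
# seasonal lag, and aggregation choices are assumptions, not necessarily the
# exact definitions of Section 5.4.
import numpy as np

def weighted_quantile_loss(y, quantile_forecasts,
                           quantiles=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pinball losses summed over time, averaged over quantile levels,
    and scaled by the sum of absolute target values.
    y: shape [T]; quantile_forecasts: shape [len(quantiles), T]."""
    y = np.asarray(y, dtype=float)
    per_quantile = []
    for q, f in zip(quantiles, np.asarray(quantile_forecasts, dtype=float)):
        diff = y - f
        per_quantile.append(2.0 * np.sum(np.maximum(q * diff, (q - 1.0) * diff)))
    return np.mean(per_quantile) / np.sum(np.abs(y))

def mase(y, point_forecast, y_train, seasonal_lag=1):
    """Forecast MAE scaled by the in-sample MAE of the seasonal naive forecast."""
    y, point_forecast, y_train = (np.asarray(a, dtype=float) for a in (y, point_forecast, y_train))
    scale = np.mean(np.abs(y_train[seasonal_lag:] - y_train[:-seasonal_lag]))
    return np.mean(np.abs(y - point_forecast)) / scale
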

Table 9: MASE scores of different models for datasets in Benchmark II, comprising 27 datasets not seen by Chronos
models during training. Models achieving the first, second, and third best scores have been highlighted. Scores
for Chronos and task-specific models have been averaged over 3 random seeds. The aggregated relative score was
computed as described in Section 5.4.
Pretrained Models (Zero Shot) | Task Specific Models | Local Models
[The per-column model names are rotated in the original PDF and were garbled by text extraction. The names recoverable from the fragments are: Chronos-T5 (Large), Chronos-T5 (Base), Chronos-T5 (Small), Chronos-T5 (Mini), Chronos-GPT2, ForecastPFN (pretrained, zero shot); PatchTST, DeepAR, TFT, DLinear, WaveNet, N-BEATS, N-HiTS, GPT4TS (task specific); AutoETS, AutoARIMA, AutoTheta, Seasonal Naive, Naive (local).]
Australian Electricity 1.306 1.333 1.403 1.212 1.370 2.158 0.871 1.473 0.997 0.810 1.278 0.794 0.828 1.161 2.391 0.897 1.393 1.253 2.362
Car Parts 0.911 0.897 0.891 0.893 0.879 2.657 0.803 0.798 0.817 0.799 0.879 0.803 0.803 0.891 1.185 1.229 - 1.201 -
CIF 2016 0.985 0.995 1.016 1.052 1.065 3.588 1.537 1.363 1.309 1.553 1.145 1.389 1.440 0.960 0.957 1.002 1.006 1.289 1.263
Covid Deaths 42.762 42.641 42.689 43.525 48.028 91.515 36.465 38.203 102.457 30.635 40.418 31.771 31.730 75.909 38.114 45.407 31.705 46.912 46.912
Dominick 0.837 0.838 0.838 0.853 0.841 3.274 0.867 0.851 0.812 0.800 0.880 0.782 0.782 1.813 0.885 1.016 - 0.871 0.871
ERCOT Load 0.556 0.541 0.577 0.579 0.568 3.975 0.553 1.197 0.780 0.690 0.651 0.615 0.648 0.558 2.826 1.306 1.284 0.761 4.234
ETT (15 Min.) 0.712 0.710 0.669 0.732 0.753 1.138 0.652 0.874 1.339 0.962 0.724 0.643 0.659 0.574 1.183 0.583 0.879 1.169 1.164
ETT (Hourly) 0.738 0.749 0.769 0.794 0.750 1.833 0.729 0.814 1.509 0.875 0.695 0.811 0.782 0.768 1.139 0.900 0.977 0.932 1.651
Exchange Rate 3.231 3.460 3.357 3.223 3.206 7.583 1.540 1.615 3.105 2.361 1.459 2.041 2.149 2.709 1.643 1.648 1.882 1.740 1.874
FRED-MD 0.592 0.584 0.576 0.537 0.569 2.621 0.745 0.621 0.849 0.929 0.713 0.696 0.635 0.693 0.544 0.566 0.473 1.101 0.622
Hospital 0.815 0.820 0.819 0.821 0.833 1.775 0.859 0.804 0.857 0.799 0.940 0.781 0.760 0.793 0.760 0.761 0.820 0.921 0.968
M1 (Monthly) 1.086 1.119 1.163 1.172 1.164 2.172 1.208 1.122 1.266 1.326 1.369 1.333 1.236 1.198 1.072 1.099 1.153 1.314 1.468
M1 (Quarterly) 1.699 1.735 1.776 1.799 1.767 9.931 1.920 1.741 1.904 2.144 1.943 2.061 2.043 1.958 1.710 1.683 1.770 2.078 1.952
M1 (Yearly) 4.296 4.582 4.616 4.898 4.674 23.089 4.042 3.685 4.727 4.316 11.565 5.568 6.212 3.675 4.110 3.697 3.870 4.894 4.894
M3 (Monthly) 0.853 0.861 0.883 0.898 0.925 2.240 1.225 0.943 0.950 0.916 1.161 0.899 0.883 0.950 0.869 0.861 0.933 1.146 1.175
M3 (Quarterly) 1.170 1.185 1.250 1.270 1.230 10.176 1.264 1.209 1.257 1.160 1.572 1.202 1.147 1.448 1.125 1.130 1.419 1.425 1.464
M3 (Yearly) 3.094 3.186 3.238 3.348 3.112 18.728 2.949 2.827 3.026 2.860 3.435 3.432 3.547 3.418 2.696 2.613 3.165 3.172 3.172
M4 (Quarterly) 1.203 1.213 1.228 1.252 1.285 6.927 1.150 1.254 1.241 1.248 1.229 1.157 1.129 1.215 1.188 1.193 1.276 1.602 1.477
M4 (Yearly) 3.569 3.641 3.613 3.705 3.894 - 3.072 3.178 3.221 3.119 3.295 - - 3.374 3.374 3.124 3.730 3.974 3.974
M5 0.946 0.942 0.942 0.946 0.972 1.530 0.919 0.956 0.959 0.909 1.027 0.917 0.917 0.935 1.101 1.100 1.057 1.399 1.399
NN5 (Daily) 0.570 0.584 0.620 0.638 0.612 1.375 0.575 0.585 0.585 0.556 0.604 0.571 0.571 0.720 1.039 1.073 1.214 1.292 1.292
NN5 (Weekly) 0.917 0.925 0.930 0.927 0.962 1.349 0.877 0.920 1.034 0.896 0.966 0.919 1.014 1.268 0.978 0.984 0.995 1.063 1.063
Tourism (Monthly) 1.741 1.817 1.891 1.937 1.772 4.348 1.572 1.529 1.629 1.686 1.551 1.514 1.486 1.573 1.497 1.680 1.573 1.631 3.591
Tourism (Quarterly) 1.660 1.705 1.727 1.822 1.814 5.595 1.723 1.586 1.769 1.729 1.690 1.585 1.618 1.750 1.590 1.658 1.661 1.699 3.633
Tourism (Yearly) 3.729 3.858 3.879 4.049 3.839 12.093 3.138 3.702 4.130 3.047 3.406 3.448 3.564 - 3.138 3.078 4.043 3.552 3.552
Traffic 0.798 0.823 0.833 0.847 0.810 1.909 0.790 0.737 0.797 0.880 0.821 0.927 0.968 0.787 1.685 1.794 - 1.077 2.052
Weather 0.827 0.829 0.839 0.855 0.871 2.003 0.860 0.911 0.945 0.913 0.997 0.910 0.888 0.972 1.079 0.991 0.907 1.004 1.004
Agg. Relative Score 0.831 0.844 0.856 0.866 0.866 2.450 0.810 0.843 0.951 0.847 0.894 0.830 0.835 0.895 0.953 0.875 0.908 1.000 1.188
Avg. Rank 6.556 7.852 9.333 10.667 10.111 18.444 6.778 7.630 11.519 8.185 10.667 7.889 8.037 9.926 8.407 7.704 11.167 13.815 15.315
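
The aggregated relative scores in the tables above are computed as described in Section 5.4, and Figures 18 and 19 below report average ranks. The sketch that follows is a minimal illustration of how such aggregates are commonly obtained: the relative score is taken as the geometric mean over datasets of each model's score divided by a baseline's score, and the average rank is the mean of per-dataset ranks. The toy DataFrame, the column names, and the choice of baseline are illustrative placeholders, not the paper's exact code.

# Hedged sketch: aggregated relative score (geometric mean of per-dataset
# ratios against a baseline) and average rank. The data and baseline choice
# below are toy placeholders.
import numpy as np
import pandas as pd

# rows = datasets, columns = models, values = WQL (or MASE) on that dataset
scores = pd.DataFrame(
    {"model_a": [0.08, 0.10, 0.15], "model_b": [0.09, 0.12, 0.14], "baseline": [0.10, 0.15, 0.20]},
    index=["dataset_1", "dataset_2", "dataset_3"],
)

def aggregated_relative_score(scores: pd.DataFrame, baseline: str) -> pd.Series:
    """Geometric mean over datasets of (model score / baseline score)."""
    ratios = scores.div(scores[baseline], axis=0)
    return np.exp(np.log(ratios).mean(axis=0))

def average_rank(scores: pd.DataFrame) -> pd.Series:
    """Mean of per-dataset ranks; lower scores receive better (lower) ranks."""
    return scores.rank(axis=1, method="average").mean(axis=0)

print(aggregated_relative_score(scores, baseline="baseline"))
print(average_rank(scores))
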

[Chart: models grouped into local, task-specific, and pretrained (in-domain) categories, ranked by average rank under WQL (left panel) and MASE (right panel); numeric values lost in extraction.]

Figure 18: Average rank of different models on Benchmark I, comprising 15 datasets also included in the training
data of Chronos models.

[Chart: models grouped into local, task-specific, and pretrained (zero-shot) categories, ranked by average rank under WQL (left panel) and MASE (right panel); numeric values lost in extraction.]

Figure 19: Average rank of different models on Benchmark II, comprising 27 datasets not seen by Chronos models
during training.

[Chart: aggregated relative WQL (left) and aggregated relative MASE (right) on Benchmark I for local, task-specific, and pretrained zero-shot models, including Chronos-T5-Synth (Small); numeric values lost in extraction.]

(a) Benchmark I

[Chart: aggregated relative WQL (left) and aggregated relative MASE (right) on Benchmark II for local, task-specific, and pretrained zero-shot models, including Chronos-T5-Synth (Small) and ForecastPFN; numeric values lost in extraction.]

(b) Benchmark II

Figure 20: Performance of Chronos-T5-Synth (Small), a Chronos model trained only on synthetic data, on Benchmarks I and II, against local and task-specific models. Note that, unlike the other Chronos models, which were also trained on real data, both benchmarks are zero-shot for Chronos-T5-Synth (Small).
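
Chronos-T5-Synth (Small) is trained exclusively on the synthetic corpus generated from Gaussian processes. As a rough illustration of how series of this kind can be produced, the sketch below samples a series from a GP prior whose covariance kernel is a random composition of a few basic kernels; the specific kernel bank, hyperparameters, composition rule, and series length are assumptions for illustration, not the exact generation procedure used for the paper's synthetic data.

# Hedged sketch: sample a synthetic time series from a Gaussian process prior
# with a randomly composed kernel. Kernel bank, composition rule, and length
# are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct, WhiteKernel

def sample_synthetic_series(length=256, max_kernels=3, seed=None):
    rng = np.random.default_rng(seed)
    bank = [
        RBF(length_scale=10.0),                              # smooth local variation
        ExpSineSquared(length_scale=1.0, periodicity=24.0),  # seasonality
        DotProduct(sigma_0=0.1),                             # trend-like component
        WhiteKernel(noise_level=0.1),                        # observation noise
    ]
    # Compose a random subset of kernels with random + / * operations.
    kernels = rng.choice(bank, size=rng.integers(1, max_kernels + 1), replace=True)
    kernel = kernels[0]
    for k in kernels[1:]:
        kernel = kernel + k if rng.random() < 0.5 else kernel * k
    t = np.arange(length, dtype=float).reshape(-1, 1)
    cov = kernel(t) + 1e-6 * np.eye(length)  # jitter for numerical stability
    return rng.multivariate_normal(mean=np.zeros(length), cov=cov)

series = sample_synthetic_series(length=256, seed=0)

Repeating this with freshly drawn kernel compositions yields a diverse collection of trend, seasonal, and noisy patterns.
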

[Forecast panels showing ground truth, median forecast, and prediction interval, with per-panel MSE, for the ground-truth AR model, an AR model with the correct order, AutoARIMA, and Chronos-T5 (Base); numeric values lost in extraction.]

(a) AR(2) (b) AR(3)

Figure 21: Forecasts generated by Chronos-T5 (Base) for time series generated from AR(2) and AR(3) processes,
compared against forecasts from the ground-truth AR model, a fitted AR model of the correct order, and
an AutoARIMA model. Chronos-T5 (Base) generates plausible forecasts and prediction intervals in both cases. All
AR models fit the simpler AR(2) process well and obtain a better MSE than Chronos-T5 (Base); however, on the
more complex AR(3) process, Chronos-T5 (Base) outperforms the other models.
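
For additional context on the setup behind Figure 21, the sketch below simulates an AR(3) process, fits an AR model of the correct order on the observed context, and scores its forecast over a held-out horizon with MSE; the coefficients, series lengths, and the use of statsmodels' AutoReg are illustrative assumptions rather than the exact experimental configuration.

# Hedged sketch: simulate an AR(3) process, fit an AR model of the correct
# order, and score its forecast with MSE. Coefficients and lengths are
# illustrative, not the paper's exact settings.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
phi = np.array([0.5, -0.3, 0.2])   # assumed AR(3) coefficients
n, horizon = 500, 64

# Simulate x_t = phi_1 x_{t-1} + phi_2 x_{t-2} + phi_3 x_{t-3} + eps_t
x = np.zeros(n + horizon)
for t in range(3, n + horizon):
    x[t] = phi @ x[t - 3:t][::-1] + rng.normal(scale=1.0)

context, future = x[:n], x[n:]

# Fit an AR model of the correct order on the context and forecast the horizon.
model = AutoReg(context, lags=3).fit()
forecast = model.predict(start=n, end=n + horizon - 1)

mse = np.mean((forecast - future) ** 2)
print(f"AR(3) fit, {horizon}-step forecast MSE: {mse:.3f}")
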

[Forecast panels (ground truth, median forecast, prediction interval) for: Australian Electricity, CIF 2016, Car Parts, Covid Deaths, Dominick, ERCOT Load, ETT (15 Min.), ETT (Hourly), Electricity (15 Min.), Electricity (Hourly), Electricity (Weekly), Exchange Rate, FRED-MD, and Hospital.]

Figure 22: Example of forecasts from Chronos-T5 (Base) on the test datasets used in experiments.
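
Forecast plots like those in Figures 22 to 24 can be reproduced, in spirit, with the publicly released chronos-forecasting package. The sketch below assumes its ChronosPipeline API; the model identifier, sample count, quantile levels, and synthetic context series are illustrative choices rather than the exact plotting setup used for these figures.

# Hedged sketch: sample forecasts from a pretrained Chronos checkpoint and
# plot the median with a central prediction interval. Assumes the open-source
# `chronos-forecasting` package; model id, number of samples, and interval
# level are illustrative choices.
import numpy as np
import torch
import matplotlib.pyplot as plt
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-base", torch_dtype=torch.float32)

# Toy context: a daily-seasonal sinusoid with a mild upward trend.
context = torch.tensor(
    np.sin(np.arange(200) * 2 * np.pi / 24) + 0.01 * np.arange(200), dtype=torch.float32
)
prediction_length = 48

# predict returns sample paths of shape [series, num_samples, prediction_length]
samples = pipeline.predict(context, prediction_length, num_samples=20)[0].numpy()
low, median, high = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)

t_ctx = np.arange(len(context))
t_fut = np.arange(len(context), len(context) + prediction_length)
plt.plot(t_ctx, context, label="context")
plt.plot(t_fut, median, label="median forecast")
plt.fill_between(t_fut, low, high, alpha=0.3, label="80% interval")
plt.legend()
plt.show()
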

[Forecast panels (ground truth, median forecast, prediction interval) for: KDD Cup, London Smart Meters, and the M1, M3, and M4 datasets at their various frequencies; exact panel labels were partially lost in extraction.]

Figure 23: Example of forecasts from Chronos-T5 (Base) on the test datasets used in experiments.

[Forecast panels (ground truth, median forecast, prediction interval) for: M5, NN5 (Daily), NN5 (Weekly), Pedestrian Counts, Rideshare, Taxi (30 Min.), Temperature-Rain, Tourism (Monthly), Tourism (Quarterly), Tourism (Yearly), Traffic, Uber TLC (Daily), Uber TLC (Hourly), and Weather.]

Figure 24: Example of forecasts from Chronos-T5 (Base) on the test datasets used in experiments.

