SSRN Id3271970

Exchange rate predictability and dynamic Bayesian learning
Joscha Beckmann∗ (University of Greifswald; Kiel Institute for the World Economy)
Gary Koop† (University of Strathclyde)
Dimitris Korobilis‡ (University of Glasgow)
Rainer Schüssler§ (University of Rostock)
July 29, 2019
Abstract
This paper considers how an investor in the foreign exchange market can exploit
predictive information by means of flexible Bayesian inference. Using a variety of
different vector autoregressive models, the investor is able, each period, to revise
past predictive mistakes and learn about important data features. The proposed
methodology is developed in order to synthesize a wide array of established approaches
for modelling exchange rate dynamics. In a thorough investigation of monthly exchange
rate predictability for ten countries, we find that an investor using the proposed
flexible methodology for dynamic asset allocation achieves significant economic gains
out of sample relative to benchmark strategies. In particular, we find strong evidence
for sparsity, fast model switching and exploiting the exchange rate cross-section.
Keywords: Exchange rates; Bayesian vector autoregression; Forecasting; Dynamic portfolio allocation; Economic fundamentals
JEL Classification: C11; G11; G12; G15; G17; F31
∗ Department of Economics, University of Greifswald, Friedrich-Loeffler-Straße 70, 17489 Greifswald. E: [email protected]
† Department of Economics, University of Strathclyde, 199 Cathedral Street, Glasgow, G4 0QU.
‡ Adam Smith Business School, University of Glasgow, Glasgow G12 8QQ.
§ Department of Economics and Social Sciences, University of Rostock, Ulmenstrasse 69, 18057 Rostock. E: [email protected]
Electronic copy available at: https://ssrn.com/abstract=3271970

1 Introduction
Understanding and predicting the evolution of exchange rates has long been a key component
of the research agenda in international economics and finance. Yet, the early finding by
Meese and Rogoff (1983) that structural models cannot offer predictability superior to that
of a random walk has not been convincingly overturned. The voluminous existing literature
on exchange rate forecasting, surveyed in Rossi (2013), adopts many different econometric
methods. Broadly speaking, these differences fall into the following categories. First, they
differ in whether they are multivariate (e.g. building a Vector Autoregressive, VAR, model
involving a cross-section of exchange rates for many countries) or univariate. Second, they
differ in which predictors they use. Third, they differ in how they treat the fact that there
may be many potential predictors, most of which are unimportant. Fourth, they differ in
whether they allow for dynamic model change (i.e. whether the best forecasting model can
involve different predictors at different points in time) or not. Fifth, they differ in whether
they allow for parameter change (both in VAR or regression coefficients and in volatilities) or
not.
We develop an econometric approach that allows for a general treatment of each of these
five categories. That is, its most flexible specification is a high-dimensional multivariate time
series model involving the full cross-section of exchange rates, several exogenous predictors
and time-variation in coefficients and volatilities. But our algorithm allows for decisions
relating to these categories to be made in a data-based fashion using dynamic model selection
methods. That is, the estimation procedure automatically decides whether to set a coefficient
on a predictor or a VAR lag to be zero (or not). Most importantly, it does so in a dynamic
manner, allowing for different forecasting models to be used at different points in time. Thus,
decisions about specification choices (i.e. different predictors, different VARs, different
degrees of model switching) are all made automatically in a time-varying fashion.
Our econometric approach is related to papers such as, among others, Koop and Korobilis
(2013) and Giannone, Lenza, and Primiceri (2015), which provide strategies for handling
prior elicitation and dynamic uncertainty in VAR models. We improve on and extend
them in important directions of relevance for our empirical application. These include in
particular rich shrinkage patterns provided by a Minnesota-type prior and flexible treatment
of exogenous regressors.

Our framework enables us to assess the relative contributions of different modelling
aspects in an exchange rate forecasting exercise involving 10 countries and exogenous
predictors. We take the view of a Bayesian investor with a broad perspective, accommodating
many features inspired by the exchange rate literature. Hence, our focus is on the economic
evaluation of the density forecasts generated by our approach. That said, given the wealth
of empirical results provided as a byproduct, we explore them to relate our findings to
previously documented characteristics and phenomena of exchange rate behaviour.
To preview our empirical results, we do find that model switching has a big role to
play. At most points in time only one or a few predictors are relevant for forecasting. In
general, the best economic results are achieved when both VAR lags and fundamentals are
considered in the candidate models though we find that VAR lags and fundamentals act as
substitutes to a certain extent in this regard. But there are also several periods where a
simple multivariate random walk with stochastic volatility is the best forecasting model.
We find an investor using our algorithm would experience substantial economic gains out
of sample relative to the random walk model with time-varying volatility. A risk-averse
mean-variance investor is willing to pay an annualized fee of several hundred basis points
(after transaction costs) for switching from the dynamic portfolio strategy implied by the
random walk with constant volatility model to the dynamic asset allocation implied by our
VAR-based approach. Similarly, we find that the annualized Sharpe ratio after transaction
costs increases substantially from adopting our approach.
The remainder of the paper is organized as follows. Section 2 relates our modelling
strategy to the literature. Section 3 discusses the data while Section 4 lays out our
econometric methods. Section 5 presents and discusses our empirical results and Section
6 concludes. We present technical details of our econometric methods along with many
empirical results and further details regarding the underlying data in an online appendix.
2 Relation to the literature

We provide a selective review of the literature, with a focus on the five econometric
modelling issues described in the Introduction.1 A large part of the existing literature relies
on macroeconomic fundamentals to forecast exchange rates with little success, which is
commonly referred to as the exchange rate disconnect puzzle.
1 A thorough review of the voluminous literature on exchange rate predictability can be found in Rossi
(2013).
The scapegoat approach of Bacchetta and van Wincoop (2004) attributes this failure to
the fact that market participants attach excessive weight to observable fundamentals that
deviate from their long-run trend. As a result, agents quickly switch between models over
time and different fundamentals may be relevant only for short periods. This explanation
translates into an econometric model that should allow for the optimal forecasting model to
change over time.2
Next is the issue of whether parameter change and other nonlinearities are beneficial for
forecasting. Overall, the evidence is not strong (Rossi, 2013), although several studies find
some benefits from allowing for time-variation in parameters; see, e.g., Rossi (2006) or Byrne,
Korobilis, and Ribeiro (2016). Our approach accommodates both constant and time-varying
parameters.
The question of whether there are benefits in working with a multivariate time series
model such as a VAR involving a cross-section of exchange rates is also debated. Such an
approach has the advantage that it exploits information in the co-movements and common
dynamics in exchange rates. There is some evidence that doing so can improve exchange rate
forecasts. Carriero, Kapetanios, and Marcellino (2009) work with a large Bayesian VAR
involving a cross-section of exchange rates and find forecast improvements from considering
dynamic comovements of exchange rates. Abbate and Marcellino (2018) extend Carriero,
Kapetanios, and Marcellino (2009) by allowing for, among other things, time-varying
coefficients and volatilities and find the latter to be particularly useful in improving forecast
performance. These considerations suggest that working with VARs with time-varying
volatilities is potentially important and our modelling approach does so.
Another issue which arises when we have many potential predictors is the need for some
method for ensuring parsimony so as to avoid overfitting and poor out-of-sample results.
Indeed, even in univariate models, papers such as Ackermann, Pohl, and Schmedders (2016)
find parameter estimation error to be substantial and, hence, they use no predictors when
building a diversified FX portfolio. Instead they focus solely on exploiting volatility timing.
2 This dynamic model switching aspect has also been found to be of crucial importance in the empirical
exchange rate literature. Sarno and Valente (2009) discuss how the fact that there is evidence of a weak link
between in-sample fit and out-of-sample predictability complicates the choice of selecting an appropriate
model even if fundamentals contain valuable information about the path of the exchange rate.

However, several recent papers have used data reduction methods, priors or model averaging
methods to minimize overfitting concerns. Other techniques have been successfully used,
including elastic net shrinkage (Li, Tsiakas, and Wang, 2015), gradient boosting (Berge, 2014)
and model averaging/selection (Della Corte, Sarno, and Tsiakas, 2008; Della Corte and
Tsiakas, 2012; Kouwenberg, Markiewicz, Verhoeks, and Zwinkels, 2017). All these approaches
find sparsity to be an important modelling feature and, in particular, Kouwenberg, Markiewicz,
Verhoeks, and Zwinkels (2017) also illustrate the time-varying relevance of regressors in a
univariate framework. Our work corroborates these findings in a multivariate approach that
allows us to assess the incremental value of fundamentals in addition to VAR lags and
vice versa. From an investor’s point of view, our multivariate approach allows for directly
mapping the (density) forecasts into portfolio weights without having to rely on additional
procedures as is the case for univariate approaches.
The literature has also explored the implications of exchange rate predictability (or a lack
thereof) for an investor wishing to build an investment portfolio involving various exchange
rates; see, for instance, Abhyankar, Sarno, and Valente (2005), Della Corte, Sarno, and
Tsiakas (2008), Kouwenberg, Markiewicz, Verhoeks, and Zwinkels (2017) and Abbate and
Marcellino (2018).
Motivated by these considerations, our econometric approach takes the perspective of an
investor who learns from past mistakes. We formalize this setting econometrically using the
notion of dynamic Bayesian learning. In it, the investor can adapt to a new forecasting
environment each time period by switching to a new model. The decision to switch is based
on past forecast errors. The result is an extremely flexible framework that learns quickly
from recent forecast performance. Our empirical framework has several desirable features.
First, due to the specification of time-varying parameters and dynamic model switching, the
VAR forecasting model can adapt to abrupt structural changes or sudden shifts in the
investor’s information set. Our estimation methods are Bayesian so that the investor’s
decisions account for parameter uncertainty. At the same time Bayesian methods offer
a natural setting for imposing statistical shrinkage which, as discussed above, has been
shown to be important for exchange rate predictability when working with large numbers
of predictors and a large cross-section of exchange rates. Finally, it is worth mentioning
that we allow for model incompleteness; see, e.g., Billio, Casarin, Ravazzolo, and van Dijk
(2013). That is, we do not assume that one of our entertained VARs reflects the correct data

generating process. The online appendix contains a small simulation experiment which
outlines how model incompleteness is accommodated.
3 Data
All of our individual model configurations are VARs (or extensions thereof) which involve a
cross-section of exchange rates as dependent variables. Some models also include additional
exogenous predictors. We use the common set of G10 currencies: the Australian dollar
(AUD), the Canadian dollar (CAD), the Euro (EUR)3 , the Japanese yen (JPY), the New
Zealand dollar (NZD), the Norwegian krone (NOK), the Swedish krona (SWK), the Swiss
franc (SWF), the Great Britain pound sterling (GBP) and the US dollar (USD). All
currencies are expressed in terms of the US dollar and are end-of-month exchange rates which
enter the model as discrete returns. Thus, we have nine exchange rates, each relative to the
US dollar, entering our VAR. The sample runs from 1986:01 until 2016:12. As additional
predictors, we also include the Uncovered Interest Parity (UIP), the percentage change in
stock prices over the past 12 months (STOCK GROWTH), the difference between long and
short term interest rates (INT DIFF) and the percentage change in the nominal oil price
(OIL). UIP, STOCK GROWTH and INT DIFF have been widely used in studies such as
Wright (2008) and previous research shows that US dollar exchange rates are affected by
the price of oil (Lizardo and Mollick, 2010). With regard to interest rates, we use
one-month LIBOR and Eurodeposit interest rates as well as 10-year government bonds.
In the online appendix, we present empirical results with a longer sample of data going
back to 1973 and also provide results for additional established predictors, such as purchasing
power parity, the monetary model and the Taylor rule approach, which are not available in
real time. These results mainly reinforce the findings presented below. We focus on the
shorter sample since it covers a period where all exchange rates are largely freely floating and
availability of predictors is not a concern. The 1970s included several periods of economic
turbulence, such as the oil price shock and changes in exchange rate arrangements for some
currencies, such as those of Sweden and Norway. In addition, perfectly comparable interest rates are
not available.
The forecast evaluation period runs from 1996:01 to 2016:12 for a total of 252 observations.
The online appendix provides sources, descriptions of the fundamental exchange rate models,
other details about the data and results for the long sample.
3 We use German instead of Euro data prior to 1999.
4 Econometric methods
4.1 The VAR
Our starting point is a time-varying parameter VAR with exogenous variables:
$$y_t = x_t \beta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma_t) \tag{1}$$
$$\beta_{t+1} = \beta_t + u_t, \qquad u_t \sim N(0, \Omega_t), \tag{2}$$
where yt is an M × 1 vector containing observations on M time series variables (in our

case, discrete exchange-rate returns for nine countries). xt is a matrix where each row
contains predetermined variables in each VAR equation, namely an intercept, (lagged)
exogenous variables, and p lags of each of the M variables. We divide the set of exogenous
variables into two groups: Nx denotes the number of variables which are asset specific and
considered as relevant only for a specific exchange rate. For instance, in the equation for the
UK currency the UIP for the UK belongs in this class. Nxx denotes the number of non
asset-specific variables which are supposed to be potentially relevant for all currencies in the
setting (e.g. oil price changes). Thus, we have k = M (1 + p · M + Nx + Nxx) elements in
βt. Following a large literature in economics and finance,4 we assume that βt evolves as a
multivariate random walk without drift, with covariance matrix Ωt of dimension k × k.
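To make the size of the state vector concrete, the count k = M (1 + p · M + Nx + Nxx) can be evaluated with the values reported elsewhere in this paper (M = 9 exchange rates, p = 6 lags, Nx = 3 and Nxx = 1). The Python sketch below is purely illustrative; the function name is hypothetical and not part of the original methodology.

```python
def n_coefficients(M, p, n_x, n_xx):
    """Number of elements in beta_t.

    Each of the M equations contains an intercept, p lags of all M
    endogenous variables, n_x asset-specific regressors and n_xx
    regressors common to all equations.
    """
    return M * (1 + p * M + n_x + n_xx)


# Values taken from the text: nine exchange rates, six lags,
# three asset-specific regressors and one common regressor (OIL).
k = n_coefficients(M=9, p=6, n_x=3, n_xx=1)
print(k)  # 9 * (1 + 54 + 3 + 1) = 531
```

With several hundred coefficients and only a few hundred monthly observations, the need for shrinkage and variable exclusion discussed next is apparent.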
Here we outline our methods for estimating and forecasting with a single VAR. Additional
details are given in the online appendix. We require a prior for the initial condition for the
time-varying VAR coefficients. In the case of the constant-coefficient VAR, this is the prior
for the VAR coefficients. We use a variant of the Minnesota prior:5
$$\beta_0 \sim N(0, \Omega_0).$$
4 See, e.g., Byrne, Korobilis, and Ribeiro (2016) or Dangl and Halling (2012).
5 This is the most popular prior for Bayesian VARs, with Bańbura, Giannone, and Reichlin (2010) being an
early example of its use with large Bayesian VARs and Koop and Korobilis (2013) using it with large
TVP-VARs.

Hence, model coefficients are initialized with an expected value of 0 and covariance matrix
Ω0 . If the diagonal elements of Ω0 are chosen to be small, the respective coefficients are
shrunk to 0. We employ this mechanism to effectively exclude certain exogenous variables in
some model configurations. The Minnesota prior assumes the prior covariance matrix Ω0 to
be diagonal. Let Ω0,i denote the block of Ω0 associated with the coefficients in equation i
and Ω0,i,jj its diagonal elements. The shrinkage intensity towards 0 is determined by the
hyperparameters γ. We assume a prior covariance matrix of the form:
$$
\Omega_{0,i,jj} =
\begin{cases}
s_i^2 \gamma_1 & \text{for intercepts} \\[4pt]
\dfrac{\gamma_2}{r^2} & \text{for coefficients on own lag } r = 1, \ldots, p \\[8pt]
\dfrac{\gamma_3 s_i^2}{r^2 s_j^2} & \text{for coefficients on lag } r \text{ of variable } i \neq j \text{ for } r = 1, \ldots, p \\[8pt]
\gamma_4 s_i^2 & \text{for coefficients on the first asset-specific exogenous variable associated with exchange rate } i \\
\vdots & \vdots \\
\gamma_{N_x+3} s_i^2 & \text{for coefficients on the last asset-specific exogenous variable associated with exchange rate } i \\
\gamma_{N_x+4} s_i^2 & \text{for coefficients on the first non asset-specific exogenous variable} \\
\vdots & \vdots \\
\gamma_{N_x+N_{xx}+3} s_i^2 & \text{for coefficients on the last non asset-specific exogenous variable}
\end{cases}
\tag{3}
$$
$s_i^2$ denotes the residual variance of the respective variable i. We set lag length p = 6.
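As an illustration of equation (3), the prior variances for one VAR equation can be assembled as in the following Python sketch. The function name, the 0-based indexing and the ordering of the coefficients (intercept, then lags ordered by lag and variable, then asset-specific and non asset-specific regressors) are assumptions made for the example, not details taken from the paper.

```python
def minnesota_prior_variances(i, s2, gammas, p, n_x, n_xx):
    """Diagonal elements of Omega_{0,i} for equation i, following equation (3).

    i      : index of the equation (0-based, an assumption of this sketch)
    s2     : residual variances s_j^2, one per endogenous variable
    gammas : [gamma_1, ..., gamma_{n_x + n_xx + 3}]
    """
    if len(gammas) != n_x + n_xx + 3:
        raise ValueError("expected n_x + n_xx + 3 shrinkage parameters")
    M = len(s2)
    g1, g2, g3 = gammas[0], gammas[1], gammas[2]
    variances = [s2[i] * g1]                      # intercept
    for r in range(1, p + 1):
        for j in range(M):
            if j == i:
                variances.append(g2 / r**2)       # own lag r
            else:
                variances.append(g3 * s2[i] / (r**2 * s2[j]))  # cross lag r
    # asset-specific regressors first, then non asset-specific ones
    variances.extend(g * s2[i] for g in gammas[3:])
    return variances


# Toy example with two variables and two lags; gamma_5 = 0 switches off
# the second asset-specific regressor.
v = minnesota_prior_variances(
    i=0, s2=[1.0, 4.0], gammas=[1.0, 0.5, 0.1, 1.0, 0.0, 1.0, 1.0],
    p=2, n_x=3, n_xx=1,
)
print(v)
```

Setting any γ to 0 gives the corresponding coefficients a prior variance of 0 around a prior mean of 0, which is how a block of variables is excluded from a model configuration.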
The Minnesota prior is typically controlled by a single shrinkage parameter, see Bańbura,
Giannone, and Reichlin (2010) and citations therein. In order to deal with prior sensitivity
associated with selecting a particular value for this shrinkage parameter, Giannone, Lenza,
and Primiceri (2015) and Koop and Korobilis (2013) use information in the data to learn
about its value. We adopt a similar approach and allow the degree of shrinkage in the
Minnesota prior to adaptively change over time. Furthermore, by using the prior covariance
matrix specified in (3) we allow for richer shrinkage patterns. Instead of having one shrinkage
parameter for all VAR coefficients, we allow for multiple shrinkage parameters. For γ2 and
γ3 we consider values taken from the grid {0; 0.1; 0.5; 0.9}. We have an intercept, Nx = 3
asset-specific exogenous variables (UIP, STOCK GROWTH and INT DIFF) and Nxx = 1

non asset-specific exogenous variable (OIL). We standardize the exogenous variables in
real time to have mean 0 and variance 1. For the shrinkage parameters on the intercept and
on each exogenous variable, we use a grid of {0; 1}. All of
these choices of grids for γi reflect a desire to allow the algorithm to select either a 0
(which means that the ith variable or block of variables is omitted from the model) or less
informative prior choices. Hence, our method allows for the exclusion of model elements such
as VAR lags or predictors if this is empirically warranted.
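Counting the combinations implied by these grids gives 4 × 4 × 2^5 = 512 candidate configurations. The Python sketch below enumerates them. The variable names and the enumeration order are assumptions made for illustration, and the numbering of models used in the paper may differ.

```python
from itertools import product

GRID_LAGS = (0.0, 0.1, 0.5, 0.9)  # gamma_2 (own lags) and gamma_3 (cross lags)
GRID_ON_OFF = (0.0, 1.0)          # gamma_1 (intercept) and gamma_4, ..., gamma_7


def model_grid():
    """All combinations (gamma_1, ..., gamma_7) implied by the grids in the text."""
    grids = [GRID_ON_OFF, GRID_LAGS, GRID_LAGS] + [GRID_ON_OFF] * 4
    return list(product(*grids))


models = model_grid()
print(len(models))   # 512
print(models[0])     # all gammas equal to 0: no intercept, lags or regressors
print(models[-1])    # intercept, lags with gamma = 0.9 and all four regressors
```

In this enumeration the first tuple corresponds to a random walk without drift and the last to the least restricted configuration.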
Assuming Σt and Ωt are known and the prior for β0 is as above, standard Bayesian
methods for state space models involving the Kalman filter can be used to estimate βt and
obtain the predictive distribution of the returns.
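For reference, one prediction and updating step of the Kalman filter for the model in (1) and (2), with Σt and Ωt treated as known, can be sketched in Python as follows. This is the textbook recursion for a linear Gaussian state space model with random walk states; the function and argument names are hypothetical.

```python
import numpy as np


def kalman_step(beta, P, y, X, Sigma, Omega):
    """One predict/update step for y_t = X_t beta_t + eps_t, beta_t = beta_{t-1} + u.

    beta, P : posterior mean and covariance of the coefficients given
              data up to t-1
    y, X    : observation vector (M,) and regressor matrix (M, k) at time t
    Sigma   : error covariance (M, M); Omega : state innovation covariance (k, k)
    Returns the predictive mean and covariance of y_t and the updated
    posterior mean and covariance of the coefficients.
    """
    beta_pred = beta                  # random walk: no change in the mean
    P_pred = P + Omega                # state uncertainty grows by Omega
    y_mean = X @ beta_pred            # predictive mean of the returns
    y_cov = X @ P_pred @ X.T + Sigma  # predictive covariance of the returns
    gain = P_pred @ X.T @ np.linalg.inv(y_cov)
    beta_new = beta_pred + gain @ (y - y_mean)
    P_new = P_pred - gain @ X @ P_pred
    return y_mean, y_cov, beta_new, P_new


# Scalar toy example: prior N(0, 1), unit regressor, unit error variance.
out = kalman_step(np.array([0.0]), np.array([[1.0]]), np.array([2.0]),
                  np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]]))
print(out)
```

The predictive mean and covariance returned by each step are the inputs an investor would need for portfolio construction.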
In practice, the econometrician/investor does not observe Σt and Ωt . In small models,
these parameters can be estimated with Markov Chain Monte Carlo (MCMC) methods using
approaches such as Chib, Nardari, and Shephard (2006). However, when working with larger
models MCMC methods become too computationally demanding. Accordingly, we rely
on exponential discounting methods. These are filtering methods in which Σt and Ωt are
updated by looking at recent data and discounting more distant observations at a higher rate.
Thus, if an abrupt change occurs, parameter estimates can adapt at a faster rate compared
to an investor who tracks parameters based on the whole, equally weighted, sample of data.
The mechanics behind the discounting approach are described in the online appendix.6
The key point to note here is that they involve the use of discount factors δ and λ to
control the dynamics of Σt and Ωt , respectively. These two discount factors control how
quickly/slowly investors learn from past forecasting performance. When δ = 1 (similarly for
λ), then the investor uses all available historical observations, equally weighted, to update
volatilities and parameters. For values less than one, older observations are exponentially
penalized, giving more weight to recent observations. As we work with monthly data, we
set δ = 0.97, following J.P. Morgan/Reuters (1996) and, hence, allowing for time-varying
observational volatilities and covariances. For our main results, we select constant slope
coefficients, setting λ = 1 and investigate the effects of time-varying slope parameters
(λ = 0.99) as a robustness check in our online appendix. The choice of constant slope
parameters for our main results is motivated by the empirical finding that in medium-sized
VARs such as ours, it is common to find strong evidence for time-varying error variances, but
little evidence in favor of time-varying VAR coefficients (Koop and Korobilis, 2013; Chan
and Eisenstat, 2018). Time-varying VAR coefficients may even be detrimental for portfolio
performance in the case of FX portfolios (Abbate and Marcellino, 2018). Using Wishart
matrix discounting, we rely on a fully Bayesian approach for modelling the uncertainty
surrounding point forecasts of volatilities and correlations. In our online appendix we
provide point forecasts of volatilities and correlations with credible intervals.
6 Discount factors are well established; see the J.P. Morgan/Reuters (1996) Riskmetrics model, and
Dangl and Halling (2012) for an application in stock return predictability. For a general treatment, see West
and Harrison (1997).
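The role of the two discount factors can be illustrated with simplified recursions. The Python sketch below uses an exponentially weighted update for Σt in the spirit of Riskmetrics, together with the common forgetting-factor form for Ωt. Both are assumptions chosen for illustration; the Wishart matrix discounting actually used is described in the online appendix and is not reproduced here.

```python
import numpy as np


def discount_volatility(Sigma_prev, resid, delta=0.97):
    """Exponentially weighted update of the error covariance matrix.

    With delta = 1 the most recent residual receives no extra weight;
    smaller values of delta let the estimate react faster to new data.
    (Simplified stand-in for Wishart matrix discounting.)
    """
    resid = np.asarray(resid, dtype=float)
    return delta * Sigma_prev + (1.0 - delta) * np.outer(resid, resid)


def state_innovation_cov(P_post, lam=1.0):
    """Forgetting-factor form Omega_t = (1 / lambda - 1) * P_{t-1|t-1}.

    lam = 1 gives Omega_t = 0, i.e. constant coefficients.
    """
    return (1.0 / lam - 1.0) * P_post


Sigma_new = discount_volatility(np.eye(2), [1.0, 0.0], delta=0.97)
print(Sigma_new)
print(state_innovation_cov(np.eye(2), lam=1.0))   # zero matrix
print(state_innovation_cov(np.eye(2), lam=0.99))  # small positive variances
```

Under these simplified recursions, δ = 0.97 corresponds to an effective window of roughly 1 / (1 − 0.97) ≈ 33 months.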
In sum, Bayesian posterior and predictive inference of a single VAR can be done using
standard Kalman filtering and discounting methods. A VAR is defined by setting each of
δ, λ, γ1 , ..., γ7 to a particular value. Our proposed structure of the prior can also
be motivated as a spike-and-slab prior, a perspective we outline in the online appendix.
4.2 Dynamic model learning
Our empirical results fix δ and λ and consider a grid of values for each of γ1 , ..., γ7 to allow
for variable exclusion and different degrees of shrinkage intensity. If we consider every
possible combination of values taken from all of these grids we have 512 choices. We interpret
a choice as defining a model that the investor has at their disposal at each point in time
upon which they could base their portfolio allocation. In order to allow for the investor to
make an optimal choice each period t, we use the notion of dynamic model learning (DML).
Dynamic model learning involves selecting, at each point in time, the model specification
with the highest discounted joint log predictive likelihood at that time. The predictive
likelihood is a measure of out-of-sample forecasting ability that takes into account the entire
predictive distribution; see Geweke and Amisano (2012). The individual model configuration
with the highest discounted joint log predictive likelihood is used in order to obtain the
predictive mean and covariance matrix. These are a crucial input in portfolio optimization.
Our motivation for using learning based on past forecast performance is that it potentially
allows for a different model at each point in time. Such a feature is likely particularly useful
in times of abrupt change. If we were to use a single VAR, gradual parameter changes would
be accommodated if the discount factors δ and λ were below one. But this is not the same
as switching between entirely different models as dynamic model learning allows for.
In this dynamic model learning setting, the discounted joint predictive likelihood (DPL)
can be calculated as
$$\mathrm{DPL}_{t|t-1,j} = \prod_{i=1}^{t-1} \left[ p_j\left(y_{t-i} \mid y^{t-i-1}\right) \right]^{\alpha^i},$$
where $p_j(y_{t-i} \mid y^{t-i-1})$ denotes the predictive likelihood of model j in period t − i and t|t − 1
subscripts refer to estimates made of time-t quantities given information available at time
t − 1. Hence, model j will receive a higher value at a given point in time if it has forecast
well in the recent past, using the predictive likelihood (i.e., the predictive density evaluated
at the actual outcome) as the evaluation criterion. The interpretation of “recent past”
is controlled by the discount factor α, reflecting exponential decay. For example, if
α = 0.95, forecast performance three years ago receives approximately 15% as much weight
as the forecast performance last period. If α = 0.90, then forecast performance three years
ago receives only about 2% as much weight. The case α = 1 implies no discounting and the
discounted predictive likelihood is then proportional to the marginal likelihood. Lower values
of α are associated with more rapid switching between models. We consider a range of values
for α and, at each point in time, choose the best value for it. In this way, we can allow for
times of fast model switching and times of slow model switching.
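The weighting scheme and the resulting model choice can be sketched in a few lines of Python. The sketch works with the logarithm of the discounted predictive likelihood, which is a sum of log predictive likelihoods weighted by powers of α. Function names and the toy numbers are hypothetical.

```python
def log_dpl(log_pred_liks, alpha):
    """Log of the discounted predictive likelihood of one model.

    log_pred_liks holds the log predictive likelihoods for periods
    1, ..., t-1 (oldest first); the observation i periods back gets
    weight alpha**i.
    """
    n = len(log_pred_liks)
    return sum(alpha**i * log_pred_liks[n - i] for i in range(1, n + 1))


def select_model(log_pred_liks_by_model, alpha):
    """Index of the model with the highest discounted score."""
    scores = [log_dpl(lp, alpha) for lp in log_pred_liks_by_model]
    return max(range(len(scores)), key=scores.__getitem__)


# Weights quoted in the text for performance three years (36 months) ago.
print(round(0.95**36, 3))  # 0.158
print(round(0.90**36, 3))  # 0.023

# Model 0 forecast well in the past, model 1 forecast well recently.
history = [[-0.5, -0.5, -3.0], [-4.0, -1.0, -1.0]]
print(select_model(history, alpha=1.0))  # 0: the full history favours model 0
print(select_model(history, alpha=0.5))  # 1: recent performance dominates
```

The toy example shows how a lower α shifts the selection towards the model that performed best most recently.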
At time τ , we choose the best value for α as the one which has produced the model with
the highest product of predictive likelihoods7 in the past from t = 1, ..., τ . We consider the
following grid of values: α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}.
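One possible reading of this selection rule is sketched below in Python: for each value of α, the history is replayed, the model with the highest discounted score is picked each period, and α is credited with the realized log predictive likelihood of that pick. This is an assumption about the mechanics, which are detailed in the online appendix; all names and the toy data are hypothetical.

```python
ALPHA_GRID = (0.50, 0.70, 0.80, 0.90, 0.99, 1.0)


def discounted_score(log_liks, alpha):
    """Sum of log predictive likelihoods, weighted by alpha**i for lag i."""
    n = len(log_liks)
    return sum(alpha**i * log_liks[n - i] for i in range(1, n + 1))


def choose_alpha(log_liks_by_model, alpha_grid=ALPHA_GRID):
    """Pick the alpha whose selected models had the best realized track record.

    log_liks_by_model[j][t] is the realized log predictive likelihood of
    model j in period t. Returns the chosen alpha and the cumulative log
    predictive likelihood attained under each alpha.
    """
    n_models = len(log_liks_by_model)
    n_periods = len(log_liks_by_model[0])
    totals = {}
    for alpha in alpha_grid:
        total = 0.0
        for t in range(1, n_periods):
            scores = [discounted_score(log_liks_by_model[j][:t], alpha)
                      for j in range(n_models)]
            best = max(range(n_models), key=scores.__getitem__)
            total += log_liks_by_model[best][t]  # realized performance of the pick
        totals[alpha] = total
    return max(totals, key=totals.get), totals


# Toy data with a regime change: model 0 is better for ten periods,
# model 1 is better afterwards.
good, bad = -1.0, -4.0
data = [[good] * 10 + [bad] * 10, [bad] * 10 + [good] * 10]
best_alpha, totals = choose_alpha(data)
print(best_alpha)
print(totals)
```

In the toy data, strong discounting is preferred because it detects the regime change soonest; with persistent model rankings a value of α close to 1 would be chosen instead.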
5 Empirical results
5.1 Evidence on model switching and sparsity
Our most flexible approach allows for dynamic model learning over a set of 512 different
VAR models and six different values of α using the methods described in Section 4. We
use the term “DML with ALL REGRESSORS” to denote the case where DML is being
done over all specification choices including all of the exogenous predictors. “DML without
own/cross lags and NO REGRESSORS” is the (heteroskedastic) random walk. We also
consider several restricted versions of DML which involve dynamic model learning over only
some of the predictors. “DML with OIL”, for example, means that OIL is the only
exogenous variable which could be chosen. “DML without cross lags” means that
the coefficients on the cross lags are set to zero in all VARs. We implement such restrictions
by tuning the vector of shrinkage parameters γ1 , ..., γ7 introduced in equation (3). For
instance, to delete the effect of cross lags we set γ3 to zero. The label DML denotes the VAR
which involves only exchange rates (no exogenous predictors). We also consider versions of
our approach which set α to a specific value. For instance, DML (α = 0.99) means that α is
fixed at 0.99 rather than being selected from a grid of values.
7 We stress that we are not using the DPL when choosing between different values for α. The DPL is
only used to select the best model for a given value of α.
The main focus of this paper is on how well these specifications perform in terms of our
dynamic asset allocation problem. However, before doing this, we present a few results
illustrating how the dynamic model learning strategy is working using the most flexible
specification.
Dynamic model learning is to be preferred over static Bayesian model learning only if the
optimal forecasting model is changing over time. Figure 1 shows that it does so in our
application. The vertical axis plots the model numbers from 1 to 512 against time for two
cases. The set of models begins with model number 1 which is the multivariate random walk
without drift and ends with model number 512 which is one of the most flexible models (i.e.
the VAR model with an intercept, own lags with shrinkage parameter γ2 = 0.9 and cross lags
with shrinkage parameter γ3 = 0.9, and with inclusion of all exogenous regressors).
The two lines in Figure 1 are for DML with ALL REGRESSORS (with α selected in a
time-varying manner) and DML with ALL REGRESSORS (with α fixed to 1). The latter
can be interpreted as allowing for model learning, but using conventional Bayesian model
averaging methods. Both cases show that different models are selected at different times.
But with our flexible specification where α is chosen in real time, the model change is
dramatic, suggesting that a high degree of model switching is a crucial feature. A wide
range of different models is selected with none emerging as dominant. Interestingly, the
multivariate random walk is selected 30.16% of the time. It is also evident that in certain
episodes in time, e.g. during the subprime crisis, flexible models are preferred.
The coloured diamonds in Figure 2 show which blocks of variables are included at each
point in time. In contrast, blank spaces in the graph depict the time-varying sparsity induced
by DML, that is, periods where a block of variables is not selected. There is no single block
of variables selected in all periods; however, in most cases, selection of a block persists for
several consecutive months before it becomes irrelevant again.

Figure 1: Frequency of Model Change
[Figure: selected model configuration (vertical axis, 1 to 512) plotted against time (horizontal axis, 1996:01 to 2016:01).]
The vertical axis represents the model configurations 1, ..., 512. The red line depicts the evolution of the
selected model configuration for α = 1. The grey line shows the evolution of the selected model configuration
when α is dynamically chosen from the grid of values α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}.
Figure 2: Inclusion of Blocks of Variables
[Figure: one row per block of variables (OIL, INT DIFF, STOCK GROWTH, UIP, CROSS LAGS, OWN LAGS, INTERCEPT) plotted against time (horizontal axis, 1996:01 to 2016:01).]
The figure displays which blocks of variables are included at each point in time. “Included” means the
respective γi is not 0.
5.2 Evaluation of economic utility and forecast performance
The previous sub-section establishes that the DML approach is picking up model change, but
we have not provided evidence whether this feature is relevant for dynamic portfolio choice.
To investigate this further, we design an international asset allocation strategy that involves
trading the US dollar and nine other currencies. We consider a US investor who builds a
portfolio by allocating their wealth between ten bonds: one domestic (US), and the nine
foreign bonds. In each period, the foreign bonds yield a riskless return in the local currency
and at the same time a risky return that is due to currency fluctuations relative to the US
dollar. Therefore, the only risk the US investor is exposed to is foreign exchange risk. Every
period the investor takes two steps. First, they use the currently selected model (i.e., the
model with the highest discounted sum of predictive likelihoods) to forecast the one-period
ahead exchange rate returns and the predictive covariance matrix. Second, using these
predictions, they dynamically rebalance their portfolio by calculating the new optimal
weights. This setup is designed to assess the economic value of exchange rate predictability
and to dissect which sources of information are valuable for asset allocation.
The dynamic asset allocation strategy is described in detail in the online appendix.
It involves choosing the investor’s degree of relative risk aversion θ. We set θ = 2 and
also consider θ = 6 in the online appendix as an additional robustness check. It also takes
into account transaction costs, τ , ex ante (i.e., at the time of the portfolio construction).
Following Della Corte and Tsiakas (2012), we set τ = 0.0008. It also involves choosing a
target portfolio volatility, σp∗ , which we set to 10%. We assess the economic value of different
forecasting approaches by equating the utility generated by a portfolio strategy which is
based on our approach and the utility achieved by a portfolio strategy relying on a simple
random walk. The annualized performance fee an investor is willing to pay to switch from a
homoskedastic multivariate random walk to our approach is labelled ΦT C in the table below.
As an additional measure of economic utility, we report the Sharpe ratio before and after
transaction costs, SR and SRT C (benchmarked relative to the random walk).
The statistical criteria we use are the average joint predictive log likelihood (P LL),
coverage statistics of interval forecasts and the mean squared forecasting error (M SF E).
We report (P LL)-statistics in Table 1 and statistics of interval forecasts along with the
mean squared forecasting error relative to the random walk in our online appendix. Table 1
contains the results using our approach and the various restricted versions of it described
above.
Using DML we find the annualized performance fee after transaction costs is 327 basis
points and the annualized Sharpe ratio is 1.01 before transaction costs and 0.82 after
transaction costs. Including exogenous regressors in DML leads to substantially stronger
improvements when using the economic evaluation criteria. For instance, the annualized
performance fee after transaction costs increases to 397 basis points when all the regressors are considered.
Among the exogenous regressors, including UIP leads to the largest improvements in the
economic performance measures. But, with the exception of OIL, the other regressors
also lead to improvements. The importance of VAR lags is identified at some points in time,
and neglecting own lags or cross lags (i.e., setting γ2 = 0 or γ3 = 0) is detrimental for portfolio
performance. These patterns are in line with those in Figure 2. Most of these findings are
statistically significant relative to the homoskedastic multivariate random walk benchmark.
Table 1: Evaluation of Forecasting Results
                                                                    Φ^TC     SR       SR^TC    PLL
DML with UIP                                                        464*     1.12**   0.93**   22.01*
Alternative sets of regressors
DML with OIL                                                        199      0.89     0.70     22.03*
DML with INT DIFF                                                   388*     1.06*    0.88*    22.02*
DML with STOCK GROWTH                                               368*     1.06*    0.88*    22.06*
DML with ALL REGRESSORS                                             397*     1.02*    0.87*    22.04*
DML with NO REGRESSORS                                              327      1.01*    0.82*    22.02*
Type of restrictions: VAR lags
DML without own lags (γ2 = 0) and NO REGRESSORS                     98       0.72     0.60     21.97
DML without cross lags (γ3 = 0) and NO REGRESSORS                   200      0.86     0.79     21.78**
DML without own/cross lags (γ2 = γ3 = 0) and NO REGRESSORS          5        0.54     0.53     21.72**
DML without own/cross lags (γ2 = γ3 = 0) but with ALL REGRESSORS    255      0.80     0.73     22.00*
Type of restrictions: Model selection dynamics
α = 1                                                               −427     0.34     0.11     21.69
α = 0.99                                                            −464     0.28     0.08     21.66
α = 0.90                                                            98       0.77     0.60     21.96*
α = 0.80                                                            266      0.94     0.76     22.02*
α = 0.70                                                            327      1.01*    0.82*    22.02*
α = 0.50                                                            84       0.82     0.60     21.98
The table summarizes the economic and statistical evaluation of our forecasts from different model configurations for the
period from 1996:01 to 2016:12. We measure statistical significance for differences in performance fees and log scores using
the (one-sided) Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC) standard
errors. We evaluate whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility)
benchmark using the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf
(2008) test statistics with a serial correlation-robust variance, using the pre-whitened quadratic spectral estimator of
Andrews and Monahan (1992). One star indicates significance at 10% level; two stars significance at 5% level; and three
stars significance at 1% level. Restrictions on α correspond to the specification DML with NO REGRESSORS.
As noted above, we also repeated this exercise using a longer sample going back to 1973.
Using the same economic and statistical criteria, results for this longer sample (reported in
the online appendix) are qualitatively similar.
In terms of economic utility gains, our DML models compare very well to results reported
in the literature. Given our long evaluation period (252 observations in the "short" sample
used in the main text and 324 in the "long" sample reported in the online appendix) and
robustness to alternative specifications, this is good news for an investor. However, it is
also part of the story that the multivariate models we use involve estimating additional
parameters relative to univariate approaches. Enforcing stronger sparsity, we achieve lower
mean squared errors but also lower PLLs and economic gains as a result of narrowing the
model space to less parameter-rich configurations. This finding aligns with the results of
Cenesizoglu and Timmermann (2012) who report broad agreement between density forecast
measures and economic performance measures based on the predictive density. At the same
time, they note that there is typically a weak link between point forecast evaluation criteria
and economic evaluation criteria. Along these lines, this kind of disagreement in forecasting
measures is not uncommon and has also been documented for exchange rates (see, e.g.,
Fratzscher, Rime, Sarno, and Zinna (2015) or Abbate and Marcellino (2018)).8 An exception
in this respect is Kouwenberg, Markiewicz, Verhoeks, and Zwinkels (2017) who report
successful results also in terms of point forecasting accuracy, though at a quarterly data
frequency, over a different evaluation period and with exchange rates/fundamentals different from
ours.
From a statistical perspective it is worth noting that the PLLs indicate that if one had to
select a single best model for the entire evaluation period, this would have been the
multivariate random walk with time-varying error covariance. Its PLL is 21.72, compared
with values slightly above 22 for the DML configurations. DML, which allows for
the selected model to change over time, would choose the multivariate random walk with
time-varying error covariance roughly one third of the time and leads to higher PLLs and
substantially higher economic performance. This demonstrates that our flexible learning
mechanism is able to efficiently switch between different model configurations in real time.
Assessment based on the results of the long sample (see the online appendix) is comparable.
We next delineate the effect of restrictions on α. In practice, we estimate α to change
rapidly over time. The results in Table 1 relating to α show the benefits of this for forecasting.
Fixing α = 1 (or another high value) rather than choosing the value of α in real time leads
to very poor forecasting results. Allowing for lower values of α and, thus, more model
switching leads to higher values of the log scores and, in particular, to higher performance
fees and Sharpe ratios. In fact, the highest performance fee and Sharpe ratios are obtained
when α = 0.70 for all presented model configurations. Note that the restrictions on α in
Table 1 are provided for DML with NO REGRESSORS.
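To make the role of α concrete, the following is a minimal sketch of selecting, each period, the model with the highest discounted sum of past log predictive likelihoods. The exact scoring rule is defined elsewhere in the paper; the recursion used here (score = α · previous score + latest log predictive likelihood), the function name `select_models` and the toy inputs are assumptions made for illustration only.

```python
import numpy as np

def select_models(log_pl, alpha):
    """Pick, for each period t, the model with the highest discounted sum
    of past log predictive likelihoods (illustrative sketch).

    log_pl : (T, J) array; log_pl[t, j] is the log predictive likelihood of
             model j for the observation in period t (assumed to be given).
    alpha  : discount factor in (0, 1]; alpha = 1 weights all past periods equally.
    Returns an integer array of length T. Entry t only uses information up to
    t - 1; entry 0 defaults to model 0 because no history exists yet.
    """
    T, J = log_pl.shape
    score = np.zeros(J)                      # discounted sum of past log scores
    chosen = np.zeros(T, dtype=int)
    for t in range(T):
        chosen[t] = int(np.argmax(score))    # decide before observing period t
        score = alpha * score + log_pl[t]    # then fold in period t's performance
    return chosen

# Toy example (hypothetical numbers): model 0 forecasts better in periods 0-5,
# model 1 forecasts better in periods 6-9.
log_pl = np.array([[0.0, -1.0]] * 6 + [[-1.0, 0.0]] * 4)
no_discounting = select_models(log_pl, alpha=1.0)
fast_switching = select_models(log_pl, alpha=0.5)
```

In this toy example the undiscounted rule never leaves model 0 within the sample, whereas α = 0.5 switches to model 1 one period after the break, which is the mechanism behind the sensitivity to α reported in Table 1.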
Thus, large economic and statistical losses occur if the investor does not emphasize the
most recent forecast performance when selecting the forecasting model on which to base their
asset allocation decision. Altogether, we find the choice of the discount factor α to be
a very important one.
8 Della Corte, Sarno, and Tsiakas (2008) do not report measures of point forecasting accuracy.
The online appendix contains more empirical evidence that expands on and reinforces the
story told above. That is, as an econometric approach DML performs well and, if used
to construct an investment portfolio, would yield higher levels of utility than a simple
benchmark. In particular, the appendix presents results which show that the coverage of our predictive
densities is good and carries out a variety of robustness checks, including alternative prior
specifications and the use of different sets of predictors. We find the modelling approach
taken in this paper to be robust and better than other plausible specifications or prior choices.
The online appendix also presents evidence against time-variation in the VAR coefficients. In
addition, it reports results from the Giacomini and Rossi (2010) fluctuations test along with
several additional analyses and details with respect to portfolio performance.
5.3 Market timing in high volatility periods
In this sub-section, we present additional empirical evidence to shed more light on when our
DML methods are performing well and provide some context to the existing theories of
exchange rate behaviour. All results are for DML with UIP which is (i) the most natural
regressor choice in the context of an economic evaluation and (ii) found to perform best in
the preceding sub-section. We also note that results using DML with ALL REGRESSORS
are very similar.
Brunnermeier, Nagel, and Pedersen (2008) document characteristic features of strategies
which consider investing in high-interest-rate currencies while borrowing in low-interest-rate
currencies. This delivers negatively skewed returns since the high-interest-rate currencies are
prone to crash risk. Similarly, Menkhoff, Sarno, Schmeling, and Schrimpf (2012) find that
high-interest-rate currencies are negatively related to innovations in global FX volatility, and
thus deliver low returns in times of unexpected high volatility. Inspired by papers such as
these, we divide our currencies into those from countries with (on average) high interest rates
(AUD, NZD, NOK, GBP) and those from countries with (on average) low interest rates
(JPY, EUR, CHF, USD). As shown in the online appendix, the portfolio weights for these
groups of currencies vary substantially over time. To investigate patterns in this variation,
we regress the sum of the portfolio weights in the high-interest-rate currencies (PW) on
FXVOL, which is the JP Morgan index of currency volatility for the G10 countries, and find
the following fitted regression line (t-statistics are in parentheses):
$$\widehat{PW} = \underset{(7.93)}{2.10} \;-\; \underset{(-5.15)}{10.87}\, FXVOL. \tag{4}$$
The message from this regression is clear cut: our DML with UIP strategy leads to portfolios
which include fewer of the high-interest-rate currencies in periods of high FX volatility, thus
avoiding the crash risk associated with the carry trade strategies discussed in Brunnermeier,
Nagel, and Pedersen (2008) and Menkhoff, Sarno, Schmeling, and Schrimpf (2012).
There is also time-variation in the economic utility produced by our DML with UIP
approach relative to a random walk. If we regress the utility differences (∆U) on FXVOL
(which relates specifically to currency markets), the VIX (which relates to stock markets)
and FXDIS (which is a measure of disagreement among professional forecasters),9 we find:
$$\widehat{\Delta U} = \underset{(-1.40)}{0.0254} + \underset{(2.49)}{0.0033}\, FXVOL + \underset{(1.21)}{0.00014}\, VIX - \underset{(-1.00)}{0.0039}\, FXDIS. \tag{5}$$
This reinforces the story that DML with UIP is producing gains in utility particularly in
times of high volatility in currency markets, rather than financial markets as a whole or in
times of uncertainty for the professional forecasters.10
It is interesting to consider how our findings relate to the scapegoat theory, for which studies such as
Fratzscher, Rime, Sarno, and Zinna (2015) and Pozzi and Sadaba (2018) find empirical
support. Given our focus on developing and applying a method for out-of-sample forecasting,
the suggested approach cannot be used as a direct test of the scapegoat theory. As Fratzscher,
Rime, Sarno, and Zinna (2015) put it, "the theory is silent on the role of scapegoats for
forecasting." But in the general spirit of this theory and related work on the instability of the
relationship between exchange rates and fundamentals such as Bacchetta and van Wincoop
(2006), Bacchetta and van Wincoop (2013) or Markiewicz (2012), we note that in times of
high volatility in currency markets, our DML approaches tend to include more regressors and
VAR lags which potentially reflects an intensified search for scapegoats.11 And, as noted
previously, we are finding a high degree of model switching and sparsity. These findings are
at least consistent with this literature if not a formal test of the theoretical models advanced
therein.12
9 Exact definitions and data sources are given in the online appendix.
10 The pronounced out-performance of DML strategies against the random walk (as a proxy of carry trade
strategies) in the time of the subprime crisis aligns with Fratzscher (2009) who finds that currencies in which
US investors held relatively large portfolio investments, experienced substantially larger depreciations against
the US dollar around 2008.
11 The Spearman correlation between the number of included regressors/VAR lags and FXVOL is
substantially positive, namely 0.41, for DML with UIP and similar for other models.
12 In line with our findings, Fratzscher, Rime, Sarno, and Zinna (2015) report out-performance of their
scapegoat model against the random walk in terms of economic measures in an out-of-sample forecasting
exercise, but not in terms of point forecasting accuracy.
6 Concluding remarks
We propose a multivariate forecasting approach for exchange rate returns which accommodates
flexible dynamic model change. Our dynamic Bayesian learning approach enables us
to quickly detect model changes and achieves computational feasibility by using decay
factors. A major conceptual advantage of our approach over univariate models is that, by
using a VAR-based approach, we obtain the input for the inherently multivariate portfolio
optimization problem in a natural manner without having to rely on additional assumptions
or ad-hoc procedures for mapping the forecast output into portfolio weights.
We evaluate the economic value of our exchange rate forecasts in a dynamic asset
allocation framework. Relying on our forecasting method, an investor achieves sizable
utility gains by exploiting time-varying predictability. We establish sparsity and fast model
switching as key features. Both align with the implications of the theoretical and empirical
exchange rate literature.
References
Abbate, A., and M. Marcellino (2018): “Point, interval and density forecasts of
exchange rates with time varying parameter models,” Journal of the Royal Statistical
Society: Series A (Statistics in Society), 181(1), 155–179.
Abhyankar, A., L. Sarno, and G. Valente (2005): “Exchange rates and fundamentals:
evidence on the economic value of predictability,” Journal of International Economics,
66(2), 325–348.
Ackermann, F., W. Pohl, and K. Schmedders (2016): “Optimal and naive
diversification in currency markets,” Management Science, 63(10), 3347–3360.
Andrews, D. W., and J. C. Monahan (1992): “An improved heteroskedasticity and
autocorrelation consistent covariance matrix estimator,” Econometrica: Journal of the
Econometric Society, pp. 953–966.
Bacchetta, P., and E. van Wincoop (2004): “A scapegoat model of exchange-rate
fluctuations,” American Economic Review, 94(2), 114–118.
Bacchetta, P., and E. van Wincoop (2006): “Can information heterogeneity explain the
exchange rate determination puzzle?,” American Economic Review, 96(3), 552–576.
Bacchetta, P., and E. van Wincoop (2013): “On the unstable relationship between
exchange rates and macroeconomic fundamentals,” Journal of International Economics,
91(1), 18–26.
Bańbura, M., D. Giannone, and L. Reichlin (2010): “Large Bayesian vector auto
regressions,” Journal of Applied Econometrics, 25(1), 71–92.
Berge, T. J. (2014): “Forecasting disconnected exchange rates,” Journal of Applied
Econometrics, 29(5), 713–735.
Billio, M., R. Casarin, F. Ravazzolo, and H. van Dijk (2013): “Time-varying
combinations of predictive densities using nonlinear filtering,” Journal of Econometrics,
177(2), 213–232.
Brunnermeier, M. K., S. Nagel, and L. H. Pedersen (2008): “Carry trades and
currency crashes,” NBER Macroeconomics Annual, 23(1), 313–348.
Byrne, J. P., D. Korobilis, and P. J. Ribeiro (2016): “Exchange rate predictability in
a changing world,” Journal of International Money and Finance, 62, 1–24.
Carriero, A., G. Kapetanios, and M. Marcellino (2009): “Forecasting exchange
rates with a large Bayesian VAR,” International Journal of Forecasting, 25(2), 400–417.
Cenesizoglu, T., and A. Timmermann (2012): “Do return prediction models add
economic value?,” Journal of Banking & Finance, 36(11), 2974–2987.
Chan, J. C., and E. Eisenstat (2018): “Bayesian model comparison for time-varying
parameter VARs with stochastic volatility,” Journal of Applied Econometrics.
Chib, S., F. Nardari, and N. Shephard (2006): “Analysis of high dimensional
multivariate stochastic volatility models,” Journal of Econometrics, 134(2), 341–371.
Dangl, T., and M. Halling (2012): “Predictive regressions with time-varying coefficients,”
Journal of Financial Economics, 106(1), 157–181.
Della Corte, P., L. Sarno, and I. Tsiakas (2008): “An economic evaluation of
empirical exchange rate models,” The Review of Financial Studies, 22(9), 3491–3530.
Della Corte, P., and I. Tsiakas (2012): “Statistical and economic methods for
evaluating exchange rate predictability,” Handbook of exchange rates, pp. 221–263.
Diebold, F. X., and R. S. Mariano (1995): “Comparing predictive accuracy,” Journal
of Business & Economic Statistics, pp. 253–263.
Fratzscher, M. (2009): “What explains global exchange rate movements during the
financial crisis?,” Journal of International Money and Finance, 28(8), 1390–1407.
Fratzscher, M., D. Rime, L. Sarno, and G. Zinna (2015): “The scapegoat theory of
exchange rates: the first tests,” Journal of Monetary Economics, 70(C), 1–21.
Geweke, J., and G. Amisano (2012): “Prediction with misspecified models,” American
Economic Review, 102(3), 482–486.
Giacomini, R., and B. Rossi (2010): “Forecast comparisons in unstable environments,”
Journal of Applied Econometrics, 25(4), 595–620.
Giannone, D., M. Lenza, and G. E. Primiceri (2015): “Prior selection for vector
autoregressions,” The Review of Economics and Statistics, 97(2), 436–451.
J.P. Morgan/Reuters (1996): “RiskMetricsTM technical document,” Discussion paper.
Koop, G., and D. Korobilis (2013): “Large time-varying parameter VARs,” Journal of
Kouwenberg, R., A. Markiewicz, R. Verhoeks, and R. C. J. Zwinkels (2017):
“Model uncertainty and exchange rate forecasting,” Journal of Financial and Quantitative
Analysis, 52(01), 341–363.
Ledoit, O., and M. Wolf (2008): “Robust performance hypothesis testing with the
Sharpe ratio,” Journal of Empirical Finance, 15(5), 850–859.
Li, J., I. Tsiakas, and W. Wang (2015): “Predicting exchange rates out of sample: Can
economic fundamentals beat the random walk?,” Journal of Financial Econometrics,
13(2), 293–341.
Lizardo, R. A., and A. V. Mollick (2010): “Oil price fluctuations and US dollar
exchange rates,” Energy Economics, 32(2), 399–408.
Markiewicz, A. (2012): “Model uncertainty and exchange rate volatility,” International
Economic Review, 53(3), 815–844.
Meese, R. A., and K. Rogoff (1983): “Empirical exchange rate models of the seventies:
Do they fit out of sample?,” Journal of International Economics, 14(1), 3–24.
Menkhoff, L., L. Sarno, M. Schmeling, and A. Schrimpf (2012): “Carry trades and
global foreign exchange volatility,” The Journal of Finance, 67(2), 681–718.
Pozzi, L., and B. Sadaba (2018): “Detecting scapegoat effects in the relationship between
exchange rates and macroeconomic fundamentals: a new approach,” Macroeconomic
Dynamics, pp. 1–44.
Rossi, B. (2006): “Are exchange rates really random walks? Some evidence robust to
parameter instability,” Macroeconomic Dynamics, 10(1), 20–38.
Rossi, B. (2013): “Exchange rate predictability,” Journal of Economic Literature, 51(4),
1063–1119.
Sarno, L., and G. Valente (2009): “Exchange rates and fundamentals: Footloose or
evolving relationship?,” Journal of the European Economic Association, 7(4), 786–830.
West, M., and J. Harrison (1997): Bayesian forecasting and dynamic models. Springer,
2nd edn.
Wright, J. H. (2008): “Bayesian Model Averaging and exchange rate forecasts,” Journal
of Econometrics, 146(2), 329–341.
Online appendix (not for publication) to
“Exchange rate predictability and dynamic
Bayesian learning”
Joscha Beckmann∗ Gary Koop†
University of Greifswald University of Strathclyde
Kiel Institute for the World Economy
Dimitris Korobilis‡ Rainer Schüssler§

University of Glasgow University of Rostock
July 29, 2019
Abstract
This online supplementary appendix presents technical details of our proposed
econometric methodology, simulation results not included in the paper, and
additional results using both our benchmark data set and a data set with more
time series observations.
∗ Department of Economics, University of Greifswald, Friedrich-Loeffler-Straße 70, 17489 Greifswald.
† Department of Economics, University of Strathclyde, 199 Cathedral Street, Glasgow, G4 0QU.
‡ Adam Smith Business School, University of Glasgow, Glasgow G12 8QQ.
§ Department of Economics and Social Sciences, University of Rostock, Ulmenstrasse 69, 18057 Rostock.
Contents
1 Technical Appendix
1.1 Filtering
1.2 Spike-and-slab interpretation of the prior
1.3 Dynamic asset allocation and evaluation of economic utility
1.3.1 Portfolio allocation
1.3.2 Evaluation of economic utility
1.4 Fundamental exchange rate models
1.4.1 Fama regression/UIP
1.4.2 Purchasing power parity
1.4.3 Monetary fundamentals
1.4.4 Taylor rule fundamentals
2 Data Appendix
3 Simulation experiment: model incompleteness
4 Empirical Appendix
4.1 Point and interval forecasts
4.2 Time-variation in performance
4.2.1 Evolution of wealth
4.2.2 Cumulative differences in log predictive likelihoods
4.2.3 Test statistics for Giacomini-Rossi Fluctuation test
4.3 Alternative sets of regressors
4.4 Alternative priors
4.4.1 "Dense" prior structure
4.4.2 VAR with tight prior
4.4.3 Treating the exogenous variables as endogenous
4.5 Time-varying coefficients
4.6 Additional results on portfolio performance
4.6.1 Statistics of the portfolio results
4.6.2 Evolution of portfolio weights
4.6.3 Restrictions on portfolio weights
4.6.4 Global Harvest Index as benchmark
4.6.5 Portfolio performance when removing one currency
4.6.6 Results for single currencies
4.7 Additional robustness checks
4.8 Results for the long sample
References

1 Technical Appendix
1.1 Filtering
In this sub-section, we provide econometric details of our (TVP)-VARs. Filtered estimates
can be obtained using the fact that the form of the state space model implies
$$\beta_t \mid y^{t-1}, \Sigma_{t-1} \sim N\left(\beta_{t|t-1}, \Omega_{t|t-1}\right),$$
where t|t − 1 subscripts refer to estimates made of time-t quantities given information
available at time t − 1. Forecasts can be obtained using the fact that the predictive
density is multivariate t:
$$y_t \mid y^{t-1} \sim t\left(\hat{y}_{t|t-1},\; x_t \Omega_{t|t-1} x_t' + Q_{t|t-1}\right),$$
where $\hat{y}_{t|t-1} = x_t \beta_{t|t-1}$. Standard Kalman filtering and Wishart matrix discounting
formulas can be used to produce the quantities $\beta_{t|t-1}$, $\Omega_{t|t-1}$ and $Q_{t|t-1}$ as follows.
Predictive step
The Kalman filter provides, beginning with $\beta_{0|0} = 0$ (see below), simple updating
formulas for producing $\beta_{t|t-1}$ and $\beta_{t|t}$ for t = 1, ..., T which are standard and will not be
reproduced here. Given these we can produce point forecasts as:
$$\hat{y}_{t|t-1} = x_t \beta_{t|t-1}.$$
To produce $\Omega_{t|t-1}$ we use a discount factor approximation involving a discount factor λ
and update as
$$\Omega_{t|t-1} = \frac{1}{\lambda}\, \Omega_{t-1|t-1}.$$
Note that such an approximation is well established (West and Harrison, 1997) and, for
example, used by Koop and Korobilis (2013).
We select the discount factors in a data-adaptive fashion in real time. If λ < 1, the
VAR coefficients are time varying and a lower value of λ is associated with more rapidly
changing coefficients. If λ = 1, the special case of constant coefficients is obtained. An
advantage of the discount factor approach is that we do not have to update the entire
covariance matrix but instead only have to choose a single discount factor.
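As an illustration of the predictive step, here is a minimal sketch. It assumes the random-walk state equation, so that the predicted coefficient mean simply carries over from the last filtered mean; the function name `predict_step` and the toy numbers are hypothetical.

```python
import numpy as np

def predict_step(beta_filt, omega_filt, x_t, lam):
    """Prediction step with a forgetting (discount) factor lam in (0, 1].

    beta_filt  : (k,) filtered coefficient mean from the previous period
    omega_filt : (k, k) filtered coefficient covariance from the previous period
    x_t        : (M, k) regressor matrix for period t
    """
    beta_pred = beta_filt                 # random-walk state: mean is unchanged
    omega_pred = omega_filt / lam         # covariance inflated by 1 / lam
    y_hat = x_t @ beta_pred               # point forecast
    return beta_pred, omega_pred, y_hat

# Toy inputs (hypothetical): two coefficients, one equation.
beta0 = np.array([0.5, -0.2])
omega0 = np.eye(2)
x = np.array([[1.0, 2.0]])
_, omega_const, y_hat = predict_step(beta0, omega0, x, lam=1.0)
_, omega_tvp, _ = predict_step(beta0, omega0, x, lam=0.8)
```

With λ = 1 the covariance is passed through unchanged (constant coefficients), whereas λ = 0.8 inflates it by a factor of 1.25, which is equivalent to adding state noise with covariance (1/λ − 1) times the previous filtered covariance.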

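To retain conjugacy, $\Sigma_t$ is modelled as Inverse Wishart (IW) with $\delta n_{t-1}$ degrees of
freedom and scale matrix $S_{t-1}$,
$$\Sigma_{t|t-1} \sim IW\left(\delta n_{t-1}, S_{t-1}\right),$$
with the expected value
$$E\left[\Sigma_{t|t-1}\right] := Q_{t|t-1} = \frac{S_{t-1}}{\delta n_{t-1} + M - 1}.$$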
Note that this density reflects the uncertainty about Σt and thus accounts for
parameter uncertainty. Low values of δ are associated with increasingly rapid changes in
the covariance matrix. Values near one are associated with slow adaptation, while δ = 1
represents the case of a constant covariance matrix Σ.
Update step
The error $e_t$ is obtained as the difference between the actual observation $y_t$ and the
point forecast $\hat{y}_{t|t-1}$:
$$e_t = y_t - \hat{y}_{t|t-1}.$$
The observational covariance matrix is updated as
$$\Sigma_{t|t} \sim IW(n_t, S_t)$$
with the scale
$$S_t = k^{-1} S_{t-1} + e_t e_t' \left(I_M + F_t \Omega_{t|t-1} x_t'\right),$$
where
$$k^{-1} = \frac{\delta(1 - M) + M}{\delta(2 - M) + M - 1},$$
using approximation results by Triantafyllopoulos (2011) exploiting the expectation
invariance of the random walk process for $\Sigma_t$: $E\left[\Sigma_{t|t-1}\right] = E\left[\Sigma_{t-1|t-1}\right]$. As is common
in the literature, the scale matrix is initialized as
$$S_0 = \begin{pmatrix} \hat{u}_1^2 & & \\ & \ddots & \\ & & \hat{u}_M^2 \end{pmatrix},$$
where $\hat{u}_1^2, ..., \hat{u}_M^2$ are the residuals from OLS estimation of a VAR over an initial training
sample. The updated degrees of freedom are obtained as
$$n_t = \delta n_{t-1} + 1 \quad \left(n_t \to n = \frac{1}{1-\delta}\right).$$
It is a natural choice to initialize the degrees of freedom with
$$n_0 = \frac{1}{1-\delta}.$$
The expected observational covariance is obtained as
$$E\left[\Sigma_{t|t}\right] := Q_{t|t} = \frac{S_t}{n_t + M - 1}.$$
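The degrees-of-freedom recursion and the constant k⁻¹ can be checked numerically. The sketch below is illustrative only: the helper names `dof_path` and `k_inverse` and the chosen values of δ and M are assumptions, and the limit 1/(1 − δ) applies for δ < 1.

```python
def dof_path(delta, n0, steps):
    """Iterate n_t = delta * n_{t-1} + 1 and return [n_0, n_1, ..., n_steps]."""
    path = [n0]
    for _ in range(steps):
        path.append(delta * path[-1] + 1.0)
    return path

def k_inverse(delta, M):
    """k^{-1} = (delta * (1 - M) + M) / (delta * (2 - M) + M - 1)."""
    return (delta * (1 - M) + M) / (delta * (2 - M) + M - 1)

# With delta = 0.5 the limit is 1 / (1 - 0.5) = 2.
from_above = dof_path(0.5, 10.0, 3)      # approaches the limit from above
at_fixed_point = dof_path(0.5, 2.0, 5)   # n_0 = 1 / (1 - delta) stays constant
k_inv_constant = k_inverse(1.0, 9)       # delta = 1: no discounting of S_{t-1}
```

Initializing at n_0 = 1/(1 − δ) therefore keeps the degrees of freedom constant over time, and δ = 1 gives k⁻¹ = 1, so the previous scale matrix enters the update undiscounted.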
The time-t Kalman gain ($KG_t$) is obtained as
$$KG_t = \Omega_{t|t-1} x_t' \left(x_t \Omega_{t|t-1} x_t' + Q_{t|t}\right)^{-1}.$$
Given the Kalman gain, the coefficients and the system covariance are updated as
$$\beta_{t|t} = \beta_{t|t-1} + KG_t e_t$$
and
$$\Omega_{t|t} = \Omega_{t|t-1} - KG_t x_t \Omega_{t|t-1}.$$
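The coefficient update can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the observational covariance Q is passed in as a fixed input (the Wishart discounting of the scale matrix is omitted), the covariance update uses the conventional Kalman form in which the gain term is subtracted, and the name `kalman_update` is hypothetical.

```python
import numpy as np

def kalman_update(beta_pred, omega_pred, x_t, y_t, Q):
    """One Kalman update of the coefficient mean and covariance.

    beta_pred  : (k,) predicted coefficient mean
    omega_pred : (k, k) predicted coefficient covariance
    x_t        : (M, k) regressor matrix
    y_t        : (M,) observation
    Q          : (M, M) observational covariance, treated as given here
    """
    e_t = y_t - x_t @ beta_pred                          # forecast error
    innov_cov = x_t @ omega_pred @ x_t.T + Q             # x Omega x' + Q
    # Gain = Omega x' innov_cov^{-1}; solve() avoids forming the inverse.
    gain = np.linalg.solve(innov_cov, x_t @ omega_pred).T
    beta_filt = beta_pred + gain @ e_t
    omega_filt = omega_pred - gain @ x_t @ omega_pred    # conventional minus sign
    return beta_filt, omega_filt, e_t

# Scalar toy case (hypothetical numbers): prior variance 1, noise variance 1.
beta_f, omega_f, err = kalman_update(
    np.array([0.0]), np.array([[1.0]]), np.array([[1.0]]),
    np.array([2.0]), np.array([[1.0]]),
)
```

In the scalar case the gain is 0.5, so the coefficient moves halfway towards the observation and its variance is halved.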
1.2 Spike-and-slab interpretation of the prior
Here we provide an interpretation of our proposed prior structure in the main paper as
a spike-and-slab prior. This is meant as an illustration of our prior structure from a
different angle. Note that the notation introduced in this sub-section only applies locally
and is not used elsewhere in the main text or the online appendix. Our starting point is
the same type of time-varying parameter VAR with exogenous variables we consider in
the main paper:
$$y_t = x_t \beta_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma_t)$$
$$\beta_{t+1} = \beta_t + u_t, \quad u_t \sim N(0, \Omega_t).$$

As in the main paper, we divide the set of exogenous variables into two groups: $N_x$
denotes the number of variables which are asset specific and considered as relevant only
for a specific exchange rate, while $N_{xx}$ denotes the number of variables which are not
asset specific. Thus, we have $k = M(1 + p \cdot M + N_x + N_{xx})$ elements in
$\beta_t$.
The initial conditions for the time-varying VAR coefficients can be viewed as time
t = 0 priors for the parameters $\beta_t$. For each coefficient in VAR equation i, i = 1, ..., M,
and lag/predictor j, j = 1, ..., k/M, we use a variable selection prior of the form
$$\beta_{0,i,j} \sim k_{i,j,t}\, N(0, V_{i,j}) + (1 - k_{i,j,t})\, \delta_0$$
$$k_{i,j,t} \sim DML,$$
where $\delta_0$ denotes the Dirac delta which assigns point mass at zero, and DML denotes
dynamic model learning. Each indicator variable ki,j,t can take on a value of zero or one
in each time period. When ki,j,t = 1 the prior for β0,i,j is N (0, Vi,j ) and when ki,j,t = 0
the coefficient is exactly zero (and, hence, covariate j does not enter VAR equation i).
Whether ki,j,t is one or zero is decided probabilistically via the DML procedure.1 We make
the time-dependency of the ks explicit here, using subscript t. To streamline notation,
we do not use time-subscripts for the γs in the main text, although they are re-selected
each period.
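The mixture form of the prior can be illustrated by drawing from it for given indicators. The sketch below takes the indicators as inputs (in the paper they are determined by DML, which is not reproduced here); the function name `draw_initial_coefficients` and the toy values are assumptions.

```python
import numpy as np

def draw_initial_coefficients(indicators, variances, rng):
    """Draw from the spike-and-slab prior for given 0/1 indicators.

    indicators : array of 0/1 values (1 = coefficient included)
    variances  : array of slab variances V, same shape as indicators
    rng        : numpy random Generator
    """
    indicators = np.asarray(indicators)
    variances = np.asarray(variances, dtype=float)
    slab = rng.normal(0.0, np.sqrt(variances))      # N(0, V) draws
    return np.where(indicators == 1, slab, 0.0)     # point mass at zero otherwise

rng = np.random.default_rng(0)
draw = draw_initial_coefficients([1, 0, 1], [1.0, 1.0, 4.0], rng)
```

Excluded coefficients are exactly zero in every draw, which is what removes the corresponding covariate from the equation.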
We choose $V_{i,j}$, which contains the prior variances for the included coefficients, using
ideas from the Minnesota prior:
$$V_{i,j} = s_i^2 \times \begin{cases}
\gamma_1 & \text{for intercepts} \\
\dfrac{\gamma_2}{r^2 s_i^2} & \text{for coefficients on lag } r \text{ of variable } i \text{ (own lag)} \\
\dfrac{\gamma_3}{r^2 s_k^2} & \text{for coefficients on lag } r \text{ of endogenous variable } k,\ k \neq i \\
\gamma_{(3+l)} & \text{for coefficients on the } l\text{th asset-specific exogenous variable} \\
\gamma_{(N_x+3+m)} & \text{for coefficients on the } m\text{th non asset-specific exogenous variable}
\end{cases}$$
where r = 1, ..., p indexes lag-length, k = 1, ..., M indexes VAR equations, l = 1, ..., $N_x$
indexes asset-specific predictors, m = 1, ..., $N_{xx}$ indexes non asset-specific exogenous
predictors, while $s_i^2$ denotes the OLS estimate of the residual variance of a univariate
AR(p) for variable i.
1 It would be possible to treat $k_{i,j,t}$ as unknown parameters and include them in the Bayesian posterior.
But these parameters are time-varying and directly drawing from them in an MCMC algorithm would
be computationally burdensome. This motivates our use of DML which uses discounting methods to
produce a computationally feasible approach. It is also worth noting that the selection indicators are
updated online. That is, as new data becomes available the investor only needs to input the latest
observation to update from $k_{i,j,t}$ to $k_{i,j,t+1}$.
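As a rough illustration, the prior variances for one VAR equation could be assembled as below. The sketch assumes that the residual variance of equation i multiplies every case of the Minnesota-type formula (so the own-lag variance reduces to γ2/r²), and it assumes a particular coefficient ordering (intercept, then lags 1 to p of all M variables, then asset-specific and non asset-specific exogenous variables). The function name `prior_variances` and the toy values are hypothetical.

```python
import numpy as np

def prior_variances(i, s2, p, gammas, n_x, n_xx):
    """Prior variances for the 1 + p*M + n_x + n_xx coefficients of equation i.

    i      : zero-based index of the VAR equation
    s2     : (M,) residual variances from univariate AR(p) fits
    p      : number of lags
    gammas : sequence (gamma_1, gamma_2, gamma_3, gamma_4, ...) of length 3 + n_x + n_xx
    """
    M = len(s2)
    v = [s2[i] * gammas[0]]                               # intercept
    for r in range(1, p + 1):
        for k in range(M):
            if k == i:
                v.append(gammas[1] / r**2)                # own lag: s_i^2 cancels
            else:
                v.append(s2[i] * gammas[2] / (r**2 * s2[k]))  # cross lag
    for l in range(1, n_x + 1):
        v.append(s2[i] * gammas[2 + l])                   # gamma_(3+l), zero-based
    for m in range(1, n_xx + 1):
        v.append(s2[i] * gammas[n_x + 2 + m])             # gamma_(Nx+3+m), zero-based
    return np.array(v)

# Toy example: M = 2 variables, p = 2 lags, one asset-specific regressor.
V0 = prior_variances(0, np.array([1.0, 4.0]), 2, [10.0, 0.5, 0.2, 1.0], 1, 0)
```

The variances shrink with the squared lag length, so more distant lags are pulled more strongly towards zero; setting a γ to zero removes the corresponding block altogether.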
1.3 Dynamic asset allocation and evaluation of economic utility
1.3.1 Portfolio allocation
We design an international asset allocation strategy that involves trading the US dollar
and nine other currencies. Consider a US investor who builds a portfolio by allocating
their wealth between ten bonds: one domestic (US), and the nine foreign bonds. The US
bond return is $r_f$. Define $y_t = (y_{1,t}, ..., y_{9,t})'$. At each period, the foreign bonds yield a
riskless return in the local currency but a risky return due to currency fluctuations in US
dollars. The expectation of the risky return from the investment in country i's bonds,
$r_{i,t}$, at time t − 1 is equal to $E_{t-1}(r_{i,t}) = int_{i,t-1} + y_{i,t}$.2 The only risk the US investor
is exposed to is foreign exchange (FX) risk. Every period the investor takes two steps.
First, they use the currently selected model (i.e., the model with the highest discounted
sum of predictive likelihoods) to forecast the one-period ahead exchange rate returns
and the predictive covariance matrix. Second, using these predictions, they dynamically
rebalance their portfolio by calculating the new optimal weights. This setup is designed
to assess the economic value of exchange rate predictability and to dissect which sources
of information are valuable for asset allocation.
We evaluate our models within a dynamic mean-variance framework, implementing
a maximum expected return strategy. That is, we consider an investor who tries to find
the point on the efficient frontier with the highest possible (ex-ante) return, subject to
achieving a target conditional volatility and a given horizon of the investor (one-month
ahead for our main results). Define $r_t = (r_{1,t}, ..., r_{9,t})'$ and $\mu_{t|t-1} = E_{t-1}(r_t)$ as its expectation.
The portfolio allocation problem involves choosing weights, $w_t = (w_{1,t}, ..., w_{9,t})'$, attached
to each of the 9 foreign bonds (with $1 - \sum_{i=1}^{9} w_{i,t}$ being the weight attached to the
domestic bond):
$$\max_{w_t}\; \mu_{p,t|t-1} = w_t' \mu_{t|t-1} + (1 - w_t' \iota)\, r_f - \tau\, \iota' \left| w_t - w_{t-1} \circ \frac{1 + r_t}{1 + r_{p,t}} \right|$$
subject to
$$\left(\sigma_p^*\right)^2 = w_t' \underbrace{\left[\frac{\delta n_{t-1}}{\delta n_{t-1} - 2} \left(x_{t-1} \Omega_{t|t-1} x_{t-1}' + Q_{t|t-1}\right)\right]}_{\text{estimate of the predictive covariance matrix}} w_t,$$
where $\mu_{p,t|t-1}$ is the conditional expected portfolio return and $(\sigma_p^*)^2$ the target portfolio
variance. $\iota$ is a vector of ones and the arguments of the predictive covariance matrix are
all produced by our estimation algorithm; see the Technical Appendix 1.1 for definitions.
We also here and below use notation where the portfolio return before transaction costs
is
$$R_{p,t} = 1 + r_{p,t-1} = 1 + \left(1 - w_{t-1}' \iota\right) r_f + w_{t-1}' r_t.$$
In addition, we let $R_{p,t}^{TC}$ denote period-t gross return after transaction costs, τ.
2 We use $y_{i,t}$, the discrete exchange rate returns, rather than log returns $\Delta s_t$, as, in the context of
portfolio optimization, it is important to distinguish discrete and log returns.
Our specification of the portfolio allocation problem takes into account proportional
transaction costs, $\tau$, ex ante (i.e., at the time of the portfolio construction).³ Following
Della Corte and Tsiakas (2012), we set $\tau = 0.0008$. For our main results, we choose
$\sigma_p^* = 10\%$ as target portfolio volatility of the conditional portfolio returns.
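To make the allocation step concrete, the following Python sketch solves the maximum expected return problem in closed form for the special case $\tau = 0$. Dropping transaction costs is a simplifying assumption made only for illustration; the problem stated above includes them and would need a numerical optimizer. The function name `max_return_weights`, the three-asset inputs and the monthly target volatility of $0.10/\sqrt{12}$ are hypothetical.

```python
import numpy as np

def max_return_weights(mu, cov, rf, target_vol):
    """Risky-asset weights that maximize expected return for a target volatility.

    Assumes no transaction costs (tau = 0). The first-order condition gives
    w proportional to cov^{-1} (mu - rf), scaled so that w' cov w = target_vol^2.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    excess = mu - rf                                  # expected excess returns
    inv_cov_excess = np.linalg.solve(cov, excess)     # cov^{-1} (mu - rf)
    scale = target_vol / np.sqrt(excess @ inv_cov_excess)
    return scale * inv_cov_excess

# Hypothetical monthly forecasts for three foreign bonds
mu = np.array([0.004, 0.002, 0.003])
cov = np.array([[9.0, 2.0, 1.0],
                [2.0, 6.0, 1.5],
                [1.0, 1.5, 8.0]]) * 1e-4
rf = 0.001
target_vol = 0.10 / np.sqrt(12)

w = max_return_weights(mu, cov, rf, target_vol)
w_domestic = 1.0 - w.sum()                # weight on the domestic bond
port_vol = np.sqrt(w @ cov @ w)           # equals target_vol by construction
expected_return = w @ mu + w_domestic * rf
```

Without transaction costs the optimal risky weights depend on the forecasts only through $\Sigma^{-1}(\mu - \iota r_f)$, so the target volatility acts purely as a scaling factor.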
1.3.2 Evaluation of economic utility
Quadratic utility
Our econometric model provides forecasts of the mean vector of returns and the
covariance matrix. To assess the economic utility of the forecasts, we employ the method
proposed by West, Edison, and Cho (1993). In a mean-variance framework with quadratic
utility, we can express the investor’s realized utility in period t as
$$U(W_t) = W_t - \frac{\rho}{2} W_t^2 = W_{t-1} R_{p,t} - \frac{\rho W_{t-1}^2}{2} (R_{p,t})^2,$$
where $W_t$ is the investor's wealth in $t$ and $\rho$ determines their risk preferences.
The investor's degree of relative risk aversion $\theta_t = \frac{\rho W_t}{1 - \rho W_t}$ is set to a constant value
$\theta$, which implies $\rho W_t = \theta / (1 + \theta)$. We choose $\theta = 2$ for our main results (and $\theta = 6$ for robustness checks). Then,
the average realized utility, $\overline{U}(\cdot)$, can be employed to consistently estimate the expected
utility achieved by a given level of initial wealth (West, Edison, and Cho, 1993). With
initial wealth $W_0$, the average utility for an investor can be expressed as
$$\overline{U}(\cdot) = W_0 \left\{ \sum_{t=0}^{T-1} \left[ R_{p,t+1}^{TC} - \frac{\theta}{2(1+\theta)} \left( R_{p,t+1}^{TC} \right)^2 \right] \right\}.$$
The advantage of the representation above is that, for a fixed value of $\theta$, the relative risk
aversion is constant and utility is linearly homogeneous in wealth. In contrast, for standard
quadratic utility without restrictions on $\theta$, relative risk aversion would be increasing in
wealth, which is not likely to represent a typical investor's preferences. Here, having
constant relative risk aversion, we can set $W_0$ to one dollar.
³ Maurer and Pezzo (2018) show the importance of treating transaction costs in FX portfolios ex ante
rather than ex post. Doing so avoids unnecessary trading and reduces transaction costs.
Performance measures Our main evaluation criterion is based on the dynamic
mean-variance framework and quadratic utility. Comparing two competing forecasting
models involves comparing the average utilities generated by the respective forecasting
models. We assess the economic value of different forecasting approaches by equating the
average utility generated by a portfolio strategy which is based on (a particular version
of) the VAR approach and the average utility achieved by a portfolio strategy relying on
a simple random walk. $\Phi$ is the maximum (monthly) performance fee an investor is
willing to pay to switch from the random walk to the specific VAR configuration. The
estimated value of $\Phi$ ensures that the following equation holds:
$$\sum_{t=0}^{T-1} \left[ \left( R_{p,t+1}^{TC,*} - \Phi \right) - \frac{\theta}{2(1+\theta)} \left( R_{p,t+1}^{TC,*} - \Phi \right)^2 \right] = \sum_{t=0}^{T-1} \left[ R_{p,t+1}^{TC} - \frac{\theta}{2(1+\theta)} \left( R_{p,t+1}^{TC} \right)^2 \right],$$
where $R_{p,t+1}^{TC,*}$ is the gross portfolio return constructed using the expected return and
covariance forecasts from the dynamically selected best model configuration and $R_{p,t+1}^{TC}$
is implied by the benchmark random walk (without drift) model. The superscript TC
indicates that all quantities are computed after adjusting for transaction costs.
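The fee $\Phi$ has no convenient closed form for general return series, but it can be found numerically. The Python sketch below uses bisection on the difference between the two utility sums. It is an illustration under stated assumptions: the function names, the bracketing interval of plus or minus 10% per month and the simulated return series are hypothetical, and the bisection relies on the utility sum being decreasing in $\Phi$ over that interval (which holds when gross returns net of the fee stay below $(1+\theta)/\theta$).

```python
import numpy as np

def sum_utility(gross_returns, theta):
    """Sum over t of R - theta / (2 (1 + theta)) * R^2 for gross returns R."""
    r = np.asarray(gross_returns, dtype=float)
    return np.sum(r - theta / (2.0 * (1.0 + theta)) * r ** 2)

def performance_fee(r_model, r_benchmark, theta=2.0, lo=-0.1, hi=0.1, tol=1e-12):
    """Bisection for the fee Phi that equates the two utility sums."""
    r_model = np.asarray(r_model, dtype=float)
    target = sum_utility(r_benchmark, theta)

    def gap(phi):
        return sum_utility(r_model - phi, theta) - target

    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("fee is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:      # model still preferred after paying mid: fee is larger
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data: the model portfolio earns 30 basis points more every month
rng = np.random.default_rng(0)
r_benchmark = 1.0 + rng.normal(0.002, 0.03, size=252)
r_model = r_benchmark + 0.003

fee = performance_fee(r_model, r_benchmark, theta=2.0)
fee_annual_bp = fee * 12 * 1e4   # annualized, in basis points
```

In this constructed example the model return exceeds the benchmark by a constant, so the fee recovers that constant; with real return series the fee also reflects differences in volatility.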
As a second measure of economic utility, we report the Sharpe ratio. Despite
its popularity as a risk measure, it is well known that the Sharpe ratio comes with
a few drawbacks in the context of evaluating dynamic portfolio strategies; see, for
example, Marquering and Verbeek (2004) or Han (2006). This is why we primarily
rely on performance fees as an evaluation criterion, while Sharpe ratios are reported as a

complementary measure.
1.4 Fundamental exchange rate models
This section defines the fundamental exchange rate models which are used in the paper.
One of these (UIP) is used in the main results in the body of the paper. The remainder
are used in this online appendix.
1.4.1 Fama regression/UIP
The UIP condition is the fundamental parity condition for foreign exchange market
efficiency under risk neutrality. This condition postulates that the difference in interest
rates between two countries should equal the expected change in exchange rates between
the countries’ currencies (Engel, 2013):
$$E_t \Delta s_{t+1} = int_t - int_t^*,$$
where $\Delta s_{t+1} \equiv s_{t+1} - s_t$. $E_t \Delta s_{t+1}$ denotes the expected change (at time $t$ for $t+1$)
of log exchange rates, denominated as US dollars per unit of foreign currency. $int_t$ ($int_t^*$) is
the one-period nominal interest rate on US (foreign) securities. The following forecasting
equation arises under the assumption that $E_t \Delta s_{t+1}$ equals $\Delta s_{t+1}$, where $s_t$ denotes the
log of realized exchange rates:
$$\Delta s_{t+1} = int_t - int_t^*.$$
We use $int_t - int_t^*$ as a predictor.
1.4.2 Purchasing power parity
Throughout the PPP literature, the real exchange rate is usually modelled as
$$q_t = s_t - p_t + p_t^*,$$
where $q_t$ is the log of the real exchange rate and $p_t$ ($p_t^*$) are the logs of the US (foreign)
price levels (Rogoff, 1996). PPP postulates a constant real exchange rate, resulting in
the price differential as the fundamental nominal exchange rate:
$$f_{PPP} = (p_t - p_t^*).$$
We rely on current deviations from this exchange rate as a predictor for $\Delta s_{t+1}$; that is,
if PPP holds, we expect that $\Delta s_{t+1} = (f_{PPP} - s_t)$. Thus, we use $f_{PPP} - s_t$ as a
predictor.
1.4.3 Monetary fundamentals
The main feature of the monetary approach is that the exchange rate between two
countries is determined via the relative development of money supply and industrial
production (Dornbusch, 1976; Bilson, 1978). The underlying idea is that an increase in
the relative money supply depreciates the US dollar, while the opposite holds for relative
industrial production. A simplified version of the monetary approach adopted in previous
studies (Mark and Sul, 2001) can be expressed as
$$f_{MON} = (m_t - m_t^*) - (ip_t - ip_t^*),$$
where $m_t - m_t^*$ denotes the (log) money supply differential and $ip_t - ip_t^*$ the (log) industrial
production differential. This implies $\Delta s_{t+1} = f_{MON} - s_t$, and we use $f_{MON} - s_t$ as a
predictor.
1.4.4 Taylor rule fundamentals
The Taylor rule states that a central bank adjusts the short-run nominal interest rate
in order to respond to inflation (π) and the output gap (ou). Postulating such Taylor
rules for two countries and subtracting one from the other, an equation is derived with
the interest rate differential on the left-hand side and the inflation and output gap on
the right-hand side.⁴ Provided that at least one of the two central banks also targets the
PPP level of the exchange rate, the real exchange rate also appears on the right-hand
side of the equation. The underlying idea is that both central banks follow a Taylor-rule
model and determine the interest rate differential which drives the exchange rate. We
rely on a simple baseline specification with ad-hoc weights for inflation and output gap
which also incorporates the real exchange rate:
$$\Delta s_{t+1} = 1.5(\pi_t - \pi_t^*) + 0.1(ou_t - ou_t^*) + 0.1 q_t.$$
We use $1.5(\pi_t - \pi_t^*) + 0.1(ou_t - ou_t^*) + 0.1 q_t$ as a predictor.
⁴ The output gap is approximated as the deviation of industrial production from trend output which
is calculated based on the Hodrick-Prescott filter with smoothing parameter $\lambda = 14{,}400$. For estimating
the Hodrick-Prescott trend out of sample, we only use data that would have been available at the given
point in time.
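The four predictors defined in Sections 1.4.1 to 1.4.4 are simple transformations of observed series. The Python sketch below collects them in one place; the function names and the numerical inputs are hypothetical and serve only to show how each formula is evaluated. All inputs are assumed to be in logs (or rates) as described in the text, and the functions work equally on scalars and NumPy arrays.

```python
import numpy as np

def uip_predictor(int_us, int_foreign):
    """Interest rate differential int_t - int_t^*."""
    return int_us - int_foreign

def ppp_predictor(s, p_us, p_foreign):
    """Deviation f_PPP - s_t with f_PPP = p_t - p_t^*."""
    return (p_us - p_foreign) - s

def monetary_predictor(s, m_us, m_foreign, ip_us, ip_foreign):
    """Deviation f_MON - s_t with f_MON = (m_t - m_t^*) - (ip_t - ip_t^*)."""
    return (m_us - m_foreign) - (ip_us - ip_foreign) - s

def taylor_predictor(pi_us, pi_foreign, gap_us, gap_foreign, q):
    """Taylor rule fundamental with the ad-hoc weights 1.5, 0.1 and 0.1."""
    return 1.5 * (pi_us - pi_foreign) + 0.1 * (gap_us - gap_foreign) + 0.1 * q

# Hypothetical values for a single month and a single country
s, p_us, p_foreign = 0.20, 4.60, 4.50
q = s - p_us + p_foreign                      # log real exchange rate

uip = uip_predictor(0.030, 0.010)
ppp = ppp_predictor(s, p_us, p_foreign)
mon = monetary_predictor(s, 7.30, 7.00, 4.65, 4.60)
tay = taylor_predictor(0.02, 0.01, 0.01, -0.02, q)
```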
2 Data Appendix
Variable                                                                              Source
Consumer prices, seasonally adjusted                                                  OECD
End-of-month dollar exchange rates                                                    Datastream
Industrial production and GDP, seasonally adjusted                                    OECD
Money supply, seasonally adjusted                                                     OECD
LIBOR and Eurodeposit interest rates                                                  Datastream
FXDIS: Disagreement among exchange rate forecasters measured by standard deviation    Consensus Economics
10 Year Government Bonds                                                              Datastream
CBOE Volatility Index (VIX)                                                           Federal Reserve
FXVOL: J.P. Morgan G10 currency volatility index                                      Bloomberg
West Texas Intermediate Oil Price, denominated in US Dollar                           Federal Reserve
3 Simulation experiment: model incompleteness

Model incompleteness refers to the situation that all of the entertained individual
forecasting models are allowed to be false (Geweke and Amisano, 2011). Given the
complexity of exchange rate dynamics, we do not assume that any of our entertained
individual forecasting models reflects the true DGP. Rather, we consider a dynamic model
learning mechanism that switches among differently specified model configurations to
approximate the true DGP as closely as possible and acknowledge that the DGP changes
through time. Our implementation of the model learning strategy involves only one
parameter (α). It is thus parsimonious, which limits concerns about estimation error, and it
can be interpreted as a shortcut to approximate complex nonlinear behavior in a timely
manner. In this small simulation experiment we focus on one important aspect of our
model learning strategy: its ability to detect, in a timely manner, the most appropriate model among a set of models
in an incomplete model setting with a changing DGP.
We use a simulation setup considered in Billio, Casarin, Ravazzolo, and van Dijk
(2013) and generate a random sample from the following autoregressive model with
breaks:

$$y_t = 0.1 + 0.3\, I_{(T_0,T]}(t) + \left( 0.6 - 0.4\, I_{(T_0,T]}(t) \right) y_{t-1} + \varepsilon_t,$$
for $t = 1, ..., T$, with $\varepsilon_t \sim N(0, \sigma^2)$, $\sigma = 0.05$, $T_0 = 50$ and $T = 100$. $I_A(z)$ takes the
value 1 if $z \in A$ and 0 otherwise. $y_0 = 0.25$.
We apply our dynamic learning strategy to the following set of prediction models:
$$\begin{aligned}
M_1 &: \; y_{1t} = 0.1 + 0.6\, y_{1t-1} + \varepsilon_{1t} \\
M_2 &: \; y_{2t} = 0.4 + 0.2\, y_{2t-1} + \varepsilon_{2t} \\
M_3 &: \; y_{3t} = 0.9 + 0.1\, y_{3t-1} + \varepsilon_{3t}
\end{aligned}$$
with $\varepsilon_{it} \sim N(0, \sigma^2)$ independent for $i = 1, 2, 3$ and assume $y_{i0} = 0.25$, $i = 1, 2, 3$ and
$\sigma = 0.05$. The model set is incomplete, but includes two models ($M_1$ and $M_2$) that are
equivalent versions of the true model in the two parts of the sample.
We apply our dynamic model learning strategy to the simulated data. That is, we
calculate the discounted predictive likelihood for each of the models (M1 , M2 and M3 )
and select the model (and value of the discount factor) which would have generated the
highest product of predictive likelihoods until the given point in time. As we do for our
application to exchange rate forecasting, we only consider information that would have
been available at a certain point in time. Instead of excluding dynamic learning by setting
α = 1, we choose the same range of the discount factor as we do in our application to
exchange rate forecasting: α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}. We simulated 1,000 runs
and recorded how often each of the models was chosen at each point in time. Figure
1 presents the results. It shows that (i) in almost all cases the appropriate stochastic
process was selected, (ii) the structural break was recognized quickly and that (iii) model
3 rightly played no role.
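A reduced version of this experiment can be sketched in a few lines of Python. The sketch departs from the setup above in two labeled ways: it fixes a single discount factor (α = 0.8) instead of selecting α from the grid, and it uses 200 runs instead of 1,000 to keep it quick. Model scores are discounted sums of Gaussian log predictive likelihoods, and the model chosen at each date uses only past observations. All function and variable names are hypothetical.

```python
import numpy as np

SIGMA = 0.05
# (intercept, AR coefficient) of M1, M2 and M3
MODELS = [(0.1, 0.6), (0.4, 0.2), (0.9, 0.1)]

def simulate_path(rng, T=100, T0=50, y0=0.25):
    """Draw y_0, ..., y_T from the AR(1) with a break after T0."""
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(1, T + 1):
        after_break = t > T0
        intercept = 0.1 + 0.3 * after_break
        slope = 0.6 - 0.4 * after_break
        y[t] = intercept + slope * y[t - 1] + rng.normal(0.0, SIGMA)
    return y

def log_pred_lik(y):
    """Gaussian log predictive likelihoods, one column per model, rows t = 1..T."""
    out = np.empty((len(y) - 1, len(MODELS)))
    for i, (c, phi) in enumerate(MODELS):
        resid = y[1:] - (c + phi * y[:-1])
        out[:, i] = -0.5 * np.log(2 * np.pi * SIGMA ** 2) - 0.5 * (resid / SIGMA) ** 2
    return out

def select_models(y, alpha):
    """Model index chosen after observing y_1..y_t, for t = 1..T-1."""
    ll = log_pred_lik(y)
    score = np.zeros(len(MODELS))
    chosen = np.empty(len(ll) - 1, dtype=int)
    for t in range(len(ll) - 1):
        score = alpha * score + ll[t]     # discounted sum of past log likelihoods
        chosen[t] = np.argmax(score)      # model used to forecast the next period
    return chosen

rng = np.random.default_rng(0)
runs = 200                                # assumption: fewer runs than in the text
counts = np.zeros((99, 3), dtype=int)     # rows: dates, columns: models
for _ in range(runs):
    chosen = select_models(simulate_path(rng), alpha=0.8)
    counts[np.arange(99), chosen] += 1
```

Each row of `counts` gives the number of runs in which each model was selected at that date, which is the quantity plotted in Figure 1.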

Figure 1: Dynamic Model Learning
[Figure: number of simulation runs (vertical axis, 0 to 1000) plotted against time (horizontal axis, 0 to 100).]
The figure shows the number of simulation runs in which each of the three models was selected at each
point in time by the dynamic model learning strategy. The red line represents model 1, the grey line
model 2, and the purple line model 3.
4 Empirical Appendix
4.1 Point and interval forecasts
Bayesian methods provide the full predictive density, from which we can produce interval
and point forecasts as a byproduct. Although our primary interest is on exploiting density
forecasts for asset allocation, it is instructive to have a look at point and interval forecasts.
In particular, the second column of Table 1 shows the empirical coverage rates (for a
nominal coverage rate of 90%) for all currencies. These reveal good coverage properties,
albeit very slightly too conservative.
The third column of Table 1 reports the ratio of mean squared forecasting errors
relative to the simple random walk with constant volatility. Ratios below one indicate
better point forecasting performance in terms of squared loss of the DML with UIP
forecasts compared to those produced by the random walk. Our evidence on point
forecasting is ambiguous with some ratios below and some above one. This finding once
more shows how difficult it is to beat a simple random walk in terms of point forecasting
accuracy. On the other hand, our previous results show that it is more fruitful to focus
on density forecasts and exploit them for portfolio management.
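The two statistics reported in Table 1 can be computed as in the following Python sketch. The function names and the small input arrays are hypothetical. The benchmark forecast is set to zero, which corresponds to a random walk without drift for the exchange rate level and hence a zero forecast for returns.

```python
import numpy as np

def empirical_coverage(realized, lower, upper):
    """Share of realizations that fall inside the interval forecasts."""
    realized, lower, upper = map(np.asarray, (realized, lower, upper))
    return np.mean((realized >= lower) & (realized <= upper))

def msfe_ratio(realized, forecast, benchmark_forecast):
    """Mean squared forecast error of the model relative to the benchmark."""
    realized = np.asarray(realized, dtype=float)
    msfe_model = np.mean((realized - np.asarray(forecast)) ** 2)
    msfe_bench = np.mean((realized - np.asarray(benchmark_forecast)) ** 2)
    return msfe_model / msfe_bench

# Hypothetical monthly returns, point forecasts and interval bounds
realized = np.array([0.01, -0.02, 0.03, 0.00])
forecast = np.array([0.00, -0.01, 0.01, 0.01])
lower, upper = forecast - 0.015, forecast + 0.015
random_walk = np.zeros_like(realized)     # zero return forecast

coverage = empirical_coverage(realized, lower, upper)
ratio = msfe_ratio(realized, forecast, random_walk)
```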
Figure 2 plots point forecasts and credible intervals for each country in our sample
along with the realizations. The predictive credible intervals show good coverage. This
figure also illustrates, for every country, the importance of allowing for time-varying
volatilities, particularly around the time of the subprime crisis, where the volatility forecast
first increases significantly before gradually returning to the pre-crisis level.
As we adopt a Wishart matrix discounting (WMD) approach for the error covariance
matrix, we are able to provide credibility intervals for our estimates of volatility and
correlations. Figure 3 presents the point estimates of annualized volatility along with
the 90% credibility intervals for the nine exchange rates. Figure 4 plots the point
estimates of correlations along with the 90% credibility intervals for four selected pairs of exchange
rate returns which display different patterns. The correlation between AUD and NZD
increases to almost one at the end of the sample, which reflects the well-established
co-movements between these currencies. On the other hand, the intensity of the relationship
between JPY and GBP strongly decreases, potentially due to country-specific drivers of
the GBP exchange rate as a result of Brexit. Overall, these figures illustrate that this
dimension of model flexibility is able to capture relevant global currency dynamics.
Figure 2: Interval and Point Forecasts
[Figure: nine panels (AUD, CAD, EUR, JPY, NZD, NOK, SWK, CHF, GBP), each plotting exchange rate returns against time, 1996 to 2015.]
The figure shows the point forecast of exchange rate returns (red line) along with the 90% credibility
intervals (dark grey) of the DML with UIP strategy. The realized exchange rate returns are indicated in
light grey.

Figure 3: Volatility Forecasts and Credibility Intervals
[Figure: nine panels (AUD, CAD, EUR, JPY, NZD, NOK, SWK, CHF, GBP), each plotting annualized volatility against time, 1996 to 2015.]
The figure shows the point forecast of annualized volatility (red line) along with the 90% credibility
intervals (dark grey) of the DML with UIP strategy.
Figure 4: Correlation Forecasts and Credibility Intervals
[Figure: four panels (AUD/NZD, EUR/CAD, JPY/GBP, CHF/SWK), each plotting correlations against time, 1990 to 2015.]
The figure shows the point forecast of correlations (red line) along with the 90% credibility intervals (dark
grey) of the DML with UIP strategy.

Table 1: Evaluation of Interval and Point Forecasts
Currency   Empirical coverage (nominal: 90%)   MSFE ratio
AUD        90%                                 1.03
CAD        88%                                 0.99
EUR        92%                                 1.06
JPY        93%                                 1.03
NZD        89%                                 0.98
NOK        91%                                 1.02
SWK        93%                                 0.99
CHF        93%                                 1.03
GBP        93%                                 0.97
The table summarizes the coverage statistics of the interval forecasts and the relative point forecasting accuracy (MSFE
ratio) of the DML with UIP model against the random walk.
4.2 Time-variation in performance
4.2.1 Evolution of wealth
Figure 5 compares the evolution of wealth for an investor who begins with one dollar and
relies on DML with UIP to the wealth of an investor who uses a multivariate random
walk with constant covariance to construct their portfolio. As is evident from the figure,
the outperformance of DML with UIP is large, with the most striking gains around the
time of the subprime crisis.

Figure 5: Evolution of Wealth
[Figure: two series (Random walk; DML with UIP) showing wealth against time, 1996:01 to 2016:01.]
The figure depicts the evolution of wealth in the DML with UIP model and in the random walk model.
4.2.2 Cumulative differences in log predictive likelihoods
Figure 6 depicts the cumulative differences in predictive log likelihoods between DML
with UIP and the random walk (with constant volatility). The out-performance of DML
with UIP is most pronounced in the time of the subprime crisis.
Figure 6: Cumulative Differences in Predictive Log Likelihoods
[Figure: one series; vertical axis from -50 to 350, horizontal axis from 1996:01 to 2016:01.]
The figure shows the cumulative differences in predictive log likelihoods between DML with UIP and the
random walk with constant volatility.
4.2.3 Test statistics for Giacomini-Rossi Fluctuation test
The MSFE ratio is a measure of global performance. It tells us whether the DML
with UIP or the random walk has given more precise point forecasts in a mean squared
error sense. However, we do not learn from this measure how the relative forecasting
power has evolved over time. As we seek to shed some light on the evolution through
time, we also provide a measure of local forecasting performance. A useful device for
exploring time-variation in forecasting performance is the Fluctuation test by Giacomini
and Rossi (2010). Figure 7 depicts (standardized) sequences of differences between the
MSFE of the random walk and the MSFE of the DML with UIP model computed over
rolling windows of 60 observations. Positive (negative) values of such differences indicate
that DML with UIP forecasts better (worse) than the random walk. Figure 7 highlights
that the relative forecasting performance is highly unstable across currencies and over
time. This finding aligns with Rossi (2013). The standardized sequences of differences
between the MSFE of the random walk and the MSFE of the DML with UIP provide the
test statistics of the Fluctuation test. To carry out the Fluctuation test, i.e., testing the
null hypothesis that the local MSFE difference equals zero at each point in time, requires
computing critical values. Calculation of critical values in the Fluctuation test rests on
the assumption that a rolling or fixed estimation window has been used for generating
the out-of-sample forecasts. The out-of-sample forecasts in our setup were produced
using exponential discounting. Hence, we cannot compute valid critical values for our
application. This is, however, not a major concern since Figure 7 shows that the absolute
values of the test statistics are greater than two only for a few currencies at very few points
in time. The null hypothesis of equal forecasting performance would thus essentially
never be rejected at conventional significance levels.
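The rolling statistics described above can be sketched in Python as follows. This is a simplified illustration, not the exact procedure of Giacomini and Rossi (2010): the window is trailing rather than centered, and the scale is the plain sample standard deviation of the loss differences, whereas the original test uses a HAC estimate of the long-run variance. The function name and the simulated forecast errors are hypothetical.

```python
import numpy as np

def fluctuation_statistics(err_benchmark, err_model, window=60):
    """Standardized rolling sums of squared-error differences.

    Positive values mean the model has the lower local MSFE.
    Simplification: sample standard deviation instead of a HAC estimator.
    """
    d = np.asarray(err_benchmark, float) ** 2 - np.asarray(err_model, float) ** 2
    sigma = d.std(ddof=1)
    csum = np.concatenate(([0.0], np.cumsum(d)))
    rolling_sum = csum[window:] - csum[:-window]   # sums over each 60-month window
    return rolling_sum / (sigma * np.sqrt(window))

# Hypothetical forecast errors: the model halves every benchmark error
rng = np.random.default_rng(1)
err_benchmark = rng.normal(0.0, 0.03, size=252)
err_model = 0.5 * err_benchmark

stats = fluctuation_statistics(err_benchmark, err_model, window=60)
```

With 252 out-of-sample months and a window of 60 observations, the sketch returns 193 statistics, one per window end date.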

Figure 7: Fluctuation Test Statistics
[Figure: nine panels (AUD, CAD, EUR, JPY, NZD, NOK, SWK, CHF, GBP), each plotting the Fluctuation test statistic (vertical axis, -2 to 2) against time, 2001:01 to 2016:01.]
The figure shows Fluctuation test statistics through time. Positive values of the Fluctuation statistic
imply that DML with UIP does better than the random walk.
4.3 Alternative sets of regressors
For our main results we did not include some of the traditional regressors used by exchange
rate forecasters due to data revision concerns. But if we are willing to use final vintage
data (as opposed to data that forecasters would have had in real time), we can extend our
set of regressors to include purchasing power parity (PPP), the monetary model (MON)
and an asymmetric Taylor Rule (ASYTAY). The Technical Appendix 1.4 provides details
of what these are and how they are calculated.
Table 2 shows that including these fundamentals would not improve the performance
of an investor’s portfolio. Besides these conventional fundamentals, we also experimented
with yield curve factors which are commonly used to exploit the term structure of interest
rates and the arising macroeconomic effects (Wright, 2011). This can be considered as an
extension of the simple interest rate spread. However, in line with Berge (2014), including
level, slope and curvature factors does not improve our forecasts. The findings are
available upon request.

Table 2: Alternative Set of Regressors
                           Φ^TC   SR      SR^TC   PLL
DML with UIP               464*   1.12*   0.94*   22.01*
DML with PPP               217    0.90    0.71    22.00*
DML with MON               251    0.94    0.75    22.03*
DML with ASYTAY            263    0.95    0.76    22.02*
DML with ALL REGRESSORS    332    0.98    0.80    22.01*
DML                        327    1.01*   0.82*   22.02*
The table summarizes the economic and statistical evaluation of our forecasts from different model configurations for the
period from 1996:01 to 2016:12. We measure statistical significance for differences in performance fees and log scores using
the (one-sided) Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC) standard
errors. We evaluate whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility)
benchmark using the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf
(2008) test statistics with a serial correlation-robust variance, using the pre-whitened quadratic spectral estimator of
Andrews and Monahan (1992). One star indicates significance at 10% level; two stars significance at 5% level; and three
stars significance at 1% level.
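The Diebold and Mariano (1995) t-test referred to in the table notes can be sketched in Python as below. The sketch uses a Newey-West (Bartlett kernel) long-run variance as the HAC estimator. The kernel, the lag length of 4 and the simulated log scores are assumptions made for illustration, since the notes do not state which HAC settings were used.

```python
import math
import numpy as np

def newey_west_lrv(d, lags):
    """Newey-West (Bartlett kernel) long-run variance of the series d."""
    d = np.asarray(d, dtype=float)
    d = d - d.mean()
    n = len(d)
    lrv = d @ d / n
    for k in range(1, lags + 1):
        gamma_k = d[k:] @ d[:-k] / n                  # autocovariance at lag k
        lrv += 2.0 * (1.0 - k / (lags + 1.0)) * gamma_k
    return lrv

def dm_test(score_model, score_benchmark, lags=4):
    """One-sided Diebold-Mariano test that the model has the higher mean score."""
    d = np.asarray(score_model, float) - np.asarray(score_benchmark, float)
    t_stat = d.mean() / math.sqrt(newey_west_lrv(d, lags) / len(d))
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))   # upper normal tail
    return t_stat, p_value

# Hypothetical monthly log scores for a model and a benchmark
rng = np.random.default_rng(2)
score_benchmark = rng.normal(21.7, 1.0, size=252)
score_model = score_benchmark + rng.normal(0.3, 1.0, size=252)

t_stat, p_value = dm_test(score_model, score_benchmark, lags=4)
t_stat_no_hac, _ = dm_test(score_model, score_benchmark, lags=0)
```

Setting `lags=0` reduces the statistic to an ordinary t-ratio of the mean score difference, which is a convenient sanity check.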
4.4 Alternative priors
We have experimented with many prior specifications that lie within our DML framework
and, in particular, experimented with alternative choices of grids for the Minnesota
shrinkage parameters. Results were robust. If we use a refined grid for the values
of the hyperparameters, we find very slight forecast improvements (at the cost of
increasing the computation time). This shows that our specification of grid points for
the hyperparameters is sufficiently flexible to cover the model space. In this section, we
discuss some alternative, more restrictive, prior specifications. Overall, we find that the
rich shrinkage patterns we use pay off compared to more restrictive settings.
4.4.1 "Dense" prior structure
In this sub-section, we discuss a prior structure that represents a "dense" rather than
a "sparse" modelling approach. We investigate how our results change when enforcing
a "dense" prior rather than letting the data choose between a "dense" and a "sparse"
structure. A dense prior is one where VAR lags and exogenous regressors cannot be
removed from the model; instead, only the degree of shrinkage intensity for each of the
blocks of variables is selected (i.e., the prior shrinkage parameters cannot be set to be
exactly zero, as they can in our approach). We specify an alternative prior that features a
dense structure as described in the following paragraph.
For γ2 and γ3 , the shrinkage parameters for own and cross lags, we use grids of
{0.0001; 0.01; 0.1} and also the shrinkage parameter for UIP is estimated using a grid of
{0.0001; 0.01; 0.1}. We do not take into account other exogenous variables in this setting.
Table 3 summarizes the results for this alternative selection of grid points for the
shrinkage priors. It is evident that economic performance for the dense prior is inferior to
the prior used for our main results, while the PLLs are similar. If the value 0.0001
is removed from the grids, performance deteriorates dramatically, with an annualized
performance fee of −289 basis points and a considerably lower PLL (21.70).
Table 3: "Dense" Prior Structure
                   Φ^TC   SR     SR^TC   PLL
DML with UIP       285    1.00   0.78    22.00*
DML (α = 1)        −321   0.29   0.21    21.71*
DML (α = 0.99)     −163   0.53   0.34    21.72*
DML (α = 0.90)     83     0.74   0.57    21.95*
DML (α = 0.80)     189    0.92   0.71    22.00*
DML (α = 0.70)     278    1.03   0.81    22.05*
DML (α = 0.50)     231    0.96   0.74    22.03*
The table summarizes the economic and statistical evaluation of our forecasts from the DML and restricted versions thereof
for the period from 1996:01 to 2016:12. We measure statistical significance for differences in performance fees and log
scores using the (one-sided) Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC)
standard errors. We evaluate whether the Sharpe ratio of a model is different from that of the random walk (with constant
volatility) benchmark using the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit
and Wolf (2008) test statistics with a serial correlation-robust variance, using the pre-whitened quadratic spectral estimator
of Andrews and Monahan (1992). One star indicates significance at 10% level; two stars significance at 5% level; and
three stars significance at 1% level.
4.4.2 VAR with tight prior
We also explored a VAR for the nine exchange rates without any exogenous regressors and
a very tight prior for the VAR coefficients. This setting is similar to Carriero, Kapetanios,
and Marcellino (2009). For γ2 and γ3 , the shrinkage parameters for own and cross lags,
we use grids of $\{10^{-4}; 10^{-5}; 10^{-6}\}$. Although we find that for seven out of nine exchange
rates the MSFE is slightly lower than that of the random walk, results in terms of
density forecasting accuracy and economic measures are inferior to our baseline setting:
$PLL = 21.73$, $\Phi^{TC} = 164$, $SR = 0.64$ and $SR^{TC} = 0.62$.

4.4.3 Treating the exogenous variables as endogenous
We also investigate how the results change if the exogenous variables are not treated as
such. Instead they are included as endogenous variables in the VAR. That is, instead of
working with a 9 variable VAR with exogenous variables, we work with a 37 dimensional
VAR involving the 9 exchange rates, 3 asset-specific variables, UIP, INT DIFF, STOCK
GROWTH, (i.e. there are 3 such variables for each of 9 countries, hence this adds 27
variables to the VAR) and 1 non-asset specific variable, OIL. With this much larger VAR
it is computationally infeasible to do a grid search over seven different prior shrinkage
parameters. Accordingly, we employ the framework proposed by Koop and Korobilis
(2013) which involves a single shrinkage parameter. We label this the KK-Minnesota-
prior. The strategy of using a single shrinkage parameter for imposing shrinkage on all
model parameters (except the intercept) is commonly used in the large Bayesian VAR
literature; see Giannone, Lenza, and Primiceri (2015), Koop and Korobilis (2013) and
Bańbura, Giannone, and Reichlin (2010). Following Koop and Korobilis (2013), the value
of the single shrinkage parameter $\gamma$ is adaptively (in each time period) selected from the
grid $\gamma \in \{10^{-5}; 0.001; 0.005; 0.01; 0.05; 0.1\}$ and the shrinkage parameter of the intercepts
$a$ is set to 100 to be uninformative. The structure of this simpler version of the Minnesota
prior is
$$\Omega_{0,i,jj} = \begin{cases} a \cdot s_i^2, \quad a = 100 & \text{for INTERCEPTS} \\ \dfrac{\gamma}{r^2}, \quad \gamma \in \{10^{-5}; 0.001; 0.005; 0.01; 0.05; 0.1\} & \text{for } r = 1, ..., 6. \end{cases}$$
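For a given value of $\gamma$, the diagonal prior variances implied by this structure can be assembled as in the Python sketch below. It reads $r$ as the lag index, so that coefficients on lag $r$ receive variance $\gamma / r^2$. The function name, the coefficient ordering within each equation (intercept first, then all variables at lag 1, then lag 2, and so on) and the three-variable example are assumptions made for illustration.

```python
import numpy as np

def kk_minnesota_prior_variances(s2, gamma, n_lags=6, a=100.0):
    """Diagonal prior variances, one row per VAR equation.

    Column 0 holds the intercept variance a * s_i^2. The remaining columns
    hold gamma / r^2 for the coefficients on lag r, with n coefficients per lag.
    """
    s2 = np.asarray(s2, dtype=float)
    n = len(s2)
    lag_variances = np.repeat(gamma / np.arange(1, n_lags + 1) ** 2, n)
    out = np.empty((n, 1 + n * n_lags))
    out[:, 0] = a * s2
    out[:, 1:] = lag_variances      # same lag pattern in every equation
    return out

# Hypothetical residual variances for a three-variable VAR
s2 = np.array([1e-3, 2e-3, 4e-3])
gamma_grid = [1e-5, 0.001, 0.005, 0.01, 0.05, 0.1]
prior = kk_minnesota_prior_variances(s2, gamma=gamma_grid[3])
```

In the adaptive scheme described above, one such matrix would be built for each value in `gamma_grid`, and the value of $\gamma$ would be reselected every period.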
Table 4 unambiguously conveys the message that the more restrictive structure of the
Koop and Korobilis (2013) framework is clearly inferior in this exchange rate forecasting
exercise compared to our proposed setting, both in statistical terms and even more so in
economic terms. This highlights that allowing for different degrees of prior shrinkage on
different blocks of parameters is empirically warranted.
Table 4: KK-Minnesota-Prior
                   Φ^TC   SR     SR^TC   PLL
DML with UIP       −96    0.42   0.36    21.84
DML (α = 1)        −201   0.33   0.30    21.63
DML (α = 0.99)     −201   0.33   0.30    21.63
DML (α = 0.90)     −287   0.29   0.23    21.79
DML (α = 0.80)     −210   0.35   0.30    21.84
DML (α = 0.70)     −116   0.43   0.38    21.86
DML (α = 0.50)     −21    0.51   0.45    21.87
The table summarizes the economic and statistical evaluation of the KK-Minnesota-prior for the period from 1996:01
to 2016:12. We measure statistical significance for differences in performance fees and log scores using the (one-sided)
Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC) standard errors. We evaluate
whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility) benchmark using
the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistics
with a serial correlation-robust variance, using the pre-whitened quadratic spectral estimator of Andrews and Monahan
(1992). One star indicates significance at 10% level; two stars significance at 5% level; and three stars significance at 1%
level.
4.5 Time-varying coefficients
In the preceding section, all of our VARs involved constant coefficients (but had time-varying
volatilities). Time-variation in VAR coefficients can easily be added, but leads to
inferior forecasting performance. To show this, we present results using a DML with
UIP specification identical to that used in the preceding section except that it sets
λ = 0.99. Results are presented in Table 5. In comparison to the constant parameter
case (λ = 1) in our main results, we find that using time-varying VAR coefficients is in
general detrimental for forecasting performance, particularly when evaluating forecasts
in terms of the economic performance measures. We find strong evidence in favor of allowing
for abrupt switching between different models to handle the evolving relationship
between exchange rates and fundamentals, as highlighted by Sarno and Valente (2009).
But allowing for gradual change in parameters is not a useful addition. An exception
is the specification ”DML without own/cross lags but with ALL REGRESSORS”. In
this case time-varying parameters do not turn out to be detrimental. It appears that, in
specifications that involve estimation of many parameters for the VAR lags, time-variation
in parameters leads to lower performance. This finding aligns with the econometric
literature with respect to time-varying VAR parameters in medium-size VARs (Chan
and Eisenstat, 2018; Koop and Korobilis, 2013).
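To illustrate what moving from $\lambda = 1$ to $\lambda = 0.99$ does, the Python sketch below applies a forgetting factor in a single-equation regression with known error variance. This is a simplified stand-in, assuming that $\lambda$ inflates the coefficient covariance in each prediction step as in forgetting-factor approaches such as Koop and Korobilis (2013); the actual multivariate algorithm is defined in the Technical Appendix 1.1 and may differ in detail. All names and the simulated data are hypothetical.

```python
import numpy as np

def forgetting_factor_filter(y, X, lam, sigma2=1.0, prior_var=1e6):
    """Recursive coefficient updates with forgetting factor lam.

    lam = 1 reproduces constant-coefficient recursive least squares;
    lam < 1 discounts past data and lets the coefficients drift.
    """
    n, k = X.shape
    beta = np.zeros(k)
    P = prior_var * np.eye(k)
    for t in range(n):
        x = X[t]
        P_pred = P / lam                         # inflate uncertainty before updating
        f = x @ P_pred @ x + sigma2              # predictive variance of y_t
        gain = P_pred @ x / f
        beta = beta + gain * (y[t] - x @ beta)   # update with the forecast error
        P = P_pred - np.outer(gain, x @ P_pred)
    return beta, P

# Hypothetical data from a regression with constant coefficients
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([0.5, -1.0]) + rng.normal(size=200)

beta_const, P_const = forgetting_factor_filter(y, X, lam=1.0)
beta_tvp, P_tvp = forgetting_factor_filter(y, X, lam=0.99)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With $\lambda = 1$ and a diffuse prior the recursion ends at the OLS estimate, while $\lambda = 0.99$ leaves more posterior uncertainty. That extra estimation noise is one plausible reason why time variation can hurt in specifications with many lag coefficients.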
Table 5: TVP-VAR
                                                                       Φ^TC   SR     SR^TC   PLL
DML with UIP                                                           189    0.85   0.70    22.00*
Alternative sets of regressors
DML with INT DIFF                                                      114    0.77   0.62    21.98*
DML with STOCK GROWTH                                                  32     0.71   0.55    21.99
DML with OIL                                                           −20    0.64   0.49    21.95
DML with ALL REGRESSORS                                                202    0.83   0.69    21.99*
DML with NO REGRESSORS                                                 42     0.71   0.55    22.00*
DML without own lags (γ2 = 0) and NO REGRESSORS                        −111   0.45   0.38    21.89*
DML without cross lags (γ3 = 0) and NO REGRESSORS                      −107   0.55   0.39    21.82**
DML without own/cross lags (γ2 = γ3 = 0) and NO REGRESSORS             5      0.54   0.53    21.72**
DML without own/cross lags (γ2 = γ3 = 0) but with ALL REGRESSORS       289    0.81   0.75    22.00*
DML (α = 1)                                                            −325   0.37   0.20    21.65
DML (α = 0.99)                                                         −247   0.44   0.27    21.67
DML (α = 0.90)                                                         0      0.63   0.51    21.95
DML (α = 0.80)                                                         66     0.70   0.56    21.97
DML (α = 0.70)                                                         42     0.71   0.56    22.00*
DML (α = 0.50)                                                         −29    0.65   0.48    21.98
The table summarizes the economic and statistical evaluation of our forecasts from the TVP-VAR for the period from
1996:01 to 2016:12. We measure statistical significance for differences in performance fees and log scores using the
(one-sided) Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC) standard errors.
Restrictions on α correspond to the specification DML with NO REGRESSORS. We evaluate whether the Sharpe ratio of
a model is different from that of the random walk (with constant volatility) benchmark using the (one-sided version of the)
Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistics with a serial
correlation-robust variance, using the pre-whitened quadratic spectral estimator of Andrews and Monahan (1992). One star indicates
significance at 10% level; two stars significance at 5% level; and three stars significance at 1% level.
4.6 Additional results on portfolio performance
In this sub-section, we explore in greater detail the portfolio performance implied by our
flexible DML with UIP model and the portfolio performance based on the random walk.
4.6.1 Statistics of the portfolio results
Table 6 compares descriptive statistics of the portfolio performance based on the DML
with UIP model and the random walk. The mean return (measured at a monthly
frequency) is almost twice as high for the DML with UIP model as for
the random walk. The annualized volatility of portfolio returns is 10.81% (3.12% × √12)
and is hence only slightly higher than the target portfolio volatility of 10%. Skewness of
portfolio returns is substantially higher based on DML with UIP, while the kurtosis is
considerably lower than in the random walk case. Altogether, the portfolio characteristics
of the DML with UIP model are clearly superior to those of the random walk. In addition,
the characteristics of the portfolio returns based on the DML with UIP strategy are also
more favourable for risk management and diversification purposes: the correlation of
the returns to equities (proxied by S&P500 returns) is even negative and the first-order
autocorrelations of returns and squared returns are lower than for the portfolio returns
based on the random walk.
Table 6: Statistics of Portfolio Results
DML with UIP Random walk
Mean return (in %) 1.22 0.69
Volatility (in %) 3.12 3.50
Skewness −0.15 −0.94
Kurtosis 3.18 5.18
Positive returns (>0 in %) 68 62
First-order autocorrelation of returns 0.06 0.12
First-order autocorrelation of squared returns 0.06 0.23
Correlation to S&P 500 returns −0.05 0.22
The table summarizes the portfolio results from the DML with UIP model and the random walk for the period from 1996:01
to 2016:12.
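The descriptive statistics in Table 6 can be computed from a series of monthly portfolio returns along the following lines. This is a minimal sketch, not the authors' code: the function names are hypothetical, kurtosis is reported in raw form (a normal distribution gives 3), and annualization assumes serially uncorrelated monthly returns.

```python
import numpy as np


def annualize_volatility(monthly_vol: float, periods_per_year: int = 12) -> float:
    """Scale a per-period volatility to an annual figure via the square-root rule."""
    return monthly_vol * np.sqrt(periods_per_year)


def autocorr1(x: np.ndarray) -> float:
    """First-order sample autocorrelation."""
    d = x - x.mean()
    return float(np.sum(d[1:] * d[:-1]) / np.sum(d * d))


def portfolio_stats(returns_pct) -> dict:
    """Descriptive statistics for a series of monthly returns given in percent."""
    r = np.asarray(returns_pct, dtype=float)
    z = (r - r.mean()) / r.std(ddof=0)  # standardized returns
    return {
        "mean": float(r.mean()),
        "volatility": float(r.std(ddof=1)),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),  # raw kurtosis, normal = 3
        "share_positive": float(100.0 * np.mean(r > 0)),
        "ac1_returns": autocorr1(r),
        "ac1_squared_returns": autocorr1(r ** 2),
    }


# Reproduces the annualization quoted in the text: 3.12% x sqrt(12).
print(round(annualize_volatility(3.12), 2))  # 10.81
```

The correlation with equity returns in the last row of Table 6 would additionally require the S&P 500 return series, for example via `np.corrcoef`.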
4.6.2 Evolution of portfolio weights
It is of interest how the portfolio weights have evolved through time. Figure 8 and
Figure 9 depict the evolution of portfolio weights for the bonds of the high-interest-rate
countries (NOK, NZD, AUD, GBP) and for the bonds of the low-interest-rate countries
(CHF, EUR, JPY, USD), respectively. The figures show that there is considerable
portfolio rebalancing over time. However, the implied portfolio weights are not excessive
and are hence implementable by an investor without imposing additional restrictions on
portfolio weights. Remember that our portfolio optimization exercises take into account
transaction costs ex ante and the results in Section 4.6.1 reveal that expected and realized
volatility of portfolio returns are close together. Figure 10 shows the average portfolio
weight for the bonds of low-interest-rate currencies and high-interest-rate currencies. The
pattern is not surprising: the average portfolio weights of the high-interest-rate currencies
are all positive, while the average portfolio weight of the low-interest-rate currencies JPY
and CHF is negative and only marginally positive for EUR. The largest average short
position has been the CHF, while the weight for the JPY is only moderately negative.
Interestingly, those patterns are in line with the findings by Ackermann, Pohl, and
Schmedders (2016). The clearly positive net position of the USD (+1.06) was to be
expected as the USD is the domestic currency in our portfolio setup and therefore short-
term USD is the risk-free asset. This means that, on average, long and short positions in
foreign bonds have been approximately zero.
Figure 8: Portfolio Weights for the Bonds of the High-Interest-Rate Countries
[Figure: time series of portfolio weights for NOK, NZD, AUD and GBP, 1996:01 to 2016:01.]
This figure shows the evolution of the portfolio weights for the bonds of the high-interest-rate currencies
based on the DML with UIP model.
4.6.3 Restrictions on portfolio weights
Table 7 summarizes the effect of restrictions on portfolio weights on economic utility and
the Sharpe ratio. Restricting the portfolio weights to [−1; 1] leads to even slightly better
portfolio performance than in the case where the portfolio weights are left unrestricted.
This is good news from a risk-management perspective since excessive portfolio weights
are not required to achieve high utility gains. However, severe restrictions on the portfolio
weights are clearly detrimental for portfolio performance.
4.6.4 Global Harvest Index as benchmark
We consider the Deutsche Bank Global Currency Harvest Index as an additional
benchmark strategy. This index can be seen as a proxy for carry trade returns as a
style strategy. Figure 11 shows that the wealth path generated by our random walk
model and the evolution of the Global Currency Harvest Index are broadly similar. The
correlation between the returns is 0.60. This result is not surprising, given that our
random walk strategy and the strategy underlying the Global Currency Harvest Index
are both carry trade strategies, only differing with respect to implementation details.
The Global Currency Harvest Index does not start before 2000:09 and there are some
missing observations, which we imputed by linear interpolation. The findings are
essentially similar to those for the original carry trade strategy we consider and therefore leave
our results unchanged.

Figure 9: Portfolio Weights for the Bonds of the Low-Interest-Rate Countries
[Figure: time series of portfolio weights for CHF, EUR, JPY and USD, 1996:01 to 2016:01.]
This figure shows the evolution of the portfolio weights for the bonds of the low-interest-rate currencies
based on the DML with UIP model.

Table 7: Restrictions on Portfolio Weights
DML with UIP
Weight restriction Φ^TC SR SR^TC
No restriction 464∗ 1.12∗ 0.93∗
[−1; 1] 468∗ 1.13∗ 0.95∗
[−0.5; 0.5] 305 0.97 0.82
[−0.25; 0.25] −8 0.67 0.57
[−0.1; 0.1] −320 0.32 0.25
Equal weights −602 −0.11 −0.11
The table summarizes the effect of restrictions on the portfolio weights on economic utility and the Sharpe ratio.

Figure 10: Average Portfolio Weights
[Figure: average portfolio weight per currency (AUD, CAD, EUR, JPY, NZD, NOK, SWK, CHF, GBP, USD); horizontal axis from −1 to 1.5.]
This figure shows the average portfolio weights for the bonds based on the DML with UIP model.
4.6.5 Portfolio performance when removing one currency
To assess the sensitivity of the portfolio performance, we compute the Sharpe ratios when
we remove one currency from the set of currencies and set the respective portfolio weight
to 0. Table 8 shows that there is not one particular currency that drives the results. Not
surprisingly, enforcing dollar neutrality leads, in relative terms, to the largest decrease in
the Sharpe ratio.
4.6.6 Results for single currencies
We also analyze the case where only one foreign bond is considered for investment in
addition to the risk-less USD bond (from the perspective of a US investor). Table 9 reports
the results, which once again show that no single currency on its own leads to
attractive portfolio results. This reinforces our finding that market
timing in a large set of currencies is key for economic utility gains.

Figure 11: Evolution of Wealth Relative to Carry Trade Strategies
[Figure: wealth paths for DML with UIP, Global Currency Harvest and Random walk, 2001:01 to 2016:01.]
The figure depicts the evolution of wealth for the DML with UIP strategy, the random walk model and the Global Currency
Harvest Index.
4.7 Additional robustness checks
In this sub-section, we briefly mention a couple of additional specifications we considered.

Spillover effects
The first of these investigated whether spillover effects involving macroeconomic
fundamentals might be important. Such third-country effects have been discussed in
Berg and Mark (2015). For instance, instead of including only the UIP for the UK
in the equation for the UK currency (as we do), we can also include the UIPs for all
the other currencies as well. If we do this, results are not noticeably affected. Our
VAR specification allows spillovers between the exchange rates for different countries.
This kind of spillover we have found to improve forecasts. Adding spillovers involving
macroeconomic fundamentals results in no additional benefits.
Table 8: Removing Single Currencies
Removed currency SR SR^TC
AUD 1.01∗ 0.84∗
CAD 1.09∗ 0.93∗
EUR 1.08∗ 0.91∗
JPY 1.06∗ 0.88∗
NZD 0.97∗ 0.80∗
NOK 1.10∗ 0.93∗
SWK 0.99∗ 0.83∗
CHF 1.09∗ 0.93∗
GBP 0.97∗ 0.80∗
USD 0.90 0.71
The table reports the Sharpe ratios for the DML with UIP model if one particular currency is left out in each row. We evaluate
whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility) benchmark using
the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistics
with a serial correlation-robust variance, using the pre-whitened quadratic spectral estimator of Andrews and Monahan
(1992). One star indicates significance at 10% level; two stars significance at 5% level; and three stars significance at 1%
level.

Table 9: Single Currencies
Currency SR SR^TC
AUD 0.33 0.29
CAD 0.17 0.12
EUR 0.26 0.21
JPY 0.32 0.27
NZD 0.38 0.34
NOK 0.02 0.01
SWK 0.22 0.19
CHF 0.04 0.01
GBP 0.20 0.16
The table summarizes the Sharpe ratios for the DML with UIP model if only one additional currency is considered for
investment in addition to the USD.

Alternative measure of portfolio performance
As an alternative performance measure we also investigated the manipulation-proof
performance measure proposed by Goetzmann, Ingersoll, Spiegel, and Welch (2007). The
advantage of this criterion is that we do not have to assume a particular utility function.
The results are very similar to those for the reported quadratic utility case and are available
upon request.
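For readers unfamiliar with the manipulation-proof performance measure, the following minimal sketch implements the estimator as it is usually stated from Goetzmann et al. (2007): Θ = ln(mean of [(1 + r_t)/(1 + r_f,t)]^(1−ρ)) / ((1 − ρ) Δt). The function name `mppm`, the default ρ = 2 and the monthly frequency are assumptions made for illustration; the settings used for the unreported results are not stated here.

```python
import numpy as np


def mppm(returns, riskfree, rho: float = 2.0, periods_per_year: int = 12) -> float:
    """Manipulation-proof performance measure, annualized.

    returns, riskfree: per-period simple returns as decimals (0.01 = 1%).
    rho: risk-aversion-type parameter (assumed value; must differ from 1).
    """
    r = np.asarray(returns, dtype=float)
    rf = np.asarray(riskfree, dtype=float)
    if rho == 1.0:
        raise ValueError("rho = 1 requires the logarithmic limit, not handled here")
    gross_excess = (1.0 + r) / (1.0 + rf)
    dt = 1.0 / periods_per_year
    return float(np.log(np.mean(gross_excess ** (1.0 - rho))) / ((1.0 - rho) * dt))


# Illustrative inputs (made-up numbers): a constant 1% monthly return over a
# zero risk-free rate gives 12 * ln(1.01).
print(mppm([0.01] * 5, [0.0] * 5))
```

The value can be read as an annualized, continuously compounded certainty-equivalent excess return, so the difference between two strategies is comparable in spirit to a performance fee.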
Alternative risk aversion
It is of interest whether the economic utility gains can also be achieved by investors with
higher risk aversion. To explore this issue, we also considered the risk aversion coefficient
θ = 6 instead of θ = 2. For this case, we found even larger utility gains than in our
baseline setting, achieving an annualized performance fee of 525 basis points, that is
Φ^TC = 525, for the DML with UIP model (464 basis points in the base case).
Specific degrees of time variation for different blocks of coefficients
As discussed previously, we have found that working with a constant coefficient VAR
by setting λ = 1 leads to improved forecast performance over λ = 0.99. But these
specifications assume the same λ applies to all the VAR coefficients. It is theoretically
possible that, by allowing for different degrees of shrinkage for different blocks of
coefficients, forecast performance can be improved. In practice, we have done extensive
experimentation and have not found any forecast improvements by doing so.
Alternative grid for the decay factor α
We also considered a more refined grid for choosing α, namely α ∈ {0.40 : 0.01 : 1.00}.
In this case, α = 0.73 is selected from the data over the entire period and, hence, is quite
similar to our benchmark results (α = 0.70). The results for the refined grid are almost
exactly the same as in our base case.
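The refined grid and the effect of the decay factor can be sketched as follows. This is an illustration under a stated assumption, namely that each model is scored by an exponentially discounted sum of its past log predictive likelihoods and the highest-scoring model is selected; the paper's exact DML recursion is defined earlier in the paper and may differ in detail. All function names are hypothetical.

```python
import numpy as np


def refined_alpha_grid(start: float = 0.40, stop: float = 1.00, step: float = 0.01) -> np.ndarray:
    """Grid {start : step : stop}, built from integers to avoid float drift."""
    n = int(round((stop - start) / step)) + 1
    return np.round(start + step * np.arange(n), 2)


def discounted_scores(log_pred_lik: np.ndarray, alpha: float) -> np.ndarray:
    """Discounted cumulative log scores.

    log_pred_lik: (T, J) array, one column per model.
    The score at time t weights the observation from i periods ago by alpha**i.
    """
    lpl = np.asarray(log_pred_lik, dtype=float)
    scores = np.zeros_like(lpl)
    running = np.zeros(lpl.shape[1])
    for t in range(lpl.shape[0]):
        running = alpha * running + lpl[t]
        scores[t] = running
    return scores


def select_models(log_pred_lik: np.ndarray, alpha: float) -> np.ndarray:
    """Index of the best-scoring model at each point in time."""
    return discounted_scores(log_pred_lik, alpha).argmax(axis=1)


# Made-up log scores: model 0 is better for three periods, then model 1 is.
lpl = np.array([[0.0, -1.0], [0.0, -1.0], [0.0, -1.0], [-2.0, 0.0]])
print(len(refined_alpha_grid()))  # number of grid points
print(select_models(lpl, 1.0))    # no discounting: slow to switch
print(select_models(lpl, 0.5))    # strong discounting: switches at the end
```

With α = 1 the full history counts equally, so one bad period rarely overturns an accumulated lead; with smaller α recent performance dominates and the selected model changes more often.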
4.8 Results for the long sample
In this sub-section, we report some additional key results for our long sample period
which starts in 1973:01 and for which we compute out-of-sample results from 1990:01 to
2016:12. Due to data availability we do not consider the inclusion of exogenous regressors
for this sample period.
Table 10 summarizes the results. As is the case for the short sample, DML
substantially outperforms the multivariate random walk (i.e. DML without own/cross
lags) both in terms of PLLs and economic criteria. Here again, fast model switching
is found to be crucially important for the accuracy of density forecasts and portfolio
allocation. The optimal decay factor α is found to be 0.80 over the entire evaluation
period and is thus comparable to the optimal decay factor of the short period (α = 0.70).
As for our short sample, we consider the G10 countries.
Figure 12 illustrates the high frequency of model change when the decay factor is
chosen from the data. The vertical axis plots the model numbers from 1 to 32 against time
for two cases. The set of models begins with model number 1 which is the multivariate
random walk without drift and ends with model number 32 which is one of the most
flexible models (i.e. the VAR model with an intercept, own lags with shrinkage parameter
γ2 = 0.9 and cross lags with shrinkage parameter γ3 = 0.9). The two lines in Figure 12 are
for DML (with α selected in a time-varying manner) and DML (α = 1). In our flexible
specification where the decay factor is chosen from the data, model change occurs much
more frequently than in the case when there is no discounting of forecasting performance
(α = 1). Many different models are selected over time. The individual specification which
is picked most frequently is the multivariate random walk (in approximately half of the
cases). This prominent role of the multivariate random walk reinforces the story that
sparsity is a key aspect. Figure 13 shows which blocks of variables are included at each
point in time (coloured diamonds). Blank spaces in the graph depict the time-varying
sparsity induced by DML, that is, periods where a block of variables is not selected.
Typically, we observe persistence in the selection of a block of variables.
Figure 14 compares the evolution of wealth for an investor who begins with one
dollar and relies on DML to the wealth of an investor who uses a multivariate random
walk with constant covariance to construct their portfolio. As for the short sample, the
outperformance of DML is large, with the most striking gains around the time of the
subprime crisis.
Overall, all key findings for the short sample period also apply for the long sample.

Table 10: Evaluation of Forecasting Results
Φ^TC SR SR^TC PLL
DML 485∗∗ 1.08∗∗ 0.92∗∗ 22.05
DML without own lags (γ2 = 0) 365∗ 0.82∗ 0.72∗ 21.86
DML without cross lags (γ3 = 0) 278 0.80 0.66 21.78
DML without own/cross lags (γ2 = γ3 = 0) 17 0.47 0.46 21.65
DML (α = 1) −255 0.35 0.19 21.65
DML (α = 0.99) −194 0.40 0.24 21.65
DML (α = 0.90) 238 0.78 0.65 21.88
DML (α = 0.80) 485∗∗ 1.08∗∗ 0.92∗∗ 22.05
DML (α = 0.70) 478∗ 1.07∗∗ 0.90∗ 22.06
DML (α = 0.50) 409∗ 1.04∗∗ 0.84∗∗ 22.05
The table summarizes the economic and statistical evaluation of our forecasts from the DML and restricted versions thereof
for the period from 1990:01 to 2016:12. We measure statistical significance for differences in performance fees and log
scores using the (one-sided) Diebold and Mariano (1995) t-test using heteroskedasticity and autocorrelation robust (HAC)
standard errors. Restrictions on α correspond to the DML specification. We evaluate whether the Sharpe ratio of a
model is different from that of the random walk (with constant volatility) benchmark using the (one-sided version of the)
Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistics with a serial correlation-
robust variance, using the pre-whitened quadratic spectral estimator of Andrews and Monahan (1992). One star indicates
significance at 10% level; two stars significance at 5% level; and three stars significance at 1% level.

Figure 12: Frequency of Model Change
[Figure: selected model configuration (1 to 32) over time, 1990:01 to 2015:01, for α = 1 and for dynamically chosen α.]
The figure displays the frequency of model change over time using the long sample. The vertical axis
represents the model configurations 1, ..., 32. The red line depicts the evolution of the selected model
configuration for α = 1. The grey line shows the evolution of the selected model configuration when α is
dynamically chosen from the grid of values α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}.

Figure 13: Inclusion of Blocks of Variables
[Figure: inclusion over time of the blocks CROSS LAGS, OWN LAGS and INTERCEPT, 1990:01 to 2015:01.]
The figure displays which blocks of variables are included at each point in time. “Included” means the
respective γi is not 0.
Figure 14: Evolution of Wealth
[Figure: wealth paths for DML and Random walk, 1990:01 to 2015:01.]
The figure depicts the evolution of wealth in the DML model and the random walk model.

References
Ackermann, F., W. Pohl, and K. Schmedders (2016): “Optimal and naive
diversification in currency markets,” Management Science, 63(10), 3347–3360.
Andrews, D. W., and J. C. Monahan (1992): “An improved heteroskedasticity and
autocorrelation consistent covariance matrix estimator,” Econometrica: Journal of the
Econometric Society, pp. 953–966.
Bańbura, M., D. Giannone, and L. Reichlin (2010): “Large Bayesian vector auto
regressions,” Journal of Applied Econometrics, 25(1), 71–92.
Berg, K. A., and N. C. Mark (2015): “Third-country effects on the exchange rate,”
Journal of International Economics, 96(2), 227–243.
Berge, T. J. (2014): “Forecasting disconnected exchange rates,” Journal of Applied
Billio, M., R. Casarin, F. Ravazzolo, and H. van Dijk (2013): “Time-varying
combinations of predictive densities using nonlinear filtering,” Journal of Econometrics,
177(2), 213–232.
Bilson, J. F. O. (1978): “The Current Experience with Floating Exchange Rates: An
Appraisal of the Monetary Approach,” American Economic Review, 68(2), 392–397.
Carriero, A., G. Kapetanios, and M. Marcellino (2009): “Forecasting exchange
rates with a large Bayesian VAR,” International Journal of Forecasting, 25(2), 400–417.
Chan, J. C., and E. Eisenstat (2018): “Bayesian model comparison for time-varying
parameter VARs with stochastic volatility,” Journal of Applied Econometrics.
Della Corte, P., and I. Tsiakas (2012): “Statistical and economic methods for
evaluating exchange rate predictability,” Handbook of exchange rates, pp. 221–263.
Diebold, F. X., and R. S. Mariano (1995): “Comparing predictive accuracy,”
Journal of Business & Economic Statistics, pp. 253–263.
Dornbusch, R. (1976): “Expectations and Exchange Rate Dynamics,” Journal of
Political Economy, 84(6), 1161–1176.
Engel, C. (2013): “Exchange Rates and Interest Parity,” NBER Working Papers 19336,
National Bureau of Economic Research, Inc.
Geweke, J., and G. Amisano (2011): “Optimal prediction pools,” Journal of
Giacomini, R., and B. Rossi (2010): “Forecast comparisons in unstable
environments,” Journal of Applied Econometrics, 25(4), 595–620.
Giannone, D., M. Lenza, and G. E. Primiceri (2015): “Prior selection for vector
autoregressions,” The Review of Economics and Statistics, 97(2), 436–451.
Goetzmann, W., J. Ingersoll, M. Spiegel, and I. Welch (2007): “Portfolio
performance manipulation and manipulation-proof performance measures,” The
Review of Financial Studies, 20(5), 1503–1546.
Han, Y. (2006): “Asset allocation with a high dimensional latent factor stochastic
volatility model,” Review of Financial Studies, 19(1), 237–271.
Koop, G., and D. Korobilis (2013): “Large time-varying parameter VARs,” Journal
of Econometrics, 177(2), 185–198.
Ledoit, O., and M. Wolf (2008): “Robust performance hypothesis testing with the
Sharpe ratio,” Journal of Empirical Finance, 15(5), 850–859.
Mark, N. C., and D. Sul (2001): “Nominal exchange rates and monetary
fundamentals: evidence from a small post-Bretton Woods panel,” Journal of
International Economics, 53(1), 29–52.
Marquering, W., and M. Verbeek (2004): “The economic value of predicting stock
index returns and volatility,” Journal of Financial and Quantitative Analysis, 39(02),
407–429.
Maurer, T. A., and L. Pezzo (2018): “Importance of transaction costs for asset
allocations in FX markets,” Available at SSRN: https://ssrn.com/abstract=3143970.
Rogoff, K. (1996): “The Purchasing Power Parity Puzzle,” Journal of Economic
Literature, 34(2), 647–668.
Rossi, B. (2013): “Exchange rate predictability,” Journal of Economic Literature, 51(4),
1063–1119.
Sarno, L., and G. Valente (2009): “Exchange rates and fundamentals: Footloose or
evolving relationship?,” Journal of the European Economic Association, 7(4), 786–830.
Triantafyllopoulos, K. (2011): “Time-varying vector autoregressive models with
stochastic volatility,” Journal of Applied Statistics, 38(2), 369–382.
West, K. D., H. J. Edison, and D. Cho (1993): “A utility-based comparison of
some models of exchange rate volatility,” Journal of International Economics, 35(1-2),
23–45.
West, M., and J. Harrison (1997): Bayesian forecasting and dynamic models.
Springer, 2nd edn.
Wright, J. H. (2011): “Term premia and inflation uncertainty: Empirical evidence
from an international panel dataset,” American Economic Review, 101(4), 1514–34.