
Journal of Banking and Finance 155 (2023) 106973
The incremental information in the yield curve about future interest rate risk ✩
Bent Jesper Christensen ∗, Mads Markvart Kjær, Bezirgen Veliyev
CREATES, Department of Economics and Business Economics, Aarhus University, and the Danish Finance Institute, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark
ARTICLE INFO

JEL classification: C58, E43, G12

Keywords: Term structure models; Volatility; Forecasting; Kalman filtering; Yield curve

ABSTRACT

Using high-frequency intraday futures prices to measure yield volatility at selected maturities, we find that daily yield curves carry incremental information about future interest rate risk at the long end, relative to that contained in the time series of historical volatilities. Some of the information in the yield curves is not captured by standard affine models. Our results point to the existence of an unspanned stochastic volatility factor. Both time series and yield curve based forecasts provide utility to a risk averse investor, relative to a random walk. Information from the two sources can be combined to enhance yield volatility forecasting performance.
1. Introduction

Affine term structure models have been the workhorse in the interest rate literature for decades and can successfully explain bond prices (see Duffee (2002), Christensen et al. (2011), and many others). However, it is questionable whether these models are able to explain yield volatilities, and hence reinvestment rate risk, which plays a crucial role in portfolio allocation, market timing, derivative pricing, and risk management. The literature has primarily focused on whether volatilities are spanned by interest rates, as implied by standard affine models. Collin-Dufresne et al. (2009) and Andersen and Benzoni (2010) conclude that they are not, while Jacobs and Karoui (2009) and Bikbov and Chernov (2009) reach mixed conclusions.

In this paper, we consider out-of-sample forecasting of yield volatility. Since future reinvestment rate risk matters for pricing and trading decisions, the yield curve should be sensitive to, and hence informative about, future yield volatility. Nevertheless, the forecasting perspective has so far received little attention in the yield volatility literature. The only exceptions are Collin-Dufresne et al. (2009) and Joslin and Konchitchki (2018), who consider fixed-window estimation, and do not compare the model based forecasts against time series benchmarks beyond a random walk (RW). In contrast, we update the parameters recursively according to investor’s information set, and compare the model based volatility forecasts to leading time series benchmarks, in particular, the heterogeneous autoregressive (HAR) model of Corsi (2009) and the realized GARCH model of Hansen et al. (2012).

Our focus is on whether the yield curve contains incremental information about future yield volatility, beyond that available from the time series of historical volatilities. In addition, we examine whether the time series based forecasters contain incremental information about future volatility, beyond the information that can be read off the yield curve. Further, we supplement the existing static analysis of the spanning issue with a dynamic out-of-sample assessment.

As a separate contribution, we use high-frequency intraday yield curves in the analysis, thus adding precision to volatility measurements. The past two decades have seen important advances in high-frequency
✩
We are grateful to Thorsten Beck (the editor), the anonymous referees, Peter Feldhutter, Niels S. Gronborg, Allan Timmermann, Anders B. Trolle, and participants
at the Conference on Computational and Financial Econometrics (CFE) in London, 2019, the Vienna–Copenhagen Conference on Financial Econometrics, 2022, and
the Joint Econometrics-Finance Seminar in Aarhus for useful comments, and to Center for Research in Econometric Analysis of Time Series (CREATES, funded by
the Danish National Research Foundation, DNRF78), the Independent Research Fund Denmark (grant number 2033-00137B), and the Danish Finance Institute (DFI)
for research support.
* Corresponding author.
E-mail addresses: [email protected] (B.J. Christensen), [email protected] (M.M. Kjær), [email protected] (B. Veliyev).
https://doi.org/10.1016/j.jbankfin.2023.106973
Received 26 July 2021; Received in revised form 10 June 2023; Accepted 31 July 2023
Available online 9 August 2023
0378-4266/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
data analysis, primarily focusing on equities, e.g., Barndorff-Nielsen and Shephard (2002), Andersen et al. (2003), and Zhang et al. (2005). However, intraday data have seen little application in the interest rate literature. Notable exceptions are Faust et al. (2007), who use high-frequency data to study macroeconomic announcement effects, not volatility, and Andersen and Benzoni (2010) and Cieslak and Povala (2016), who use realized volatilities of interest rates based on a 10-minute sampling frequency, chosen to mitigate the effects of market microstructure noise. We overcome the latter issue in a more direct manner by relying on the pre-averaged realized variance estimator of Jacod et al. (2009), a noise-robust estimator of volatility with excellent empirical properties for various asset classes, as documented by Christensen et al. (2014). This choice allows us to use data at higher frequency. From US Treasury and Eurodollar futures prices, we construct yields with maturities 6 months, and 1, 5, and 7 years, at the 1-minute frequency. From these, we calculate a daily time series of realized volatility measures at each of the four maturities. We consider daily yield curve based forecasters of volatility over the subsequent month, constructed from either (i) affine term structure models, (ii) principal component analysis (PCA), or (iii) interest rate (yield and forward) spreads, estimated recursively in a standard daily yield panel. The alternative forecasters based on the time series of historical volatilities (RW, HAR, realized GARCH) serve as benchmarks.

We find that the HAR model provides a strong benchmark for yield volatility forecasting. It provides relatively accurate forecasts across all maturities considered. At the intermediate 5 year maturity, a simple forward spread, motivated by risk premium considerations, or a common factor approach provide equally accurate forecasts, and all three improve significantly over the naive random walk volatility forecast. However, we also find that the yield curve contains incremental information about future yield volatility relative to the time series of historical volatilities, including HAR, in terms of forecasting accuracy at the long end of the curve. Next, we carry out a specification test as in Cieslak and Povala (2016) and find that none of the affine or time series models considered subsumes all the relevant information about future volatility contained in the yield curve across all maturities. This indicates that the yield curve contains some important information about future reinvestment rate risk that neither the term structure models nor the volatility history capture. Extending this idea, we develop a test of whether past volatility forecast errors are informative about future forecast errors, as would be the case under unspanned stochastic volatility (USV) in the sense of Collin-Dufresne and Goldstein (2002). We perform the test by extracting a common factor from the volatility forecast errors across maturities and find that it provides significant information about future forecast errors, thus indicating the presence of USV features in the data.

We investigate the economic value of the forecasters to an investor in a portfolio allocation exercise, using a utility-based framework as in Bollerslev et al. (2018). From the results, the information about future interest rate risk contained in either the yield curve or historical volatility provides economic value to a risk averse investor. A robustness analysis shows that higher-order PCA factors beyond the first three usually explaining yields in fact carry information about volatility. The portion of future volatility information in the yield curve that is not captured by standard affine term structure models is contained in these higher-order factors rather than in nonlinearities. A comparison with forecasts of future interest rate risk extracted from a wide cross section of coupon bonds confirms that the yield curve based forecasts are well represented by the daily panel estimates. Finally, we demonstrate that information from the yield curve and the volatility history can be fruitfully combined to enhance yield volatility forecasting performance.

Our work on out-of-sample interest rate risk complements several strands of literature. Within the class of affine term structure models, conditional variances are affine in state variables. In standard versions, volatilities can be extracted from the cross section of interest rates. Under USV, some factors affect volatility, but not interest rates directly. Jacobs and Karoui (2009) examine the ability of three-factor affine models to fit conditional volatilities in-sample, using estimates from an EGARCH model to proxy for true volatility. They reach mixed conclusions, depending on the sample considered. Correlations between model-implied volatility and the proxy range between 60% and 75% in US Treasury data, but are much lower in swap data, even negative for some maturities. Collin-Dufresne et al. (2009) consider affine models with and without a USV factor and conclude that volatility cannot be extracted from the cross section of interest rates based on swap data. Andersen and Benzoni (2010) find low $R^2$s for regressing future variances on PCA factors from the yield curve, and argue that they conduct a test of whether variances are spanned by bonds. The validity of this regression test of USV is questioned by Bikbov and Chernov (2009), who show that low $R^2$ is expected if yields are observed with error, a standard assumption when estimating term structure models. Bikbov and Chernov (2009), Joslin (2017), and Joslin and Konchitchki (2018) consider both swaps and swaptions and reject the restrictions needed for affine models to generate USV. We contribute to this literature by considering the spanning question in an out-of-sample framework, and providing evidence pointing to the existence of a USV factor.

The USV puzzle has recently led to the construction of more complex models, with a focus on the ability to fit volatilities and price swaptions. Feldhütter et al. (2016) propose a nonlinear model that is able to generate features consistent with USV. Cieslak and Povala (2016) construct a model with volatility factors driven by a Wishart process. Filipović et al. (2017) present a nonlinear model featuring USV that can price both bonds and swaptions successfully. None of these papers considers out-of-sample forecasting. It is beyond the scope of the present paper to include these more complex models, as they are very time-consuming to implement and reestimate recursively.

The paper is organized as follows. Section 2 describes the forecasting methods, including the term structure models, PCA and interest rate spread based forecasts, and time series models. Section 3 describes the high-frequency futures data and the construction of realized yield volatility measures. Section 4 presents the estimation method for the term structure models based on the Kalman filter. Section 5 contains the empirical analysis, and Section 6 the robustness analysis. Section 7 concludes. The Appendix contains additional material on data, models, estimation, and results.

2. Interest rate risk forecasters

Throughout, $y^{\tau}_{t}$ denotes the maturity $\tau$ yield at time $t$. In our empirical out-of-sample analysis, we consider daily forecasting of month-ahead interest rate risk by maturity. To assess the forecasts, we use the high-frequency data to calculate daily realized yield volatility, labeled $V^{\tau}_{t}$ (see Section 3.4 – we consider the pre-averaged realized variance and henceforth use the terms realized volatility and realized variance interchangeably for this). In line with the volatility literature (e.g., Patton and Sheppard (2015) and Bollerslev et al. (2016)), the target for the forecast built at $t$, based on investor’s information set, is not the realized volatility at $t + h$, but instead the aggregated realized measure $V^{\tau}_{t+1\to t+h} = \sum_{i=1}^{h} V^{\tau}_{t+i}$, a proxy for integrated volatility from $t$ through $t + h$, or $IV^{\tau}_{t\to t+h}$, say.¹ All forecasts considered are recursive, i.e., only data through $t$ are used to construct the forecast of $V^{\tau}_{t+1\to t+h}$, for each maturity $\tau$ and method considered, with $h$ set to 22 trading days for month-ahead forecasting.² Methods utilizing the cross-sectional information in the yield curve are presented first, followed by methods relying exclusively on the information in the time series of historical volatilities.

¹ If the yield follows the Itô process $dy^{\tau}_{t} = \mu^{\tau}_{t}\,dt + \sigma^{\tau}_{t}\,dW_{t}$, then $IV^{\tau}_{t\to t+h} = \int_{t}^{t+h} (\sigma^{\tau}_{s})^{2}\,ds$.
² Results for $h = 44$, forecasting two months ahead, are in the Appendix.
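The aggregated forecasting target defined above can be illustrated with a minimal sketch. It assumes the daily realized variances for one maturity are held in a plain Python list indexed from 0; the function name `aggregate_forward` is hypothetical and not taken from the paper.

```python
def aggregate_forward(v, h):
    """Return the targets V_{t+1 -> t+h} = v[t+1] + ... + v[t+h].

    `v` is a list of daily realized variances for one maturity (0-based
    index t). Only origins t for which all h future days are available
    are kept, so the result has len(v) - h entries.
    """
    if h < 1:
        raise ValueError("h must be a positive number of trading days")
    return [sum(v[t + 1:t + h + 1]) for t in range(len(v) - h)]


# Illustrative use with made-up numbers (not data from the paper):
daily_rv = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = aggregate_forward(daily_rv, h=2)   # origins t = 0, 1, 2
```

In the paper's setting, `h` would be 22 for month-ahead forecasting, and the target at origin `t` only becomes observable at `t + h`, which is why the recursive estimation stops `h` days before the forecast date.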
Table 1
Affine models.

| Model | Reference | $A_m(d)$-notation | Market price of risk |
|---|---|---|---|
| Vasicek | Vasicek (1977) | $A_0(1)$ | $\lambda$ |
| CIR | Cox et al. (1985) | $A_1(1)$ | $\lambda/\sigma\sqrt{r_t}$ |
| AFNS0 | Christensen et al. (2011) | $A_0(3)$ | $(\Sigma\sqrt{S(X_t)})^{-1}(\kappa(\theta - X_t) - \tilde{\kappa}(\tilde{\theta} - X_t))$ |
| AFNS3 | Christensen et al. (2010) | $A_3(3)$ | $(\Sigma\sqrt{S(X_t)})^{-1}(\kappa(\theta - X_t) - \tilde{\kappa}(\tilde{\theta} - X_t))$ |

This table presents the reference, notation of Dai and Singleton (2000), and market price of risk specification for the affine term structure models considered.

2.1. Yield curve based volatility forecasters

We consider an $N \times T$ fixed-maturity panel, i.e., the yields at $t$ are $y_t = (y^{\tau_1}_t, \ldots, y^{\tau_N}_t)'$, $t = 1, \ldots, T$, and yields of all maturities can be used to generate volatility forecasts at a given maturity $\tau$. For each candidate forecasting method, recursive volatility forecasts are constructed using the regression

$$V^{\tau}_{t+1\to t+h} = \rho^{\tau,h}_{0} + {\rho^{\tau,h}_{1}}' Z_t + u^{\tau,h}_{t+h}, \tag{1}$$

for fixed $\tau$, $h$, with $\rho^{\tau,h}_{1}$ a $q \times 1$ vector of predictive coefficients, $\rho^{\tau,h}_{0}$ the scalar intercept, and $Z_t$ the relevant information variable extracted from the yield panel and conditioning the forecast as of $t$ for the given method, e.g., a $q$-vector of fitted PCA factors, or a conditional volatility forecast from an affine model. When constructing the forecast at $t'$ of $V^{\tau}_{t'+1\to t'+h}$, only data through $t'$ are used to extract $Z_1, \ldots, Z_{t'}$ from the yield panel, then estimate Eq. (1) over $t = 1, \ldots, t' - h$, and the forecast is $\hat{\rho}^{\tau,h}_{0} + {\hat{\rho}^{\tau,h}_{1}}' Z_{t'}$. We consider methods for extracting $Z_t$ based on affine term structure models, PCA, common factors, and risk premiums.

The recursive regressions in Eq. (1) are used in the construction of volatility forecasts, for two reasons. First, investor is not basing the forecast of $V^{\tau}_{t'+1\to t'+h}$ exclusively on $Z_{t'}$ from the daily yields, having observed the history of realized volatilities through $t'$. Second, in the implementation, volatility forecasts based on $Z_t$ include estimation error, whereas realized yield variance used as forecasting target in the assessment includes measurement error, thus leading to potential biases. Therefore, the Mincer and Zarnowitz (1969) type predictive regressions in Eq. (1) are used to leverage any indication in investor’s information set that the centering or scale of $Z_t$ from the given method do not line up well with subsequent realized volatilities. The main question is whether $Z_t$ serves as a useful information variable, i.e., whether variation in this provides incremental predictive power. If, for example, a combination of the most recent realized volatility observations $V^{\tau}_{t'}, V^{\tau}_{t'-1}, \ldots$ provides the best forecast at $t'$, as in the HAR model, then this feature will not be fully captured by the intercept and slope in Eq. (1), estimated over the full window $t = 1, 2, \ldots, t' - h$. It should be possible to improve on the resulting forecast using pure time series methods and historical volatilities, hence revealing that $Z_t$ from the panel does not carry incremental information. Thus, the regression approach secures a level playing field for comparison of yield curve and time series based yield volatility forecasts.

In our empirical work, we use $N = 8$ maturities to extract $Z_t$ from daily data, and volatility forecasts based on investor’s information set are constructed using the recursive regressions in Eq. (1) for each of the four maturities $\tau$ for which we have high-frequency data.

2.1.1. Affine term structure models

In the affine class, the short rate $r_t = y^{0}_{t}$ is driven by some $d$-dimensional vector of state variables, $X_t$, such that $r_t = \delta_0 + \delta_1' X_t$, and the dynamics of $X_t$ are

$$dX_t = \kappa(\theta - X_t)\,dt + \Sigma\sqrt{S(X_t)}\,dW_t, \tag{2}$$

where $W_t$ is a $d$-dimensional Brownian motion under the physical measure $\mathbb{P}$, $\kappa$ and $\Sigma$ are $d \times d$ matrices, $\delta_1$ and $\theta$ are $d \times 1$ vectors, and $\delta_0$ is a scalar. In addition, $S(X_t)$ is a diagonal $d \times d$ matrix with each element affine in $X_t$,

$$[S(X_t)]_{ii} = \alpha_i + \beta_i' X_t, \tag{3}$$

with $\alpha_i$ a scalar, and $\beta_i$ a $d \times 1$ vector. Given a suitable market price of risk specification $\lambda_t$ (see Appendix C), the dynamics of $X_t$ are governed by an affine diffusion under the risk-neutral measure $\mathbb{Q}$, too,

$$dX_t = \tilde{\kappa}(\tilde{\theta} - X_t)\,dt + \Sigma\sqrt{S(X_t)}\,dW^{\mathbb{Q}}_t. \tag{4}$$

Following Duffie and Kan (1996), the yields are given by

$$y^{\tau}_{t} = \frac{A(\tau)}{\tau} + \frac{B(\tau)'}{\tau} X_t, \tag{5}$$

where $A$, $B$ are solutions to the system of ordinary differential equations

$$\begin{aligned}
\frac{dA(\tau)}{d\tau} &= \tilde{\theta}'\tilde{\kappa}' B(\tau) - \frac{1}{2}\sum_{i=1}^{d} \left[\Sigma' B(\tau)\right]_i^2 \alpha_i + \delta_0, & A(0) &= 0, \\
\frac{dB(\tau)}{d\tau} &= -\tilde{\kappa}' B(\tau) - \frac{1}{2}\sum_{i=1}^{d} \left[\Sigma' B(\tau)\right]_i^2 \beta_i + \delta_1, & B(0) &= 0,
\end{aligned} \tag{6}$$

which can be solved either analytically or numerically.

By Eq. (5), the $h$ periods ahead conditional variance of the maturity $\tau$ yield, given information through $t$, is

$$V^{\tau}_{t,h} = Var_t(y^{\tau}_{t+h}) = \frac{1}{\tau^2} B(\tau)'\, Var_t(X_{t+h})\, B(\tau) = b^{\tau,h}_{0} + {b^{\tau,h}_{1}}' X_t, \tag{7}$$

i.e., affine in the latent state variables. The precise forms of $b^{\tau,h}_{0}$ and $b^{\tau,h}_{1}$ are given in Appendix D.3. We consider affine models with $d$, the number of factors, from 1 to 3. Table 1 summarizes the models considered, including reference, $A_m(d)$ classification as in Dai and Singleton (2000), $m$ being the number of factors conditioning variances, and market price of risk specification. In Appendix C, we provide a formal description of each model. We use Eq. (7) as the basis of our first yield curve based forecasters. Given the diversity of models considered, we are able to examine the impact of both the number of latent factors and stochastic (state dependent) volatility on the forecasting of future interest rate risk.

Implementation of Eq. (7) requires values for $b^{\tau,h}_{0}$, $b^{\tau,h}_{1}$, and $X_t$. We estimate the models recursively by quasi maximum likelihood (QML) using the Kalman filter and an expanding estimation window, allowing for additive measurement errors in yields (see Section 4 for details). To construct the forecast at $t'$ of subsequent realized yield volatility $V^{\tau}_{t'+1\to t'+h}$, only information through $s$ is used to generate $b^{\tau,h}_{0}$, $b^{\tau,h}_{1}$, and $X_1, \ldots, X_s$, and hence the conditional volatility (or variance) estimate $\hat{V}^{\tau}_{s,h}$ from Eq. (7), for the given $\tau$ and $h$. This procedure is repeated for $s = 1, \ldots, t'$. Next, Eq. (1) is estimated with $Z_t = \hat{V}^{\tau}_{t,h}$, $t = 1, \ldots, t' - h$, i.e., $q = 1$ in this case, and the final forecast of $V^{\tau}_{t'+1\to t'+h}$ is $\hat{\rho}^{\tau,h}_{0} + {\hat{\rho}^{\tau,h}_{1}}' \hat{V}^{\tau}_{t',h}$.

Thus, even with correct model specification, the conditional variance itself is not an unbiased forecast of subsequent realized yield volatility. By the law of total variance, it includes the variance of the conditional mean yield, given both information through $t'$ and integrated volatility
through $t' + h$, thus inducing an upward bias.³ The recursive regressions in Eq. (1) provide a simple means of converting conditional variances into proper forecasts of future interest rate risk based on investor’s information set.

2.1.2. PCA based forecasters

To examine whether or not the yield curve contains relevant information about future volatility and, if it does, whether the term structure models capture this information, we construct volatility forecasts using PCA factors based on the yield curve. We assume that the $N \times T$ yield panel is generated by a factor model,

$$y_t = \pi_0 + \pi_1' F_t + \varepsilon_t, \tag{8}$$

where $F_t$ is a $k$-vector of common covariance-generating factors, $k < N$, $\pi_1$ is a $k \times N$ matrix of factor loadings, and $\pi_0$ represents de-meaning. We estimate $F_t$ by PCA using the methodology of Bai and Ng (2002). There is some consensus in the literature that three factors can explain the variation in the yield curve (e.g., Litterman and Scheinkman (1991)). However, whether more factors can improve volatility forecasts, and hence whether different factors are important for explaining yields and for forecasting volatilities are open questions.

Recursive volatility forecasts are built using the fitted PCA factors $\hat{F}_t$ from Eq. (8) for $Z_t$ in Eq. (1), for fixed $\tau$, $h$, i.e., $q = k$ in this case. In the empirical analysis, we use $k = 3$ to 6 factors.

2.1.3. A common factor approach

For our third yield curve based forecaster, we consider a common factor approach, using volatility information across maturities to combine the maturity-specific forecasts from Section 2.1.2. The approach is similar to that used for forecasting bond risk premiums in Cochrane and Piazzesi (2005), based on forward rates, and in Ludvigson and Ng (2009), based on macro variables. Here, we construct a common factor for interest rate risk forecasting based on information from the yield curve.

Let $\tilde{F}_t$ denote some combination of fitted PCA factors from Eq. (8). We regress subsequent variance averaged across maturities on $\tilde{F}_t$,

$$\frac{1}{4}\sum_{i=1}^{4} V^{\tilde{\tau}_i}_{t+1\to t+h} = \gamma^{h}_{0} + {\gamma^{h}_{1}}' \tilde{F}_t + w^{h}_{t+h}, \tag{9}$$

for fixed $h$, where $\tilde{\tau}_1, \ldots, \tilde{\tau}_4$ are the four maturities for which we have high-frequency data. Recursive volatility forecasts are constructed using Eq. (1), for each maturity $\tau = \tilde{\tau}_i$, separately, with the fitted values $\tilde{V}_t$ from Eq. (9) for $Z_t$, i.e., $q = 1$ in this case. In $\tilde{F}_t$, we allow for selection from among the first six PCA factors from Eq. (8), and in robustness checks we allow squares, cubes, and interactions in the first three of these (see Section 6.1). Following Ludvigson and Ng (2009), the optimal combination of terms to include in $\tilde{F}_t$ is selected by minimizing the Bayesian information criterion (BIC) for Eq. (9). We consider both selection based on the initial estimation period, only, and recursive updating of the selection every period.

2.1.4. Risk premium based forecasters

Motivated by the literature on forecasting risk premiums in the bond market, we use interest rate spreads to form two additional yield curve based volatility forecasters. The first is the yield spread of Campbell and Shiller (1991). In this case, the explanatory variable for the maturity $\tau$ yield volatility is simply the difference between the maturity $\tau$ yield itself and the short rate, which we proxy by the CRSP 1-month T-bill rate, denoted $y^{1/12}_{t}$. Thus, volatility forecasts are constructed from Eq. (1), with $Z_t = y^{\tau}_{t} - y^{1/12}_{t}$, i.e., $q = 1$.

The final risk premium based forecaster is based on the forward spread of Fama and Bliss (1987). We use the difference between the maturity $\tau$ forward rate, $f^{\tau}_{t} = y^{\tau}_{t} + (\tau - h)\left(y^{\tau}_{t} - y^{\tau-h}_{t}\right)/h$, and the short rate as explanatory variable. Thus, maturity $\tau$ yield volatility forecasts are constructed from Eq. (1), with $Z_t = f^{\tau}_{t} - y^{1/12}_{t}$.

2.2. Time series based yield volatility forecasters

As benchmarks, we consider three simple time series models, namely, the random walk (RW), the HAR model of Corsi (2009), and the realized GARCH model of Hansen et al. (2012). All three are widely used for volatility forecasting in equity markets. The RW forecast of $V^{\tau}_{t+1\to t+h}$ is simply $V^{\tau}_{t-h+1\to t}$. The realized GARCH specification is chosen because it utilizes the information in the high-frequency data in forecasting, similarly to RW and HAR.

2.2.1. The HAR model

The HAR model is given by

$$V^{\tau}_{t+1\to t+h} = \beta^{\tau,h}_{0} + \beta^{\tau,h}_{D} V^{\tau}_{t} + \beta^{\tau,h}_{W} V^{\tau}_{t-4\to t} + \beta^{\tau,h}_{M} V^{\tau}_{t-21\to t} + u^{\tau,h}_{t+h}. \tag{10}$$

The motivation for the cascade specification is the hypothesis on heterogeneous beliefs, according to which different investors react to information from the past day, week (5 trading days), and month (22 days). The model is estimated using nonoverlapping dependent variables, i.e., estimation given information through $t'$ uses $t = t' - h, t' - 2h, \ldots$ in Eq. (10). Upon estimation, the forecast as of $t'$ is given directly by Eq. (10), for $t = t'$, and $u^{\tau,h}_{t'+h} = 0$.

2.2.2. Mean-reverting realized GARCH

Our final time series benchmark is the realized GARCH model. To accommodate mean reversion in yields, we consider a slightly extended model specification, given by

$$y^{\tau}_{t} - y^{\tau}_{t-h} = a^{\tau} + b^{\tau} y^{\tau}_{t-h} + \sqrt{h^{\tau}_{t}}\, z^{\tau}_{t}, \tag{11}$$

$$h^{\tau}_{t} = c^{\tau} + \sum_{i=1}^{p} d^{\tau}_{i} h^{\tau}_{t-ih} + \sum_{j=1}^{q} g^{\tau}_{j} V^{\tau}_{t-(j+1)h+1\to t-jh}, \tag{12}$$

$$V^{\tau}_{t-h+1\to t} = \xi^{\tau} + \phi^{\tau} h^{\tau}_{t} + \vartheta^{\tau}_{1} z^{\tau}_{t} + \vartheta^{\tau}_{2}\left((z^{\tau}_{t})^2 - 1\right) + u^{\tau}_{t}, \tag{13}$$

where $h^{\tau}_{t}$ is the conditional variance of $y^{\tau}_{t}$, given information through $t - h$, $z^{\tau}_{t} \sim N(0, 1)$, $u^{\tau}_{t} \sim N(0, \sigma^{2}_{\tau,u})$, and suppressing the dependence of parameters on $h$. The standard model for equities, with returns (log-price changes) rather than yield changes on the left-hand side of Eq. (11), corresponds to the special case $b^{\tau} = 0$, and the extended model accommodates mean reversion in yields for $b^{\tau} \in (-1, 0)$ (at a rate corresponding to $\kappa = -\log(1 + b^{\tau})/h$ in continuous time, see Appendix D). For purposes of forecasting volatility over the $h = 22$ day horizon, we consider a monthly specification, i.e., given information through $t'$, the model is fit to yields and realized measures at $t = t', t' - h, t' - 2h, \ldots$, in analogy with the HAR model.

The model is estimated by maximum likelihood. We adopt the specification $p = 1$, $q = 2$ from Hansen et al. (2012). Upon estimation, the forecast as of $t'$ is $\hat{\xi}^{\tau} + \hat{\phi}^{\tau} h^{\tau}_{t'+h}$. For comparison, we applied the standard model with either yields or yield changes on the left-hand side of Eq. (11), i.e., imposing $b^{\tau} = -1$ and 0, respectively. From the results, the extended mean-reverting realized GARCH model performs best for purposes of yield volatility forecasting, so we focus on this specification in our empirical work.⁴

³ Write $\mathcal{F}_t$ for the information set. We have $Var(y^{\tau}_{t+h} \mid \mathcal{F}_t) = \mathbb{E}(Var(y^{\tau}_{t+h} \mid \mathcal{F}_t, IV^{\tau}_{t\to t+h}) \mid \mathcal{F}_t) + Var(\mathbb{E}(y^{\tau}_{t+h} \mid \mathcal{F}_t, IV^{\tau}_{t\to t+h}) \mid \mathcal{F}_t)$, where $Var(y^{\tau}_{t+h} \mid \mathcal{F}_t, IV^{\tau}_{t\to t+h}) = IV^{\tau}_{t\to t+h}$. Hence, $V^{\tau}_{t,h} = Var(y^{\tau}_{t+h} \mid \mathcal{F}_t)$ equals $\mathbb{E}_t(IV^{\tau}_{t\to t+h}) \approx \mathbb{E}_t(V^{\tau}_{t+1\to t+h})$ (as realized volatility converges rapidly to integrated volatility in high-frequency data) plus a positive bias stemming from conditional mean variation, $Var(\mathbb{E}(y^{\tau}_{t+h} \mid \mathcal{F}_t, IV^{\tau}_{t\to t+h}) \mid \mathcal{F}_t)$. The latter is negligible for short horizons $h$ (cf. Andersen et al. (2006)), but potentially important for our month-ahead forecasts.
⁴ Results for the standard realized GARCH model are available from the authors on request.
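The nonoverlapping sampling scheme described for the HAR model in Eq. (10) can be sketched as follows. This is an illustration only: the function names are hypothetical, days are indexed from 0, and the weekly and monthly regressors are taken to be sums over 5 and 22 days, on the assumption that $V^{\tau}_{a\to b}$ denotes a sum as in the definition of the forecasting target. Estimating the coefficients (for instance by least squares on the returned rows) is left out.

```python
def window_sum(v, start, end):
    """Sum of v[start], ..., v[end] (inclusive, 0-based day indices)."""
    return sum(v[start:end + 1])


def har_design(v, h, t_prime):
    """Nonoverlapping HAR sample available with information through t_prime.

    Each row is (t, target, daily, weekly, monthly) with
      target  = v[t+1] + ... + v[t+h]
      daily   = v[t]
      weekly  = v[t-4] + ... + v[t]
      monthly = v[t-21] + ... + v[t]
    for t = t_prime - h, t_prime - 2h, ..., as long as 22 days of
    history exist. Every target ends at or before t_prime.
    """
    rows = []
    t = t_prime - h
    while t >= 21:
        rows.append((t,
                     window_sum(v, t + 1, t + h),
                     v[t],
                     window_sum(v, t - 4, t),
                     window_sum(v, t - 21, t)))
        t -= h
    return rows


def har_forecast(beta, v, t_prime):
    """Evaluate Eq. (10) at t = t_prime with the error term set to zero.

    `beta` = (b0, bD, bW, bM) is assumed to have been estimated elsewhere.
    """
    b0, b_d, b_w, b_m = beta
    return (b0 + b_d * v[t_prime]
            + b_w * window_sum(v, t_prime - 4, t_prime)
            + b_m * window_sum(v, t_prime - 21, t_prime))
```

Stepping the origin back by `h` days at a time keeps the dependent variables from sharing any daily observations, which is the point of the nonoverlapping design.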
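As a rough illustration of how the recursive regression in Eq. (1) combines with a scalar information variable such as the forward spread of Section 2.1.4, consider the sketch below. All function names are hypothetical, the regressor is scalar ($q = 1$), days are indexed from 0, and $\tau$ and $h$ in the forward rate formula are assumed to be expressed in the same time unit. It is not the authors' implementation.

```python
def forward_spread(y_tau, y_tau_minus_h, y_short, tau, h):
    """Z_t = f_t^tau - y_t^{1/12}, with
    f_t^tau = y_tau + (tau - h) * (y_tau - y_{tau-h}) / h."""
    forward = y_tau + (tau - h) * (y_tau - y_tau_minus_h) / h
    return forward - y_short


def ols_fit(x, y):
    """Least-squares intercept and slope for a scalar regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recursive_forecast(z, v, h, t_prime):
    """Forecast of v[t'+1] + ... + v[t'+h] using only data through t_prime.

    Eq. (1) is fit on origins t = 0, ..., t_prime - h, i.e., exactly those
    whose h-day targets are fully observed by t_prime, and the fitted line
    is then evaluated at z[t_prime].
    """
    origins = range(t_prime - h + 1)
    x = [z[t] for t in origins]
    y = [sum(v[t + 1:t + h + 1]) for t in origins]
    intercept, slope = ols_fit(x, y)
    return intercept + slope * z[t_prime]
```

Calling `recursive_forecast` for each successive `t_prime` gives an expanding-window sequence of forecasts; the same function applies to any other scalar $Z_t$, such as the yield spread or a model-implied conditional variance.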
3. Data description and construction of realized measure

3.1. High-frequency data approach

Our high-frequency data set is based on the three Treasury futures (5 years, 10 years, and long term) for the long end of the yield curve, and 3-month Eurodollar futures for the short end. The Treasury futures are traded on the Chicago Board of Trade (CBOT), and the Eurodollar futures on the Chicago Mercantile Exchange (CME). Data are obtained from CME Group. For liquidity reasons, we start our sample on January 2, 2000, and data run through December 31, 2020. Due to the opening hours in the early part of our sample, we consider a trading day from 7:20 am to 2:00 pm Eastern Time. All weekends and non-working days are excluded. This leaves a raw data set covering 5,115 days.

The Treasury futures have delivery dates in four different months during the year: March, June, September, and December. For each underlying bond maturity, we include the futures contract with highest liquidity, in most cases that with shortest term to delivery. For the Eurodollar futures, we include the two contracts with closest to 3 and 9 months to delivery, and convert the prices of these into yields of 6 months and 1 year to maturity. Section 3.2 describes the details of the procedure.

We extract high-frequency yield curves from the five futures prices in a two-step approach. In the first step, futures prices are converted into coupon bond prices. In the second, yield curves are extracted from the coupon bond prices. Our approach differs from that of Faust et al. (2007) mainly in two respects. First, we include more information on the long end of the curve, using all three Treasury futures, rather than only 10-year contracts. Second, following, e.g., Cieslak and Povala (2016), we calibrate yield curves using the spline method of Fisher et al. (1995), instead of the Nelson and Siegel (1987) method.⁵ In the following, we briefly outline the approach.

3.2. Construction of intraday bond prices from futures data

Both Treasury and Eurodollar futures are considered in the first step. The underlying bond of a Treasury futures contract is hypothetical, and unknown until delivery. The seller chooses a bond from a delivery basket consisting of bonds meeting requirements from CME, implying that a Treasury futures involves two options: When to deliver, and which bond to deliver. We neglect the value of the first option by assuming that the delivery date is the first working day of the delivery month. Next, given the delivery date, we construct the delivery basket by combining the futures data with daily CRSP data on Treasuries, find the cheapest-to-deliver (CTD) bond in the basket thus constructed, and calculate the futures-implied coupon bond price.⁶ More details on this step are provided in Appendix A.1.

The value of the Eurodollar futures at maturity is determined by the 3-month LIBOR rate. From the observed futures prices, we derive high-frequency LIBOR rates with maturity date 3 months after the delivery date. LIBOR rates are afflicted with credit risk, and we deduce Eurodollar-implied government yields by assuming a constant credit spread through the day between LIBOR and government yields. Daily

Finally, Eurodollar-implied government yields are converted into zero-coupon bond prices, so that units match with the coupon bonds.

For each of the five contracts (three Treasury and two Eurodollar futures), we construct minute bars containing the first observation in each minute. If multiple observations on the same contract have the same time stamp, we use the median price. We apply a sanity check, discarding observations at a distance more than ten times the absolute mean from the median of the previous ten observations. Finally, we exclude days without any observation for a whole hour for at least one of the five futures contracts. This eliminates 96 trading days, leaving 5,019 observation days in the final analysis data set. Appendix B describes the liquidity of the Treasury contracts.

3.3. Construction of intraday yield curves from bond prices

In the second step, extracting high-frequency yield curves from the futures-implied bond prices using cubic splines, an implication of our approach is that the maturities of bonds considered vary over the sample period, and therefore knot points vary, too. Due to the small number of cross-sectional observations (five futures contracts), we choose not to interpolate the calibrated curves between points of principal payments. The outcome of the procedure is a complete set of consecutive yield curves at the 1-minute frequency. More details on the second step are provided in Appendix A.2.

Yields are read off the calibrated curves at maturities $\tau$ reflecting the underlying maturities of the futures contracts, to ensure a sufficient amount of market based information around the relevant points along the curves. Initially, we consider $\tau = 0.5$, 1, 5, 7, and 15 years. The match to the underlying maturities of the corresponding futures is by construction for the first two, and holds to a reasonable degree for the third, while $\tau = 7$ is included as the closest among standard maturities to the approximately 7.5 year average maturity of the Treasury note underlying the 10-year contract. The underlying of the long term Treasury futures is a hypothetical 6% bond. At expiration, a bond with maturity between 15 and 25 years must be delivered. When the interest rate is below 6%, conversion rates favor bonds with short maturities, i.e., the underlying bond is actually of maturity 15 years for our out-of-sample window, and hence $\tau = 15$ is considered as a candidate fifth maturity for our construction. To verify the quality of the resulting high-frequency yields at the five candidate $\tau$-values, we match a daily frequency subsample of them with daily Gürkaynak et al. (2007) yields. For the latter, the number of bonds with maturity around 15 years outstanding at any given point in time in the CRSP data is rather limited, hence implying a low weight around this maturity in the calibration, and we find a relatively low correlation between the resulting yields and the daily subsample of ours at $\tau = 15$. Thus, as we cannot verify the quality of our fit at the fifth maturity, we henceforth restrict attention to the yields read off our high-frequency curves at $\tau = 0.5$, 1, 5, and 7 years, although we continue to use all five futures contracts in the construction of the curves.

At the four maturities retained, correlations between the end-of-day subset of futures-implied yields extracted from our 1-minute data and the daily calibrated yields range between 99.37% and 99.99% (see Ta-
LIBOR rates with maturities of 6 months and 1 year are obtained from ble A.1 in the Appendix). In effect, we have created a high-frequency
the St. Louis Fed and compared with daily Gürkaynak et al. (2007) intraday version of the Gürkaynak et al. (2007) dataset up to maturity
yields (see Section 3.5 for the construction) to estimate the spread. 𝜏 = 7. Table 2 shows descriptive statistics on our yield data, by matu-
rity. On average over the sample period, the term structure of interest
rates is mainly upward sloping, and the term structure of volatilities
5 A related cubic spline method due to Waggoner (1997) is adopted by Dai et
downward sloping, although both are relatively flat at the short end.
al. (2007) and Andersen and Benzoni (2010).
6 Treasuries are non-convertible, and starting in 2000 avoids problems with Skewness and kurtosis are modest, especially at longer maturities. Fig. 1
displays the evolution through time in yields, by maturity, along with
the flower bonds issued until 1965, as these all matured by 1998. Callable bonds
and notes were issued until 1985, but many of these subsequently repurchased a three-dimensional view of the evolution of the high-frequency yield
by the Treasury and reissued as non-callable, although on a discretionary basis, curves through calendar time. The transition to the zero lower bound
without sinking fund provision. We drop the small number of remaining callable (ZLB) regime around 2008 has a strong impact, especially at the shorter
issues outstanding as of January 2, 2000. maturities.
Fig. 1. High-frequency yields.

This figure shows the time series of high-frequency annualized yields in percent, by maturity, along with a three-dimensional view of the evolution through time of
the yield curves. The sample spans the period from January 2, 2000, through December 31, 2020.
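As a rough illustration of the Eurodollar step described in Section 3.2, the sketch below converts intraday Eurodollar futures quotes into implied 3-month LIBOR rates and subtracts a credit spread that is held constant through the day. The quoting convention (price equals 100 minus the rate in percent) is the standard one for Eurodollar futures. The function names, the example numbers, and the choice to measure the spread as the daily difference between LIBOR and the Gürkaynak et al. (2007) yield are assumptions made for illustration only; the procedure actually used is the one in Appendix A.1.

```python
def implied_libor(futures_price):
    """Implied 3-month LIBOR in percent from a Eurodollar futures quote.

    Eurodollar futures are quoted as 100 minus the annualized rate in percent.
    """
    return 100.0 - futures_price


def daily_spread(libor_daily, government_yield_daily):
    """Credit spread in percent, measured once per day (illustrative assumption)."""
    return libor_daily - government_yield_daily


def implied_government_yields(futures_prices, spread):
    """Apply a spread that is constant through the day to every intraday quote."""
    return [implied_libor(p) - spread for p in futures_prices]


# Hypothetical 1-minute quotes and daily rates, for illustration only.
prices = [97.50, 97.48, 97.53]
spread = daily_spread(libor_daily=2.60, government_yield_daily=2.25)
yields = implied_government_yields(prices, spread)
```

Holding the spread fixed within the day means that all intraday variation in the implied government yield comes from the futures quotes, which is what matters for the realized measures built from these yields.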
Table 2
Descriptive statistics on high-frequency yields.

              $\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$
Mean              1.63          1.57         2.82         3.25
Std.              1.86          1.88         1.59         1.35
Skewness          1.06          1.01         0.60         0.26
Kurtosis          2.94          2.84         2.50         2.19

This table presents the mean, standard deviation, skewness, and kurtosis for the end-of-day subset of our high-frequency annualized yields in percent, by maturity. The sample spans the period from January 2, 2000, through December 31, 2020.

3.4. Realized volatility measure

Access to high-frequency data on intraday yield curves enables us to estimate volatilities using realized measures. The market for the Treasury futures used in the construction of yield curves (Section 3.2) has undergone dramatic changes in liquidity during our sample period, in part due to the introduction of the electronic trading pit in 2004, and data are likely afflicted with market microstructure noise (see Appendix B). The high frequency econometrics literature proposes estimators of volatility that are robust to microstructure noise, e.g., the two time scales estimator of Zhang et al. (2005), the realized kernel approach of Barndorff-Nielsen et al. (2008), and the pre-averaged realized variance of Jacod et al. (2009). We deal with the issue by implementing the pre-averaging estimator following Christensen et al. (2014), who show that this is successful in empirical analysis of volatility.

For each maturity $\tau$, and on each day, write the intraday yield data as $y^\tau_{i/n}$, for $i = 0, 1, \dots, n$. Pre-averaged yield changes are calculated by averaging yields in a local neighborhood consisting of $K$ observations,

$$\Delta\bar{y}^\tau_i = \frac{1}{K}\left(\sum_{j=K/2}^{K-1} y^\tau_{\frac{i+j}{n}} - \sum_{j=0}^{K/2-1} y^\tau_{\frac{i+j}{n}}\right), \tag{14}$$

with $K$ the nearest even integer to $\theta\sqrt{n}$, and $\theta$ a tuning parameter. The pre-averaged realized variance is then calculated as

$$V_t^\tau = \frac{n}{n-K+2}\,\frac{1}{K\psi_K}\sum_{i=0}^{n-K+1}\left(\Delta\bar{y}^\tau_i\right)^2 - \frac{\hat{\omega}^2}{\theta\psi_K}, \tag{15}$$

with $\psi_K = (1 + 2K^{-2})/12$, and $\hat{\omega}^2$ an estimate of the noise variance,

$$\hat{\omega}^2 = -\frac{1}{n-1}\sum_{i=2}^{n}\left(y^\tau_{\frac{i}{n}} - y^\tau_{\frac{i-1}{n}}\right)\left(y^\tau_{\frac{i-1}{n}} - y^\tau_{\frac{i-2}{n}}\right). \tag{16}$$

Given our one-minute sampling, the number of observations in a day, $n$, is 380, and we set $\theta = 1$. Since we are interested in volatility over the entire day, not only the hours for which we have high-frequency data, we add the squared overnight difference in yields, following Andersen
and Benzoni (2010) and Bollerslev et al. (2018). The risk measure over the next month is simply $V^\tau_{t+1\to t+h}$, $h = 22$, aggregating the pre-averaged realized variances over 22 trading days.

Table 3 shows descriptive statistics on the daily annualized pre-averaged realized yield volatilities over one day, $\sqrt{V_t^\tau}$, in Panel A, and one month, $\sqrt{V^\tau_{t+1\to t+22}}$, in Panel B, in percent, by maturity. Although the volatility (square root) form of the measures facilitates interpretation, e.g., a unit mean corresponds to a one percent annual yield volatility, the raw (variance) form is used for forecasting, and descriptive statistics for this are reported in Panels C and D.

Table 3
Descriptive statistics on pre-averaged realized yield volatility.

              $\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$

Panel A: One-day volatility, $\sqrt{V_t^\tau}$
Mean              0.89          0.79         1.28         1.22
Std.              1.05          1.01         1.86         1.51
Skewness          4.33          4.79         2.13         1.78
Kurtosis         45.48         53.80         7.19         6.63

Panel B: Month-ahead volatility, $\sqrt{V^\tau_{t+1\to t+22}}$
Mean              1.16          1.01         2.14         1.87
Std.              0.81          0.84         0.92         0.69
Skewness          2.41          2.04        -0.09         0.60
Kurtosis         11.57          9.08         2.54         3.68

Panel C: One-day variance, $V_t^\tau$
Mean              1.88          1.64         5.12         3.75
Std.              8.62          8.48        12.98         8.50
Skewness         31.07         33.72         3.84         6.17
Kurtosis       1462.90       1663.60        21.09       101.05

Panel D: Month-ahead variance, $V^\tau_{t+1\to t+22}$
Mean              1.99          1.73         5.42         3.96
Std.              3.74          3.43         4.00         2.97
Skewness          5.49          5.24         0.89         1.83
Kurtosis         41.24         39.13         3.58         7.98

This table presents the mean, standard deviation, skewness, and kurtosis of the daily annualized pre-averaged realized yield volatilities, by maturity. Statistics for the square root (volatility) form are shown for one-day measures in Panel A, and for one-month ($h = 22$) measures in Panel B, both in percent. Statistics for the raw (variance) form are shown for one-day measures in Panel C, and for one-month ($h = 22$) measures in Panel D, both in basis points. The sample spans the period from January 2, 2000, through December 31, 2020.

The term structure of volatilities (variances) exhibits a hump shape across the four maturities considered, with highest average at the $\tau = 5$ year maturity. The standard deviation (time series variation in volatility) shows a similar pattern. Skewness and excess kurtosis are higher for shorter maturities, $\tau = 0.5$ and 1, relative to longer, $\tau = 5$ and 7, where they essentially vanish for the one-month volatility measure.⁷ On average, volatilities over one month are slightly larger than over one day, possibly reflecting that some conditional mean variation remains in the realized measures due to finite sampling frequency, cf. footnote 3, as well as Jensen's inequality for the square root measures (volatility spikes are included in 22 measures rather than one). As expected, other moments are smallest for the aggregated measures.

Fig. 2 displays the evolution through time in daily annualized pre-averaged realized one-month yield volatility in percent, by maturity, corresponding to Table 3, Panel B, along with a three-dimensional view of the volatility surface. Volatilities rise quite dramatically for the shortest maturities during the financial crisis and the transition to the ZLB regime, and drop to a low level after the transition, only to rise again around 2016.

3.5. Daily yield data

To enable estimation of the affine term structure models, as well as the PCA, common factor, and risk premium based forecasters, on a daily yield panel with more than four observations in the cross section,⁸ we consider the estimated parameters provided by Gürkaynak et al. (2007). Their daily frequency dataset is extracted from a large set of coupon bonds, using the Svensson (1994) method. The continuously compounded yield at maturity $\tau$ is written as

$$y_t^\tau = \beta_0 + \beta_1\,\frac{1 - e^{-\tau/\theta_1}}{\tau/\theta_1} + \beta_2\left(\frac{1 - e^{-\tau/\theta_1}}{\tau/\theta_1} - e^{-\tau/\theta_1}\right) + \beta_3\left(\frac{1 - e^{-\tau/\theta_2}}{\tau/\theta_2} - e^{-\tau/\theta_2}\right), \tag{17}$$

and estimates of $(\beta_0, \beta_1, \beta_2, \beta_3, \theta_1, \theta_2)$ are provided at daily frequency.⁹ We follow Christensen et al. (2010) and consider $N = 8$ maturities in the daily cross sections, namely, 3 and 6 months, and 1, 2, 3, 5, 7, and 10 years.

4. Estimation

The affine term structure models from Section 2.1.1 are estimated on daily data, cf. Section 3.5, using the Kalman filter. The basic measurement and transition equations are obtained by allowing for measurement error $\varepsilon^\tau_{t+h}$ in the yields in Eq. (5), and discretizing the state dynamics in Eq. (2), i.e., the state space model is given by

$$y^\tau_{t+h} = \frac{A(\tau)}{\tau} + \frac{B(\tau)'}{\tau} X_{t+h} + \varepsilon^\tau_{t+h}, \tag{18}$$

$$X_{t+h} = C_h + D_h' X_t + \eta_{t+h}, \tag{19}$$

where $\varepsilon^\tau_{t+h} \sim N(0, H_{\tau,h})$, $\eta_{t+h} \sim N(0, Q_{t,h})$, $h = 1$ for daily data and daily time index $t$, with expressions for $A$ and $B$ in Appendix C, and for $C_h$, $D_h$, and $Q_{t,h}$ in Appendix D. For the two Gaussian models in Table 1, the standard linear filter applies. For the stochastic volatility models, Cox et al. (1985) (henceforth CIR) and AFNS$_3$, we apply the extended Kalman filter, approximating transitions by Gaussian distributions.¹⁰

Upon estimation, conditional variance estimates are computed using Eq. (7), now with $h$ indicating the forecasting horizon ($h = 22$ for month-ahead forecasting), and corrected for the bias stemming from measurement error in yields, $\varepsilon^\tau_{t+h}$ in Eq. (18), producing $\tilde{V}^\tau_{t,h} = b_0^{\tau,h} + b_1^{\tau,h\prime} X_t + H_{\tau,h}$. Recursive volatility forecasts are constructed using $\tilde{V}^\tau_{t,h}$ for $Z_t$ in Eq. (1).

5. Empirical results

We consider the forecasting of yield volatility over the next month. The first set of estimates is based on the period January 2, 2000, through December 31, 2007, and the next 100 observations are used

⁷ Skewness and kurtosis are close to 0 and 3.

⁸ Our high-frequency yield curves are constructed using five futures contracts, and four yields are retained in the resulting high-frequency panel, cf. Section 3.3. The time series models are estimated using realized measures (Section 3.4) based on the high-frequency panel.

⁹ Available at http://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html and updated daily.

¹⁰ Appendix D provides further details on estimation, including the Kalman filter and the (quasi) log likelihood function.
Fig. 2. Annualized pre-averaged realized yield volatility.

This figure shows the time series of daily annualized pre-averaged realized one-month yield volatilities in percent, by maturity, along with a three-dimensional view
of the evolution through time of the term structure of volatilities (log variances). The sample spans the period from January 2, 2000, through December 31, 2020.
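The pre-averaged realized variance of Eqs. (14)-(16), on which the volatilities in Fig. 2 are based, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the tuning parameter is fixed at $\theta = 1$ as in the text (so the factor $\theta$ in the noise correction of Eq. (15) drops out), and no annualization or unit scaling is applied.

```python
import math


def nearest_even(x):
    """Nearest even integer to x, with a floor of 2."""
    return max(2, 2 * int(round(x / 2.0)))


def preaveraged_rv(y):
    """Pre-averaged realized variance for one day, Eqs. (14)-(16), theta = 1.

    y is the list of n + 1 intraday yields y_0, ..., y_n.
    """
    n = len(y) - 1
    K = nearest_even(math.sqrt(n))      # K = nearest even integer to theta * sqrt(n)
    half = K // 2
    psi = (1.0 + 2.0 / K ** 2) / 12.0   # psi_K

    # Eq. (14): mean of the later half-window minus mean of the earlier one,
    # squared and summed over i = 0, ..., n - K + 1.
    sum_sq = 0.0
    for i in range(n - K + 2):
        later = sum(y[i + j] for j in range(half, K))
        earlier = sum(y[i + j] for j in range(half))
        sum_sq += ((later - earlier) / K) ** 2

    # Eq. (16): noise variance from the first-order autocovariance of yield changes.
    omega2 = -sum(
        (y[i] - y[i - 1]) * (y[i - 1] - y[i - 2]) for i in range(2, n + 1)
    ) / (n - 1)

    # Eq. (15) with theta = 1.
    return n / (n - K + 2) * sum_sq / (K * psi) - omega2 / psi


def daily_variance(y, previous_close):
    """Add the squared overnight yield change to the intraday measure."""
    return preaveraged_rv(y) + (y[0] - previous_close) ** 2
```

With $n = 380$ one-minute observations, `nearest_even(math.sqrt(380))` gives $K = 20$. A month-ahead measure would then aggregate `daily_variance` over the following 22 trading days; how that aggregate is scaled and annualized is not specified here and is left out of the sketch.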
for recursive estimation of the predictive regressions in Eq. (1). This leaves 2,976 observations for the out-of-sample period, covering June 7, 2008, through December 31, 2020. This way, the out-of-sample window starts just before the transition to the ZLB regime. For PCA, risk premium based forecasters, and the HAR model, forecasts are not restricted to be positive. To ensure meaningful volatility forecasts, we apply a sanity filter, such that we do not forecast below the 2.5 or above the 97.5 percentiles of the empirical distribution of observed realized variances.¹¹

5.1. Statistical value of interest rate risk forecasts

To assess the yield volatility forecasts against a RW, we consider the $R^2_{OoS}$ measure of Campbell and Thompson (2007),

$$R^2_{OoS} = 1 - \frac{\sum_{t=t_0+1}^{T-h}\left(V^\tau_{t+1\to t+h} - \hat{V}^\tau_{t,h,\xi}\right)^2}{\sum_{t=t_0+1}^{T-h}\left(V^\tau_{t+1\to t+h} - \hat{V}^\tau_{t,h,RW}\right)^2}, \tag{20}$$

where $\hat{V}^\tau_{t,h,\xi}$ is the forecast from model $\xi$, $t_0$ is the end of the initial estimation period, and $T$ is the end of the sample. A positive $R^2_{OoS}$ indicates more accurate volatility forecasts from model $\xi$ than from the RW. The null hypothesis $R^2_{OoS} \le 0$ is tested against the alternative $R^2_{OoS} > 0$ using a one-sided Diebold and Mariano (1995) test based on the Newey and West (1994) variance estimator with automatic lag selection of Andrews (1991).

Table 4 presents the resulting $R^2_{OoS}$ statistics in percent and Diebold-Mariano $p$-values for all forecasters and maturities considered. Only the HAR model generates positive $R^2_{OoS}$ statistics across all maturities. Furthermore, for each maturity, HAR generates the highest statistic (most accurate forecasts) across all models, except that the forward spread and the best common factor based forecaster get even higher statistics at $\tau = 5$. From the results, yield volatility forecasting is hardest at the short and long ends of the curve, with only HAR generating positive $R^2_{OoS}$ at $\tau = 0.5$, 1, and 7. In contrast, at the intermediate maturity $\tau = 5$, $R^2_{OoS}$ is positive for all models, except Vasicek.

A few comparisons within the first panel of Table 4, the results for the term structure models, are worth noting. Among the four models considered, Vasicek shows the strongest forecasting performance at the two shortest maturities, and the weakest at the two longest. Introducing stochastic volatility only leads to improved forecasts in half of the cases considered, i.e., at the two longest maturities for the one-factor model, and at the two shortest for the three-factor model.

Although constant-volatility models are typically rejected in-sample, parsimony can be rewarded in out-of-sample forecasting comparisons, as it reduces parameter uncertainty and the risk of overfitting, albeit at the expense of increased risk of model misspecification. In the present case, the results suggest that parsimony is rewarded at shorter maturities, with the single-factor homoskedastic model performing best, and

¹¹ The empirical distribution of observed variances is updated recursively, and forecasts below the 2.5 and above the 97.5 percentiles replaced by these.
to some extent at longer maturities, too, with rewards to using either a single factor (CIR), or constant volatility (AFNS$_0$). Further, the CIR model strikes a better balance between parsimony and misspecification than AFNS$_3$, as it dominates this across all maturities.

Based on the results from the term structure models, the general economic message is that complex time-varying models are hard to estimate, to the point that assuming and updating a simple, possibly misspecified model can work better in practice, for purposes of forecasting yield volatility. A similar phenomenon is known from the options literature, e.g., Dumas et al. (1998) find that while state-dependent volatility models improve on the Black and Scholes (1973) model in-sample, updating the constant-volatility model is preferred for out-of-sample hedging.¹² In the fixed income case, updating is in the spirit of the Heath et al. (1992) approach of conditioning on current information, and Buraschi and Corielli (2005) find that it partly corrects for model misspecification. Our results are consistent with these findings.

Table 4
Out-of-sample $R^2$ for month-ahead yield volatility forecasting.

                                  $\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$

Term structure models
Vasicek                             -15.67         -6.90        -0.84       -52.48
                                    (0.88)        (0.70)       (0.53)       (1.00)
CIR                                 -28.90        -18.55        11.91        -9.55
                                    (0.96)        (0.87)       (0.19)       (0.70)
AFNS$_0$                            -52.53        -48.55        18.95       -14.20
                                    (1.00)        (1.00)       (0.06)       (0.77)
AFNS$_3$                            -40.94        -28.43         4.69       -49.04
                                    (0.99)        (0.91)       (0.37)       (0.99)

PCA based forecasters
PCA3                                -32.11        -24.35        15.92       -15.84
                                    (0.92)        (0.91)       (0.04)       (0.91)
PCA4                                -32.28        -24.48        15.04       -16.67
                                    (0.93)        (0.92)       (0.06)       (0.92)
PCA5                                -32.48        -23.68        12.72       -23.82
                                    (0.93)        (0.91)       (0.12)       (0.94)
PCA6                                -39.95        -30.39        10.84       -20.88
                                    (0.97)        (0.98)       (0.16)       (0.85)

Risk premium based forecasters
Forward spreads                     -28.51        -22.41        22.00       -12.92
                                    (0.88)        (0.94)       (0.01)       (0.84)
Yield spreads                       -28.32        -21.57         8.84       -14.50
                                    (0.87)        (0.93)       (0.26)       (0.90)

Common factor based forecasters
Initial                             -27.66        -21.72        20.62        -2.25
                                    (0.88)        (0.90)       (0.02)       (0.59)
Recursive                           -38.11        -34.45        17.96        -7.44
                                    (0.95)        (0.98)       (0.06)       (0.68)

Time series models
HAR                                   1.03          7.54        20.44        10.70
                                    (0.47)        (0.29)       (0.00)       (0.11)
Mean-reverting realized GARCH       -26.83        -20.50         0.38       -38.20
                                    (0.86)        (0.93)       (0.49)       (1.00)

This table displays $R^2_{OoS}$ measures in percent relative to a RW for all forecasting methods and maturities. In parentheses asymptotic $p$-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). For common factor based forecasters, the label "Initial" indicates that the selection of PCA factors is based on the initial estimation period, and "Recursive" that it is updated every period. The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

The next panel in Table 4 shows results for the PCA based forecasters. Including more than three components actually makes forecasts deteriorate, presumably due to loss of parsimony. This is consistent with Litterman and Scheinkman (1991), who found that three factors provide a good description of the term structure. By a rotation argument, one might expect that the three factors in AFNS$_0$ correspond closely to the three leading PCA factors, so the difference in performance is perhaps surprising, with the PCA method generating more accurate forecasts at shorter maturities, and AFNS$_0$ at longer. However, unlike the PCA based volatility forecast, the affine model variance forecast does not involve the latent factors in the Gaussian case, as we show in the Appendix.¹³ Presumably, this explains the difference in performance. Consistently with the findings from the term structure models (first panel), the results for the three-factor case indicate rewards to parsimony (updating of the constant-volatility model) at longer maturities, and to the use of time-varying factors (AFNS$_3$, or PCA) at shorter.

For the common factor based forecasters in Table 4, fourth panel, the label "Initial" indicates that the selection of PCA factors in Eq. (9) is based on the initial estimation period, only, and "Recursive" that it is updated every period. From the results, selection based on the initial estimation period generates more accurate forecasts than recursive updating. This finding points to a regime switch in which the ZLB regime is more similar to the initial estimation period than to the transition period, with forecasts into the ZLB period deteriorating if the selection is updated during the transition. Further, at each maturity, the common factor based forecaster with initial selection is more accurate than the maturity-specific (i.e., not using a common factor based on Eq. (9)) PCA based forecasters from the second panel. This indicates that volatilities at different maturities are driven by a common factor, and that averaging out maturity-specific noise improves forecasting accuracy. In fact, the common factor based forecaster dominates the term structure models (first panel) at the two longest maturities, as does the forward spread based forecaster (third panel) at $\tau = 5$, suggesting that the yield curve contains predictive information about future interest rate risk beyond that captured by the affine models.

The last panel in Table 4 shows results for the time series models. Across all maturities, the forecasting performance of the mean-reverting realized GARCH model falls short of HAR. Nevertheless, both models dominate the PCA, risk premium, and common factor approaches at the two shortest maturities. The HAR model gets a $p$-value of 11% at the longest maturity, and below 1% at $\tau = 5$ years, meaning that it significantly outperforms the RW at level 1% at this intermediate maturity. So does the forward spread based forecast at 1%, the PCA based forecast with three components and the common factor based forecast without updating at 5%, and AFNS$_0$ and the recursive common factor based forecast at 10%. All PCA based forecasts (three through six components) get $p$-values below 20% at $\tau = 5$. The results point to difficulties for any model to reliably outperform the RW at short and long maturities, but also to scope for accurate volatility forecasting across a variety of different methods at intermediate maturities.

For comparison, Table A.3 in the Appendix shows results for volatility forecasts two months ahead ($h = 44$), laid out in the same manner as Table 4. Although significance is lost in most cases, presumably due to higher noise-to-signal ratio at the longer forecasting horizon, the overall pattern is confirmed. For each maturity, HAR generates the highest

¹² Similarly, Bakshi et al. (1997) find that the constant-volatility model performs no worse than stochastic volatility extensions in terms of hedging across most moneyness categories.

¹³ For $m = 0$, we have $b_1^{\tau,h} = 0$ in Eq. (7), see Eq. (D.28) in Appendix D.3.
Fig. 3. Yield volatility forecast errors relative to RW.

This figure shows yield volatility forecast errors over time, by maturity, for selected forecasting methods. Cumulative squared errors for a given forecasting method
are subtracted from those for the RW, so that an increasing curve indicates better forecasting than by RW, and vice versa. The sample spans the period from
January 2, 2000, through December 31, 2020.
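The evaluation quantities behind Table 4 and Fig. 3 can be sketched as follows: the sanity filter of footnote 11, the $R^2_{OoS}$ of Eq. (20), and the cumulative difference in squared errors relative to the RW. This is an illustrative sketch only. The function names are hypothetical, the percentile rule (linear interpolation between order statistics) is an assumption since the text does not specify one, and the RW forecasts are taken as given inputs.

```python
def percentile(sorted_values, q):
    """q-th percentile by linear interpolation (assumed convention)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    k = int(pos)
    if k + 1 >= len(sorted_values):
        return sorted_values[-1]
    frac = pos - k
    return sorted_values[k] + frac * (sorted_values[k + 1] - sorted_values[k])


def sanity_filter(forecast, past_variances, lo=2.5, hi=97.5):
    """Clip a forecast to the [lo, hi] percentiles of variances observed so far.

    Calling this with the history available at each forecast date gives the
    recursive updating described in footnote 11.
    """
    s = sorted(past_variances)
    return min(max(forecast, percentile(s, lo)), percentile(s, hi))


def r2_oos(realized, forecast, forecast_rw):
    """Eq. (20): one minus the ratio of squared errors, model versus RW."""
    sse_model = sum((v - f) ** 2 for v, f in zip(realized, forecast))
    sse_rw = sum((v - r) ** 2 for v, r in zip(realized, forecast_rw))
    return 1.0 - sse_model / sse_rw


def cumulative_gain(realized, forecast, forecast_rw):
    """Cumulative RW squared errors minus cumulative model squared errors.

    An increasing path means the model forecasts better than the RW.
    """
    path, running = [], 0.0
    for v, f, r in zip(realized, forecast, forecast_rw):
        running += (v - r) ** 2 - (v - f) ** 2
        path.append(running)
    return path


# Hypothetical numbers, for illustration only.
realized = [2.0, 3.0, 1.0, 4.0]
model = [2.5, 2.5, 1.5, 3.0]
rw = [3.0, 2.0, 3.0, 1.0]
r2 = r2_oos(realized, model, rw)
gain = cumulative_gain(realized, model, rw)
```

The last element of `gain` equals the RW sum of squared errors minus the model sum of squared errors, so its sign agrees with the sign of `r2`; Table 4 reports the latter multiplied by 100.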
$R^2_{OoS}$ statistic (most accurate forecasts) across all models considered, and at $\tau = 5$, the forward spread based forecaster and AFNS$_0$ get close. Introducing stochastic volatility leads to improved forecasts at $\tau = 5$ for the one-factor model, and at the two shortest maturities for the three-factor model. Although AFNS$_3$ now performs better than CIR at three maturities, Vasicek continues to dominate the other term structure models at the two shortest. Including more than three components in the maturity-specific PCA based forecaster makes forecasts deteriorate, and initial selection of factors generally generates more accurate forecasts than recursive updating for the common factor approach. The cross section based (PCA, risk premium, common factor) forecasters generally perform better than the term structure models, hence pointing to predictive information in the curve not captured by the affine models. One difference in results is that significance relative to RW is now attained at the shortest maturity for HAR, at level 10%. Henceforth, we focus on the one-month forecasting horizon ($h = 22$).

Fig. 3 shows yield volatility forecast errors over time, by maturity, for selected forecasting methods. Cumulative squared errors for a given forecasting method are subtracted from those for the RW, so that an increasing curve indicates better forecasting than by RW, and vice versa. In the common factor case, the improvement in the approach with initial selection relative to recursive updating occurs around 2009, consistent with the ZLB transition driving the phenomenon. For comparison, the figure shows results for Vasicek, CIR, and HAR, too. Most methods exhibit jumps in forecasting accuracy, moving into the ZLB regime, although the effect materializes more gradually at longer maturities. It could be expected that CIR (with state-dependent volatility) would handle the entire ZLB period better than Vasicek, but this is not confirmed. At the shorter maturities, Vasicek quickly adjusts to the low-volatility regime, whereas CIR performs worse than the RW for about two years. The results show that simpler is better around the transition to ZLB (initial selection as opposed to recursive, constant-volatility model as opposed to CIR), presumably in part because the artificial investor is not ignoring recent volatilities, cf. Eq. (1). Historical volatility (HAR) forecasts better than the curve-based methods, except that Vasicek initially does better at the short end. Improvements relative to the RW accumulate over calendar time at $\tau = 5$, except for the Vasicek model. At other maturities, results stabilize over the ZLB period, which runs through 2016 (cf. Fig. 1), well after the end of quantitative easing by 2014, then deteriorate for most models, HAR being the exception.

Overall, the results show that information from either the yield curve or the time series of volatilities can be used to improve volatility forecasts over the RW. Furthermore, the results indicate that the yield curve contains information about future volatility that is not captured by the term structure models.

5.2. Extracting incremental information

Here, we examine the possibility that the yield volatility forecasting equations based on the affine term structure models suffer from omitted variable bias, i.e., that the expression for future volatility in Eq. (7) should be expanded to

$$V^\tau_{t,h} = b_0^{\tau,h} + b_1^{\tau,h\prime} X_t + b_2^{\tau,h\prime} \tilde{X}_t, \tag{21}$$

for forecasting purposes. We consider three ways of extracting the incremental information variable $\tilde{X}_t$:

(i) By PCA factors from the yield curve.
(ii) Combining (i) with a factor extracted from the past volatility forecast errors.
(iii) By the realized volatility measure based on high-frequency data.

Since PCA factors capture the shape of the yield curve nonparametrically, improved forecasting performance (here, significance of $b_2^{\tau,h}$) in case (i) indicates that the yield curve contains incremental information about future interest rate risk relative to that in the affine term structure model considered. In (ii), significance indicates the existence of a factor explaining yield volatilities, but not yields, i.e., a USV case. In
other words, significance of $b_2^{\tau,h}$ shows in case (i) that the term structure models do not capture all relevant information about future volatility available from the yield curve, and in case (ii) that the yield curve itself does not capture all relevant information. In (iii), significance indicates that historical volatility contains incremental information not captured by the term structure models.

In addition, we subject the time series models to the same tests, i.e., we examine whether a similar information variable $\tilde{X}_t$ can be used to improve the volatility forecasts from the time series models. If so, this indicates in (i) that the yield curve contains incremental information about future volatility, relative to that contained in the volatility history itself, and in (iii) that the time series models do not fully exploit the information in the high-frequency data.

5.2.1. The incremental information in the yield curve

For (i), we regress volatility forecast errors from model $\xi$ on PCA factors from the yield curve,

$$V^\tau_{t+1\to t+h} - \hat{V}^\tau_{t,h,\xi} = \phi_0^{\tau,h} + \phi_1^{\tau,h\prime} \hat{F}_t + u^{\tau,h}_{t+h}, \tag{22}$$

for fixed $\tau$, $h$, with $\hat{F}_t$ the three leading fitted PCA factors at time $t$ from Eq. (8), based on data through $t$, and test for joint significance of $\phi_1^{\tau,h}$ ($k = 3$ coefficients). Under the null, the model generating the forecast $\hat{V}^\tau_{t,h,\xi}$ subsumes the information content on future volatility available in the PCA yield curve factors.

Table 5
The incremental information on future volatility in the yield curve.

                                  $\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$

Term structure models
Vasicek                               7.31          1.14         6.39        18.20
                                    (0.06)        (0.77)       (0.09)       (0.00)
CIR                                   2.92          4.93        11.85        18.34
                                    (0.40)        (0.18)       (0.01)       (0.00)
AFNS$_0$                             31.18         17.14         0.75         8.49
                                    (0.00)        (0.00)       (0.86)       (0.04)
AFNS$_3$                              4.16          6.97        24.56        82.27
                                    (0.24)        (0.07)       (0.00)       (0.00)

Time series models
HAR                                   5.08          5.41         4.48        18.57
                                    (0.17)        (0.14)       (0.21)       (0.00)
Mean-reverting realized GARCH         7.81          8.27         6.60        23.73
                                    (0.05)        (0.04)       (0.09)       (0.00)

This table shows results from regression Eq. (22), explaining yield volatility forecast errors from the specified models using three PCA factors fitted to past yield curves, based on Eq. (8). Reported values are $F$-statistics for joint tests of $\phi_1^{\tau,h} = 0$ (three coefficients). Asymptotic $p$-values in parentheses. The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Table 5 presents results from the specification test. In general, the term structure models do not capture all relevant information in the PCA yield curve factors about future volatility. The null is rejected at level 10% or better at two or more maturities for each term structure model, and for two or more models at each maturity. It is rejected at 5% or better for all models at the longest maturity, and at 10% or better at $\tau = 5$, except that here, AFNS$_0$ captures the information in the curve about future volatility ($p$-value 0.86), consistent with the relatively strong performance of this model at the intermediate maturity in Table 4. However, the PCA factors carry incremental information relative to AFNS$_0$ at level 5% for all other maturities. Thus, the indication is that the yield curve contains incremental information about future volatility beyond that captured by the affine models, especially at the long end.

For the time series models, the evidence suggests that the yield curve contains incremental information about future volatility at the long end, relative to that contained in the historical volatility series. The null is rejected at 1% for $\tau = 7$ for both the HAR and realized GARCH models, and for the latter at 10% for the other maturities, too.

Taken together, the results indicate that the yield curve contains important incremental information about future volatility relative to the time series of historical volatilities at the long end, and that some of this information is not captured by standard affine models.

5.2.2. Can past forecast errors predict future forecast errors?

In (ii), we examine whether a factor extracted from past yield volatility forecast errors can predict future forecast errors. An implication of USV is that it should be possible to extract at least one factor from volatility which is not related to the yields. We investigate this possibility by recursively extracting a factor, say, $PC^h_{t,\xi}$, from the cross section of current and past forecast errors $V^{\tau'}_{t'+1\to t'+h} - \hat{V}^{\tau'}_{t',h,\xi}$ by applying PCA to forecast errors at $t' = t - h$ and earlier, across maturities $\tau'$, then testing whether the extracted factor contains significant information about future forecast errors in the regression

$$V^\tau_{t+1\to t+h} - \hat{V}^\tau_{t,h,\xi} = \phi_0^{\tau,h} + \phi_1^{\tau,h\prime} \hat{F}_t + \psi_1^{\tau,h} PC^h_{t,\xi} + u^{\tau,h}_{t+h}, \tag{23}$$

for fixed $\tau$, $h$. Only one factor is included, due to the small number of volatilities in the cross section. The first 100 observations are used to initialize the factor. A significant coefficient $\psi_1^{\tau,h}$ indicates that a serially dependent USV factor is relevant for the forecast.

Table 6
Specification test for unspanned stochastic volatility factor.

                                  $\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$

Term structure models
Vasicek                              -3.01         -1.65        -1.90        -2.55
                                    (0.00)        (0.10)       (0.06)       (0.01)
CIR                                  -9.08         -4.49        -2.43        -3.36
                                    (0.00)        (0.00)       (0.02)       (0.00)
AFNS$_0$                             -5.85         -2.91        -1.12        -2.31
                                    (0.00)        (0.00)       (0.26)       (0.02)
AFNS$_3$                             -8.18         -4.23        -2.09        -1.57
                                    (0.00)        (0.00)       (0.04)       (0.12)

Time series models
HAR                                  -1.35         -0.79         2.48         2.49
                                    (0.18)        (0.43)       (0.01)       (0.01)
Mean-reverting realized GARCH        -5.54         -2.52        -1.43        -2.30
                                    (0.00)        (0.01)       (0.15)       (0.02)

This table shows results from regression Eq. (23), explaining yield volatility forecast errors from the specified models augmented with three yield curve PCA factors as in Eq. (22) using a PCA factor fitted to past yield volatility forecast errors. Reported values are $t$-statistics for testing $\psi_1^{\tau,h} = 0$. Asymptotic $p$-values in parentheses, based on the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Table 6 shows results from estimation of Eq. (23). At 5%, evidence of an omitted factor arises at two or more maturities for each term structure model, and for two or more models at each maturity. The coefficients $\psi_1^{\tau,h}$ are negative in all cases, consistent with mean-reversion in the USV factor. The test is conservative, in that the specification in Eq. (23) controls for PCA factors fitted to past yield curves, so the results are indicative of the existence of a latent volatility factor which is
11
spanned neither by the parametric term structure models, nor nonparametrically by the yield curve. From the last panel, the latent factor has significant explanatory power at 5% for future forecast errors at two or more maturities for each of the time series models, as well.

5.2.3. The incremental information in historical volatility

In (iii), we investigate whether historical volatility contains information about future volatility not captured by the models considered, using a similar regression as in Section 5.2.1, but with the lagged realized measure based on high-frequency data as predictor (see Andersen and Benzoni (2010) for a related analysis). The specification is Eq. (22), with $V^{\tau}_t$ replacing 𝐹̂𝑡, and we test for $\phi_1^{\tau,h} = 0$.

Results appear in Table 7. According to this test, historical volatility carries significant incremental information at 5% at two or more maturities, relative to each of the term structure models. Similarly, historical volatility contains significant incremental information relative to three or more of the four term structure models at each maturity beside the intermediate 𝜏 = 5, where the test is only significant for Vasicek. At 𝜏 = 1, historical volatility carries incremental information relative to all models, including the time series models, possibly due to recursive estimation and regime switching (to ZLB). HAR subsumes the information content in historical volatility at all other maturities, whereas historical volatility carries incremental information relative to realized GARCH across all maturities.

Table 7
The incremental information on future yield volatility in historical volatility.

                                  𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
Term structure models
  Vasicek                         4.76 (0.00)     7.81 (0.00)     3.19 (0.00)     5.89 (0.00)
  CIR                             3.22 (0.00)     4.18 (0.00)     1.57 (0.12)     2.13 (0.03)
  AFNS0                           2.15 (0.03)     2.22 (0.03)     1.86 (0.06)     1.80 (0.07)
  AFNS3                           2.01 (0.04)     2.61 (0.01)     1.23 (0.22)     2.28 (0.02)
Time series models
  HAR                             1.41 (0.16)     4.45 (0.00)     0.71 (0.48)     1.72 (0.09)
  Mean-reverting realized GARCH   5.34 (0.00)     10.00 (0.00)    4.23 (0.00)     4.75 (0.00)

This table shows results from the regression Eq. (22), using pre-averaged realized volatility from high-frequency data as predictor, to test for whether historical volatility contains incremental information about future volatility, relative to the specified models. Reported values are 𝑡-statistics for testing $\phi_1^{\tau,h} = 0$. Asymptotic 𝑝-values in parentheses, based on the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Overall, our specification tests reveal that the term structure of interest rates contains incremental cross-sectional information about future risk at the long end, relative to the time series of historical volatilities. Some of the incremental information in the term structure is not captured by the affine models, and historical volatility carries incremental information relative to these, and to the yield curve itself. The latter phenomenon can be related to the existence of a USV factor, and we find evidence of this in the data.

5.3. Economic value of interest rate risk forecasts

So far, the analysis has focused on statistical measures of predictive ability. We next examine whether the volatility forecasting methods considered generate utility in a portfolio allocation framework, following Bollerslev et al. (2018). To this end, the analysis is translated from the level of yields to returns. We consider an investor purchasing a zero-coupon bond of maturity 𝜏 + ℎ at time 𝑡 and selling the bond at 𝑡 + ℎ. Let $\tilde{r}_{t+h}$ denote the log return from this trading strategy. The relation between returns and yields is

$$\tilde{r}_{t+h} = -\tau\, y^{\tau}_{t+h} + (\tau + h)\, y^{\tau+h}_{t}, \tag{24}$$

so the conditional variance of the return as of 𝑡 is given by

$$\mathrm{Var}_t(\tilde{r}_{t+h}) = \tau^2\, \mathrm{Var}_t(y^{\tau}_{t+h}), \tag{25}$$

i.e., depending on maturity and the yield volatility forecast we consider. Investor is assumed to have mean-variance preferences and access to a risk-free as well as a risky asset, the latter being the zero-coupon bond with time-varying volatility. Assuming a constant Sharpe ratio, investor’s utility depends only on the variance of the risky asset. We consider maturities 7, 13, 61, and 85 months, so that the analysis depends on the forecasts from the previous sections.

Let 𝑤𝑡 be the portfolio weight allocated to the risky bond and 1 − 𝑤𝑡 the allocation to the risk-free asset with return $r^f_t$. The return to the portfolio at time 𝑡 + ℎ is

$$r_{t+h} = r^f_t + w_t\, rx_{t+h}, \tag{26}$$

where $rx_{t+h} = \tilde{r}_{t+h} - r^f_t$ is the excess return to the risky asset. Since $r^f_t$ is common across all forecasters, we only consider utility in terms of excess return. Expected utility per unit of wealth is given by

$$U_{t+h} = w_t\, \mathbb{E}(rx_{t+h}) - \frac{1}{2}\gamma w_t^2\, \mathrm{Var}_t(rx_{t+h}), \tag{27}$$

where 𝛾 is relative risk aversion, and $\mathrm{Var}_t(rx_{t+h}) = \mathrm{Var}_t(\tilde{r}_{t+h})$ from Eq. (25). The optimal weight $w^*_t$ is then

$$w^*_t = \frac{1}{\gamma}\frac{\mathbb{E}_t(rx_{t+h})}{\mathrm{Var}_t(rx_{t+h})}, \tag{28}$$

which, given the constant Sharpe ratio, $SR = \mathbb{E}_t(rx_{t+h})/\sqrt{\mathrm{Var}_t(rx_{t+h})}$, becomes

$$w^*_t = \frac{1}{\gamma}\frac{SR}{\sqrt{\mathrm{Var}_t(rx_{t+h})}}. \tag{29}$$

By Eqs. (26) and (29), the volatility target sought by the investor is $w^*_t\sqrt{\mathrm{Var}_t(rx_{t+h})} = SR/\gamma$. If the forecasted volatility $\tau\sqrt{\mathrm{Var}_t(y^{\tau}_{t+h})}$ exceeds this target, investor will place only a portion of wealth in the risky asset, 𝑤𝑡 < 1, and save the remainder in the risk-free asset. Conversely, when the forecasted volatility falls short of the target, investor will take a geared position in the risky asset, 𝑤𝑡 > 1, financed by borrowing at the risk-free rate. Thus, investor follows a volatility timing strategy.

To examine the utility gains from the various forecasters, let $\mathbb{E}_t(\cdot)$ denote the conditional expected value from the true model, $\mathbb{E}^{\xi}_t(\cdot)$ the conditional expected value from model 𝜉, and similarly for conditional variances. Assuming that investor uses model 𝜉 for portfolio selection, the optimal weight becomes $w^*_{t,\xi} = SR/\big(\gamma\tau\sqrt{\mathrm{Var}^{\xi}_t(y^{\tau}_{t+h})}\big)$. The expected utility per unit of wealth can then be expressed as

$$U_{t+h} = \frac{SR^2}{\gamma}\left(\frac{\sqrt{\mathrm{Var}_t(y^{\tau}_{t+h})}}{\sqrt{\mathrm{Var}^{\xi}_t(y^{\tau}_{t+h})}} - \frac{1}{2}\frac{\mathrm{Var}_t(y^{\tau}_{t+h})}{\mathrm{Var}^{\xi}_t(y^{\tau}_{t+h})}\right). \tag{30}$$

If a model is able to perfectly predict conditional variance, then investor’s expected utility is $SR^2/(2\gamma)$, otherwise it is less. Following Bollerslev et al. (2018), we set the annualized Sharpe ratio to 0.4. The
risk aversion parameter, 𝛾, is set to 5, following Sarno et al. (2016) and Gargano et al. (2017) from the bond return prediction literature. This corresponds to an annualized volatility target of 8%.

To evaluate predictors, we simply average realized utility. Table 8 reports the results. They deviate somewhat from those on prediction accuracy, Table 4, presumably because $R^2_{OoS}$ is symmetric, penalizing over- and underprediction equally, whereas the utility criterion penalizes underprediction more heavily. At the 𝜏 = 5 year maturity, investor is better off ignoring the RW forecast and investing in the risk-free asset, which generates utility zero. Highest utility at the two longest maturities is achieved using either AFNS0, the yield spread based forecaster, HAR, or mean-reverting realized GARCH, all of which improve significantly over the naive forecast at level 5%, except that AFNS0 gets a 𝑝-value of 0.07 at 𝜏 = 7. The HAR model generates high utility at the two shortest maturities, as well, along with AFNS3 at 𝜏 = 0.5 (both are significant at 1%). The PCA and common factor based forecasters are not quite on par with the term structure models, risk premium based forecasters, and time series models.

Table 8
Utility from yield volatility forecasts.

                                  𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
Term structure models
  Vasicek                         -3.18 (0.97)    -6.25 (0.89)    0.56 (0.04)     1.15 (0.61)
  CIR                             0.50 (0.75)     1.08 (0.41)     0.89 (0.02)     1.18 (0.53)
  AFNS0                           0.95 (0.13)     0.58 (1.00)     1.32 (0.01)     1.38 (0.07)
  AFNS3                           1.17 (0.00)     0.75 (0.89)     0.95 (0.02)     1.11 (0.67)
PCA based forecasters
  PCA3                            -1.38 (0.91)    -12.27 (0.89)   1.06 (0.02)     1.04 (0.79)
  PCA4                            -1.54 (0.91)    -11.92 (0.90)   0.51 (0.04)     1.06 (0.77)
  PCA5                            -1.34 (0.91)    -11.93 (0.90)   0.26 (0.06)     1.06 (0.78)
  PCA6                            0.20 (0.92)     -6.65 (0.96)    0.23 (0.06)     1.19 (0.49)
Risk premium based forecasters
  Forward Spread                  0.86 (0.30)     0.58 (1.00)     1.28 (0.02)     1.07 (0.74)
  Yield Spread                    0.85 (0.32)     0.57 (1.00)     1.31 (0.01)     1.41 (0.03)
Common factors based forecasters
  Static                          0.02 (0.94)     -4.52 (0.90)    0.69 (0.04)     1.31 (0.16)
  Recursively                     -1.51 (0.93)    -12.16 (0.89)   0.34 (0.06)     1.25 (0.33)
Time series models
  HAR                             1.17 (0.01)     0.89 (0.93)     1.32 (0.01)     1.45 (0.01)
  Mean-reverting realized GARCH   0.88 (0.27)     0.62 (1.00)     1.30 (0.01)     1.39 (0.04)
  RW                              0.71 (-)        1.05 (-)        -3.32 (-)       1.19 (-)

This table presents average realized utility from using the specified yield volatility forecasters for portfolio allocation. In parentheses asymptotic 𝑝-values for a one-sided Diebold-Mariano test relative to a RW using the Newey-West variance estimator with automatic lag selection of Andrews (1991). A portfolio consisting of the risk-free asset generates utility zero. The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Overall, the analysis shows that the information about future reinvestment rate risk contained in both the yield curve (yield spread, affine models) and historical volatility (HAR, mean-reverting realized GARCH) provides economic value to a risk averse investor, both in short and long maturity instruments. There are clear trade-offs, with the affine models, the yield spread, and realized GARCH performing relatively better on utility grounds, and the forward spread in terms of accuracy, along with the PCA based forecaster using three components. The HAR model performs relatively well according to both criteria, forecasting accuracy and utility.

6. Robustness

This section addresses the robustness of the results in Section 5. First, we consider whether interest rate risk forecasts are improved by including nonlinear terms in PCA factors. Next, we examine whether there is a trade-off between explaining yields and forecasting volatility, and we assess the information about future yield volatility in a wide cross section of coupon bond prices. Finally, we explore the possibility of enhancing forecasting performance by combining cross-sectional and intertemporal information.

6.1. Nonlinearities and interest rate risk

Feldhütter et al. (2016) find that including nonlinearities via quadratic and cubic terms as well as interactions in factors increases in-sample explanatory power for realized variance. In our out-of-sample framework, we find in Table 4 that using the three leading factors 𝐹̂𝑡 extracted from the yields at time 𝑡 based on Eq. (8) suffices for the PCA based forecasts, i.e., adding factors beyond three does not increase accuracy. Here, we examine whether including nonlinearities in the first three PCA factors improves forecasts, both for the common factor approach and the maturity-specific forecasts. As in the linear common factor approach, Eq. (9), the selection of terms to include is based on the BIC, either for the initial estimation period, or updated recursively.

Table 9
Yield volatility forecasts using nonlinearities in yield curve factors.

                                  𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
  Maturity-specific, initial      -43.25 (0.98)   -38.33 (0.99)   6.07 (0.28)     -13.21 (0.73)
  Maturity-specific, recursive    -28.94 (0.90)   -5.23 (0.67)    4.35 (0.35)     -8.62 (0.72)
  Common factor, initial          -26.31 (0.88)   -17.87 (0.89)   16.15 (0.12)    -17.74 (0.82)
  Common factor, recursive        -23.45 (0.87)   -15.03 (0.85)   17.28 (0.05)    -20.59 (0.93)

This table displays $R^2_{OoS}$ measures in percent relative to a RW for yield volatility forecasts allowing linear and nonlinear terms as well as interactions in the three leading yield curve factors extracted by PCA in Eq. (8). The optimal combination of terms is selected by BIC. The label “initial” indicates that the selection is based on the initial estimation period, and “recursive” that it is updated every period. In parentheses asymptotic 𝑝-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.
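The BIC-based selection among linear, quadratic, cubic, and interaction terms in three factors, as described for Table 9, can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the procedure used in the paper: the search is exhaustive over subsets of at most three terms (the text does not state the search strategy), the data are simulated, and the function names `candidate_terms`, `ols_rss`, `bic`, and `select_terms_by_bic` are hypothetical.

```python
import itertools
import math
import random


def candidate_terms(factors):
    """Build named regressors: each factor to powers 1-3 plus pairwise products."""
    k = len(factors[0])
    terms = {}
    for j in range(k):
        for power in (1, 2, 3):
            name = f"F{j + 1}" if power == 1 else f"F{j + 1}^{power}"
            terms[name] = [row[j] ** power for row in factors]
    for i, j in itertools.combinations(range(k), 2):
        terms[f"F{i + 1}*F{j + 1}"] = [row[i] * row[j] for row in factors]
    return terms


def ols_rss(y, columns):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    n, k = len(y), len(columns)
    # Augmented system [X'X | X'y].
    system = [
        [sum(columns[a][i] * columns[b][i] for i in range(n)) for b in range(k)]
        + [sum(columns[a][i] * y[i] for i in range(n))]
        for a in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(col + 1, k):
            factor = system[r][col] / system[col][col]
            for c in range(col, k + 1):
                system[r][c] -= factor * system[col][c]
    # Back substitution.
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(system[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (system[r][k] - tail) / system[r][r]
    return sum(
        (y[i] - sum(beta[a] * columns[a][i] for a in range(k))) ** 2
        for i in range(n)
    )


def bic(y, columns):
    """Gaussian BIC up to a constant: n*log(RSS/n) + (number of coefficients)*log(n)."""
    n = len(y)
    return n * math.log(ols_rss(y, columns) / n) + len(columns) * math.log(n)


def select_terms_by_bic(y, factors, max_terms=3):
    """Search all term subsets up to max_terms and return the BIC-minimising one."""
    terms = candidate_terms(factors)
    const = [1.0] * len(y)
    best_names, best_bic = (), bic(y, [const])
    for size in range(1, max_terms + 1):
        for names in itertools.combinations(terms, size):
            score = bic(y, [const] + [terms[name] for name in names])
            if score < best_bic:
                best_names, best_bic = names, score
    return best_names, best_bic


# Simulated illustration (not the paper's data): the target loads on F1 and F2 squared.
random.seed(0)
factors = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
vol = [0.5 + 1.0 * f[0] + 0.8 * f[1] ** 2 + 0.1 * random.gauss(0.0, 1.0) for f in factors]
selected, score = select_terms_by_bic(vol, factors)
print(selected)
```

In a recursive variant, `select_terms_by_bic` would be called again each period on the expanding sample, whereas the "initial" variant would fix the selected terms after the first estimation window.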
Results appear in Table 9. Recursively updating the selection of the nonlinear terms generates more accurate forecasts than selecting based on the initial estimation period, only, except at the longest maturity for the common factor approach, and at 𝜏 = 5 for the maturity-specific forecasts.¹⁴ The common factor based forecasts are only more precise than the maturity-specific at two maturities, in contrast to all maturities in the linear case, Table 4. Comparing Tables 4 and 9, including nonlinear terms improves the maturity specific forecasts at all maturities, except 𝜏 = 5, and the common factor based forecasts at the two shorter horizons.

Overall, the best forecast across both the maturity-specific and common factor approaches including nonlinearities improves on the linear forecasts for the two shortest maturities, not for the two longest. In short, allowing for nonlinearities does not generate an overall improvement in yield volatility forecasting.

¹⁴ In contrast, initial selection dominates updating for the linear common factor based forecasts in Table 4.

6.2. Trade-off between explaining yields and forecasting volatility

The finding in Table 4 that using the first three PCA factors 𝐹̂𝑡 from Eq. (8) generates more accurate yield volatility forecasts than using the first four, five, or six factors raises the additional question of whether using exactly these three is optimal, or whether some of the remaining factors beyond the first three are more informative about future yield volatility. Further, if volatility forecasts are indeed improved by using some other combination of factors, the question arises whether the optimal combination is common across maturities.

To address these issues, we consider a regression of the type in Eq. (1), with 𝑍𝑡 representing a combination of three of the first six PCA factors, i.e., not necessarily the first three. We consider all 20 possible combinations.

Table 10
Trade-off between explaining yields and forecasting volatility.

            𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
  PCA456    -19.01 (0.86)   -13.69 (0.85)   4.22 (0.38)     -41.89 (0.97)
  PCA356    -22.31 (0.88)   -17.47 (0.88)   17.48 (0.07)    -8.88 (0.69)
  PCA346    -26.34 (0.91)   -20.44 (0.93)   15.60 (0.10)    -6.01 (0.66)
  PCA345    -26.03 (0.88)   -18.65 (0.87)   17.39 (0.08)    -8.17 (0.72)
  PCA256    -19.85 (0.86)   -14.31 (0.85)   18.22 (0.04)    -9.91 (0.75)
  PCA246    -18.78 (0.86)   -15.91 (0.88)   15.70 (0.07)    -4.57 (0.68)
  PCA245    -23.85 (0.85)   -18.36 (0.90)   21.53 (0.01)    -18.82 (0.94)
  PCA236    -23.30 (0.89)   -20.03 (0.91)   16.18 (0.05)    -0.05 (0.50)
  PCA235    -27.27 (0.88)   -20.22 (0.88)   20.72 (0.01)    -12.4 (0.87)
  PCA234    -27.48 (0.90)   -20.92 (0.90)   18.82 (0.02)    -12.21 (0.86)
  PCA156    -24.31 (0.90)   -17.03 (0.88)   6.14 (0.35)     -36.16 (0.92)
  PCA146    -24.20 (0.91)   -18.35 (0.91)   6.41 (0.32)     -26.83 (0.93)
  PCA145    -27.88 (0.89)   -20.19 (0.91)   6.59 (0.34)     -34.79 (0.97)
  PCA136    -29.18 (0.93)   -24.92 (0.95)   17.29 (0.08)    -0.53 (0.51)
  PCA135    -31.09 (0.91)   -23.02 (0.90)   16.80 (0.11)    -5.82 (0.64)
  PCA134    -31.68 (0.92)   -23.86 (0.91)   17.38 (0.08)    -2.42 (0.58)
  PCA126    -24.15 (0.90)   -19.05 (0.90)   15.91 (0.04)    -3.71 (0.61)
  PCA125    -29.97 (0.89)   -21.76 (0.92)   18.57 (0.05)    -21.63 (0.91)
  PCA124    -28.09 (0.89)   -21.89 (0.92)   19.21 (0.02)    -15.59 (0.89)
  PCA123    -32.11 (0.92)   -24.35 (0.91)   15.92 (0.04)    -15.84 (0.91)

This table displays $R^2_{OoS}$ measures in percent relative to a RW for all combinations of three of the first six PCA yield factors from Eq. (8). In parentheses asymptotic 𝑝-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000 to June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Table 10 presents the $R^2_{OoS}$ results. Strongest overall forecasting performance is obtained at the intermediate maturity 𝜏 = 5 years. This echoes the result from Table 4, where forecasting based on the first three factors, 𝐹̂𝑡 = PCA123, say, improves significantly over the RW at level 5% for 𝜏 = 5. From Table 10, nine of the 20 factor combinations improve significantly over the RW at 𝜏 = 5, and of these, the first three factors (bottom row of table) generate the lowest $R^2_{OoS}$ statistic, along with PCA126. The best combination (highest statistic) is PCA245, i.e., combining the second, fourth, and fifth PCA factors from Eq. (8) for purposes of yield volatility forecasting. The second factor, slope, enters all of the nine significant combinations, and so proves important for volatility forecasting at 𝜏 = 5.¹⁵

¹⁵ The final combination involving slope, PCA246, gets 𝑝-value 0.07.

Although significant, the $R^2_{OoS}$ statistic generated by the standard PCA123 combination only ranks 13 in magnitude out of the 20 cases at 𝜏 = 5, i.e., in the lower half. It ranks last at the short end, 𝜏 = 0.5, second to last at 𝜏 = 1, and again 13 at the long end, 𝜏 = 7. At 𝜏 = 1, the best combination is PCA456, i.e., nonoverlapping with the three standard factors. The sixth factor, PCA6, say, enters the six best combinations, and slope, PCA2, only two of these. At 𝜏 = 0.5, the best combination is PCA246. The sixth factor, PCA6, enters the five best combinations, and PCA2 and PCA5 each three of these. At 𝜏 = 7, PCA236 is the best combination, PCA6 enters four of the five best combinations, and PCA2 and PCA3 each three of these.

Summing up, the evidence is clearly that using the three leading principal components is not optimal for purposes of forecasting yield volatilities. The combination improves over the RW at the intermediate maturity 𝜏 = 5 years, but it is in the lower half of factor combinations in terms of forecasting accuracy at each maturity. Further, the optimal combination of factors for volatility forecasting is not common across maturities. The slope factor is important at 𝜏 = 5, and enters the best combination at all maturities except 𝜏 = 1, where it enters the second best. Higher-order factors, the fourth through sixth principal components, enter the best combinations for volatility forecasting purposes at all maturities. The sixth factor is particularly important. It enters the best combination at all maturities except 𝜏 = 5, and the five best combinations at each of the two shortest maturities.

The deterioration in performance by adding factors beyond the leading three in Table 4 indicates the importance of parsimony. Although higher-order factors are informative about volatility, this is only revealed by including these in parsimonious combinations, in our case using three factors at a time. The results indicate a trade-off between explaining yield levels and forecasting interest rate risk. The three leading principal components from the yield panel are known to capture current yield curves well, but the fourth and higher components are important for forecasting yield volatilities.

6.3. Interest rate risk forecasts based on a cross section of coupon bond prices

The estimation of our models relies on panel data on yields constructed from high-frequency futures prices, with five price observations in the cross section dimension, for the time series methods, and on the daily panel with 𝑁 = 8 yields in the cross section for the curve-based methods. Here, we investigate the possibility that the full cross section of observed coupon bond prices at any point in time carries more information about future interest rate risk. We consider daily CRSP data on quoted prices of bonds with maturities between 3 months and 10 years, adjusted for accrued interest, and excluding callable issues. The price of a bond promising 𝑀 semi-annual payments of 𝐶∕2 with maturities 𝜏 = (𝜏1, … , 𝜏𝑀) is represented as

$$P_t(\tau, C) = \frac{C}{2}\,\frac{\tau_1}{1/2}\,B_t^{\tau_1} + \sum_{j=2}^{M}\frac{C}{2}\,B_t^{\tau_j} + 100\,B_t^{\tau_M}, \tag{31}$$

with $B_t^{\tau_j} = \exp(-\tau_j y_t^{\tau_j})$ the price of a zero-coupon bond of maturity $\tau_j$, and $\tau_1/(1/2)$ the fraction of the first coupon receivable, reflecting the share of the half-year interval before the nearest coupon over which the coupon bond is held. Bond prices $P_t^{i,obs}$ are assumed to be observed with measurement error,

$$P_t^{i,obs} = P_t(\tau^i, C^i) + \varepsilon_t^i, \tag{32}$$

with $P_t(\cdot)$ the model-implied price from Eq. (31), $(\tau^i, C^i)$ contractual terms for bond 𝑖, and the measurement error $\varepsilon_t^i$ Gaussian with variance $\sigma_t^2$, independently across observations $i = 1, \ldots, N_t$ available at 𝑡. Write $P_t^{obs}$ for the $N_t$-vector of observed coupon bond prices, $\Theta_t$ for the ℚ-parameters of the pricing model, augmented with the short rate $r_t$, considered latent and backed out from the cross section, and $P(\Theta_t)$ for the $N_t$-vector of model-implied prices. Estimation is by cross-sectional nonlinear regression or QML,

$$\hat{\Theta}_t = \arg\max_{\Theta_t}\left(-\frac{N_t}{2}\log\sigma_t^2 - \frac{1}{2\sigma_t^2}\,\big(P_t^{obs} - P(\Theta_t)\big)'\big(P_t^{obs} - P(\Theta_t)\big)\right). \tag{33}$$

The cross-sectional estimate $\hat{\Theta}_t$ does not depend on $\sigma_t^2$. Fig. A.2 in the Appendix shows the evolution over time in $N_t$, ranging from a low of 85 in the early part of the sample to a high of 280 towards the end.

For purposes of yield volatility forecasting, ℚ-parameters do not suffice, as $V^{\tau}_{t,h}$ in Eq. (7) depends on ℙ-parameters through the coefficients $b_0^{\tau,h}$ and $b_1^{\tau,h}$. For concreteness, we focus on the CIR model, i.e., Eq. (33) is the estimator considered by Brown and Dybvig (1986), who examined the consistency between the cross-sectional estimate of the short rate volatility parameter 𝜎 in the square root process and a time series estimate of this, based on the sample variance of changes in estimated short rates $r_t$ across calendar time. This comparison was relevant because 𝜎 is common across ℚ and ℙ in the continuous-time model (see Appendix C.2). Instead, since we consider volatility forecasting at longer maturities, we estimate the market price of risk 𝜆 (cf. Table 1) from a time series of short rates calibrated period by period in Eq. (33). Specifically, we consider an Euler discretization of the CIR process,

$$r_{t+1} - r_t = \kappa(\theta - r_t) + \sigma\sqrt{r_t}\,\varepsilon_{t+1}, \tag{34}$$

following Chan et al. (1992), who focused on the ℙ-parameter estimates resulting from time series analysis of Eq. (34). Given the estimated ℚ-parameters from Eq. (33), we instead express Eq. (34) as a function of 𝜆. Since $r_{t+1}$ conditional on $r_t$ is Gaussian under the discretization, we estimate 𝜆 by QML. For forecasts at 𝑡, we use ℚ-parameters from Eq. (33) at 𝑡, along with 𝜆 estimated from a short time series 𝑡 − 10, … , 𝑡 in Eq. (34). An alternative would be to estimate all three ℙ-parameters in a longer time series, and construct final estimates of all four parameters (including 𝜆) by optimal weighting of estimates from the sequential cross-sectional and time series regressions, following Andreasen and Christensen (2015). Rather than pursuing this approach, we focus on the short time series of fitted $r_t$ for setting 𝜆, to retain the cross-sectional nature of the approach.

Table 11
Yield volatility forecasts from cross section of coupon bond prices.

            𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
  CIRℚ      -26.64 (0.98)   -16.21 (0.90)   10.58 (0.03)    -33.03 (1.00)
  CIRℙ      -30.20 (0.99)   -20.13 (0.94)   10.60 (0.03)    -33.92 (1.00)

This table displays $R^2_{OoS}$ measures in percent relative to a RW for the CIR model estimated on a cross section of coupon bonds. CIRℚ indicates that forecasting is based on cross-sectionally estimated ℚ-parameters and short rates, and CIRℙ that the market price of risk is estimated based on the ℚ-parameters, the ten most recent fitted short rates, and Euler discretization. In parentheses asymptotic 𝑝-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000 to June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Table 11 shows the $R^2_{OoS}$ results. For comparison, statistics are shown both for forecasts based purely on cross-sectionally estimated ℚ-parameters and short rates, and for the combined approach described. The former are highest across all maturities, presumably due to parsimony. They are slightly higher than the corresponding statistics for the CIR model based on the daily yield panel in Table 4 at the shorter maturities, and lower at the longer. The results suggest that the wide cross section is relatively useful for volatility forecasting at the short end of the curve, possibly because all instruments include nearby coupons. Further, at 𝜏 = 5, although $R^2_{OoS}$ is slightly higher in the yield panel than in the cross section of coupon bonds, the improvement over the RW is significant at 5% in the latter case, both for the ℚ-based and the combined approach, not in the former (𝑝-value of 19%). This extends the relative usefulness of the forecasts from the wide cross section to all maturities but the longest, where it performs poorly, both in absolute terms, and compared to the yield panel. In sum, the approaches based on the yield panel and the cross section of coupon bonds exhibit broadly similar volatility forecasting performance, with a small advantage to the former at the long end, and to the latter at shorter maturities.

Overall, the robustness analysis confirms that the daily yield panel captures the cross-sectional information adequately. Further, more of the future volatility information in the curve not captured by standard term structure models appears to be contained in higher-order factors than in nonlinearities.

6.4. Time series extensions

So far, the analysis has focused on simple, linear specifications for the time series models. However, the literature on modeling stock return volatility is rich on extensions, e.g., to the basic HAR model in Eq. (10). We now examine some extensions within the HAR framework. First, we allow for a nonlinear relation between past and future volatility by specifying Eq. (10) in logarithmic terms. Each realized measure
is replaced by its logarithm, for estimation purposes, and a Jensen’s inequality correction applied when constructing level forecasts (exponentiating the relevant fitted value plus one half times the residual variance). We label this the Log-HAR model. Second, stocks are affected by the so-called leverage effect. We examine whether this is also the case for interest rates and, in particular, whether it matters for yield volatility forecasting, by considering the leverage HAR (or Lev-HAR) model of Corsi and Renò (2012). The model allows for a leverage effect by extending Eq. (10) with the sum of past negative returns. It is given as

$$V^{\tau}_{t+1\to t+h} = \beta_0^{\tau,h} + \beta_D^{\tau,h} V^{\tau}_{t} + \beta_W^{\tau,h} V^{\tau}_{t-4\to t} + \beta_M^{\tau,h} V^{\tau}_{t-21\to t} + \gamma_D^{\tau,h}\,|r^{-}_{t,\tau,1}| + \gamma_W^{\tau,h}\,|r^{-}_{t,\tau,5}| + \gamma_M^{\tau,h}\,|r^{-}_{t,\tau,22}| + u^{\tau,h}_{t+h}, \tag{35}$$

where $r^{-}_{t,\tau,h}$ aggregates any negative changes in the maturity 𝜏 yield over the past ℎ days.

Table 12
Time series extensions.

                     𝜏 = 0.5         𝜏 = 1           𝜏 = 5           𝜏 = 7
Panel A: HAR extensions
  Log-HAR            4.05 (0.39)     6.69 (0.30)     17.08 (0.02)    14.38 (0.11)
  Lev-HAR            -5.40 (0.61)    -0.64 (0.52)    22.01 (0.00)    14.35 (0.12)
Panel B: Interest rate HAR extensions
  PC1                -0.15 (0.50)    6.88 (0.31)     21.50 (0.01)    12.80 (0.09)
  PC2                0.29 (0.49)     7.08 (0.31)     25.31 (0.00)    11.58 (0.07)
  PC3                -0.86 (0.52)    5.33 (0.35)     24.66 (0.00)    18.04 (0.02)
  PC1 interaction    0.12 (0.50)     9.35 (0.23)     21.93 (0.00)    12.84 (0.07)
  PC2 interaction    -4.09 (0.62)    10.03 (0.22)    26.54 (0.00)    11.96 (0.05)
  PC3 interaction    -10.13 (0.69)   0.29 (0.49)     23.86 (0.00)    18.91 (0.01)

This table displays $R^2_{OoS}$ measures in percent relative to a RW for the Log-HAR model, the Lev-HAR model, and the HAR model with time-varying parameters, with either intercept or both intercept and slopes depending in an affine manner on one of the three leading principal components from the daily yield panel. In parentheses asymptotic 𝑝-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000 to June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Panel A in Table 12 presents the $R^2_{OoS}$ results. Comparing with the standard HAR model in Table 4, each extension generates higher $R^2_{OoS}$ at two maturities (𝜏 = 0.5 and 7 for Log-HAR, 𝜏 = 5 and 7 for Lev-HAR), and lower than HAR at the other two. All three models improve significantly over the RW at maturity 𝜏 = 5 and level 5%, and get 𝑝-values

Overall, our results show that either information in the yield curve or in the volatility time series can be used to improve yield volatility forecasting. The question arises whether forecasting performance can be further enhanced by combining the two sources of information. We address this issue in the HAR framework via a time-varying parameter extension. In particular, we allow for the HAR coefficients to depend on information from the yield curve. This notion seems natural, as interest rates are known to carry important economic signals. Thus, the yield spread is widely considered to be a recession indicator, and the interest rate level measures the distance to the ZLB. Furthermore, we have shown that interest rate factors can forecast the volatility forecasting errors from the HAR model (Section 5.2.1). This suggests that interest rate information can be incorporated in the HAR framework in a productive manner. For a simple specification, we consider in turn whether each of the three first PCA factors from the yield curve drives variation in the intercept in the HAR model, or in both intercept and slopes, for an interactive effect between yield curve and volatility information. Here, the first PCA factor, level, should capture distance to the ZLB, and the next two, slope and curvature, the effect of the spread. Thus, in the generalized models, when Eq. (10) is estimated based on information through 𝑡′, the extended parameter specification

$$\beta^{\tau,h}_{i,t} = \alpha_i^{\tau,h} + \gamma_i^{\tau,h}\hat{F}_{t,j} \tag{36}$$

is used, with $\hat{F}_{t,j}$, 𝑗 = 1, 2, 3 the fitted yield curve factors at 𝑡 ≤ 𝑡′ from estimation of Eq. (8) with 𝑘 = 3, using data through 𝑡′. In the time-varying intercept extensions, Eq. (36) is only used for 𝑖 = 0. In the interactive extensions, it is used for 𝑖 = 0, 𝐷, 𝑊, 𝑀.

Other approaches to time-varying parameters in the HAR framework have been proposed in the literature, e.g., the HARQ model of Bollerslev et al. (2016), and the model of Buccheri and Corsi (2021). However, these models rely on the basic realized volatility framework and, hence, are not naturally transferred to our pre-averaging estimator setting. Therefore, we do not pursue these alternative approaches further.

Panel B in Table 12 shows the $R^2_{OoS}$ results from using Eq. (36) with Eq. (10). At the longest maturities, extending the HAR model with information from the yield curve enhances volatility forecasting performance, relative to the standard model from Table 4. Allowing for interactions using level or slope, performance is improved at 𝜏 = 1, too. For given maturity, the best volatility forecasts considered are those using the original HAR at the short end, the interactive HAR extension with slope at the two intermediate maturities, and that with curvature at the long end of the curve. The results suggest that recession risk is important at longer maturities. All extensions retain significance at level 1% of the improvement over the RW at 𝜏 = 5 and, in contrast to the standard HAR, deliver significant improvements at the longest maturity, too, at 10%. The time-varying intercept model using curvature and the interactive model using slope provide significant improvements at 5%, and the interactive model using curvature at 1%. Obviously, further research could investigate combinations using multiple factors, but the results confirm that cross-sectional yield curve information and historical time series information on volatility can be fruitfully combined for volatility forecasting purposes.

7. Concluding remarks

Our results show that the assessment of future interest rate risk is a
around 11% for the longest maturity. Thus, the evidence on the ex- complicated affair. A strong benchmark can be formed by basing fore-
tensions is mixed, and the standard HAR model does not appear to be a casts on historical volatility, with a focus on recent periods, as in the
bad choice. Regarding the leverage effect, a potential explanation is that formal time-varying volatility models. A good alternative at interme-
compared to returns, yields tend to have a truncation around zero, as diate maturities can be based on the risk premium, in particular, the
in shadow rate models (Black, 1995; Christensen and Rudebusch, 2015; forward spread, or a common factor approach. For agents with a pri-
Andreasen and Meldrum, 2019). The truncation lowers yield volatility mary interest in the long end of the curve, e.g., for immunization of
for yields near zero by construction. Thus, we do not necessarily expect assets and liabilities, valuation of capital assets, or market timing, in-
a leverage effect in interest rates. cremental information on future yield volatility relative to that in the
time series is available by looking across maturities along the curve, and more so than implied by formal term structure models. Some of the curve-based information about future volatility is distinct from that explaining the levels of yields, and is contained in higher-order factors rather than nonlinearities. Furthermore, our results point to the existence of a latent stochastic volatility factor, unspanned by yields.

Based on a simple portfolio exercise, the information about future reinvestment rate risk in either the yield curve or the volatility history is of economic value to a risk averse investor, both in short and long term instruments. A simple yield spread, affine models, and time series methods are all useful, on utility grounds, relative to a random walk. Finally, we demonstrate that the cross-sectional yield curve information and the historical time series information on volatility can be fruitfully combined to enhance volatility forecasting performance.

We have emphasized from the beginning that when leveraging volatility information drawn from the yield curve, investor should not ignore historical volatility in the information set. That the resulting picture is diverse is perhaps not surprising. Investor is, after all, gazing into the crystal ball. Clearly, mimicking this situation requires a recursive, out-of-sample approach, to avoid admitting the artificial investor the benefit of hindsight.

CRediT authorship contribution statement

All authors have contributed equally to all parts of the article. No other authors have been involved.

Declaration of competing interest

None.

Data availability

Data will be made available on request.

Appendix A. From futures prices to yield curves

A.1. From futures prices to coupon bond prices

The seller (the short) has two options included in the Treasury futures. The first regards which bond to deliver, and the second the delivery date within the delivery month. For the first option, the underlying assets of the Treasury futures (see Section 3) are hypothetical bonds with a notional yield of 6% throughout our sample period. The seller must deliver an actual bond, selected from a delivery basket constructed for each futures contract according to CME requirements, with a conversion factor for each bond. The conversion factor converts the bond price into “. . . the approximately decimal price at which $1 par of the security would trade as if it had a 6% yield-to-maturity.”16 The formula is

\[
f = a \left( \frac{coupon}{2} + c + d \right) - b , \tag{A.1}
\]
\[
a = \left( \frac{1}{1.03} \right)^{v/6} , \qquad
b = \frac{coupon}{2} \, \frac{6 - v}{6} , \qquad
d = \frac{coupon}{0.06} \, (1 - c) ,
\]
\[
c = \begin{cases} \left( \frac{1}{1.03} \right)^{2n} & \text{if } z < 7 \\ \left( \frac{1}{1.03} \right)^{2n+1} & \text{otherwise} \end{cases}
\qquad
v = \begin{cases} z & \text{if } z < 7 \\ 3 & \text{if } z \ge 7 \text{ for 10-year note, long term bond} \\ z - 6 & \text{if } z \ge 7 \text{ for 5-year note} \end{cases}
\]

with $coupon$ the annualized coupon, $n$ the number of (whole) years from the first day of the delivery month to maturity, and $z$ the number of months between $n$ and the maturity date, rounded to nearest quarter for the 10-year note and the long term Treasury bond, and nearest month for the 5-year note. The delivery bond (or note) must have a maturity between 4.17 and 5.25 years for the 5-year futures, between 6.5 and 10 years for the 10-year futures, and between 15 and 25 years for the long term Treasury bond futures.

16 https://www.cmegroup.com/trading/interest-rates/calculating-us-treasury-futures-conversion-factors.html.

Given the conversion factor $f_t$ from Eq. (A.1), the invoice amount is

\[
I_t = f_t \cdot F_t + a_t , \tag{A.2}
\]

with $F_t$ the futures price, and $a_t$ accrued interest. On delivery, the seller pays the basis

\[
\pi_t = S_t - I_t , \tag{A.3}
\]

with $S_t$ the cash bond price (including accrued interest), and therefore selects the cheapest-to-deliver (CTD) bond fulfilling the futures contract specifications and minimizing Eq. (A.3) (maximizing the implied repo rate).

The second option regards the delivery date. For the purpose of backing out the futures-implied bond price, we assume that both the CTD bond and delivery date are known. We set the delivery date to the first working day of the delivery month, and use the CRSP data on Treasuries to construct the delivery basket. The price of the futures is then given as

\[
I_t = (S_t - C_t) e^{r_t T} , \tag{A.4}
\]

with $S_t$ the spot price of the CTD bond (including accrued interest), $C_t$ the present value of coupon payments before delivery, $T$ the time to delivery, and $r_t$ the risk-free rate. For the latter, we use the 3-month yield from the daily panel, $r_t = y_t^{1/4}$ (Section 3.5). The 3-month rate should be more market-based than, say, a 1-month rate, and a good short rate proxy, cf. Chapman et al. (1999).

In brief, the recipe for constructing the futures-implied bond price $S_t$ is as follows:

(i) Construct the delivery basket from the CRSP data on the first trading day of the delivery month.
(ii) Find the cheapest-to-deliver bond minimizing the basis $\pi_t$ from Eq. (A.3) within the delivery basket from (i).
(iii) Use Eq. (A.2) to calculate the invoice amount $I_t$ for the CTD bond from (ii).
(iv) Substitute the invoice amount from (iii) for $I_t$ in Eq. (A.4), and back out the futures-implied bond price by isolating $S_t$.

A.2. From coupon bond prices to yield curves

A variety of methods is available for extracting the yield curve from the cross section of coupon bond prices. In the literature on yield curves at high frequency, some version of cubic spline is usually employed, e.g., Andersen and Benzoni (2010) and Cieslak and Povala (2016), due to the small number of observations in the cross section, and the need for flexibility to fit the curve reasonably. As a consequence, interpolation between points of principal payments is difficult. Following Cieslak and Povala (2016), we use the method of Fisher et al. (1995). This assumes that the discount function $B_t^{\tau} = \exp(-\tau y_t^{\tau})$ is determined by a cubic spline $h(\cdot, \Psi)$, i.e., $\tau y_t^{\tau} = h(\tau, \Psi)$, with $\Psi$ a set of parameters (suppressing dependence on $t$). The yield curve is split along the maturity axis using $K$ knot points, $0 < \tau_1 < \dots < \tau_K$, with $\tau_K$ the maximum maturity of the bonds considered. The cubic spline is restricted such that $h(\cdot, \Psi)$ and its two first derivatives are continuous. This implies that one
Table A.1
Correlation between futures-implied and Gürkaynak et al. (2007) yields.

τ = 0.5    τ = 1     τ = 5     τ = 7
99.99%     99.99%    99.62%    99.37%

This table shows the sample correlation between end-of-day futures-implied yields and yields constructed using Gürkaynak et al. (2007) parameter estimates. The sample spans the period from January 2, 2000, through December 31, 2020.

Table A.2
Liquidity of Treasury futures.

                   1-min     5-min     10-min
Full sample
5-year note        86.45%    97.10%    98.76%
10-year note       95.10%    99.14%    99.30%
Long term bond     95.41%    99.27%    99.37%
Trading days with no activity for more than an hour discarded
5-year note        87.10%    97.64%    99.27%
10-year note       95.69%    99.67%    99.85%
Long term bond     95.97%    99.77%    99.86%

This table shows the fraction of high-frequency intervals containing a Treasury futures price observation, by underlying maturity and sampling frequency. The full sample is included in the upper panel. In the lower panel, days with no trading activity for more than one hour are discarded. The sample spans the period from January 2, 2000, through December 31, 2020.

parameter is added for each additional knot point. Fisher et al. (1995) and Dai et al. (2007) set $K$ to approximately one third of the number of bonds included, while Andersen and Benzoni (2010) and Cieslak and Povala (2016) set $K$ equal to the number of bonds. We follow the high-frequency literature and set the number of knot points equal to the number of bonds on which we calibrate the yield curve, i.e., $K = 5$. We let $\{s_k\}_{k=1}^{K}$ denote the set of knot points.

To calibrate $h(\tau, \Psi)$, we follow Fisher et al. (1995) and use the simple parametrization of the cubic B-spline basis. Any cubic spline can be constructed as a linear combination of B-splines, $h(\tau, \Psi) = \phi(\tau)\Psi$, where $\Psi$ is a $\kappa \times 1$ vector of coefficients, and $\phi(\cdot)$ is a cubic B-spline basis, i.e., a vector of $\kappa = K + 2$ cubic B-splines satisfying

\[
\phi_k^{r}(\tau) = \phi_k^{r-1}(\tau) \, \frac{\tau - d_k}{d_{k+r-1} - d_k} + \phi_{k+1}^{r-1}(\tau) \, \frac{d_{k+r} - \tau}{d_{k+r} - d_{k+1}} .
\]

Here, $r = 4$ for a cubic spline, and $d_k$ is an augmented set of knot points, $d_1 = d_2 = d_3 = s_1$, $d_{k+3} = s_k$ for $1 \le k \le K$, and $d_{K+4} = d_{K+5} = d_{K+6} = s_K$. The price of a coupon bond is written as

\[
P(\Psi) = \sum_{j=1}^{J} C_j \exp(-h(t_j, \Psi)) ,
\]

with $J$ the number of payments (coupons and principal) over the remaining lifetime of the bond, and $C_j$ the $j$th payment, due $t_j$ periods hence. The yield curve is estimated using penalized nonlinear least squares (PNLS),

\[
\hat{\Psi} = \arg\min_{\Psi} \sum_{i=1}^{K} (P_i - \hat{P}_i(\Psi))^2 + \lambda \int_0^{\tau_K} h''(s)^2 \, ds ,
\]

where the penalty term with coefficient $\lambda$ induces smoothness of the calibrated yield curve. We find that this procedure results in a better fit to bond prices than the Nelson and Siegel (1987) approach used in Faust et al. (2007).

Upon estimation, four points are read off the resulting high-frequency (1 minute) yield curves, at $\tau = 0.5$, 1, 5, and 7 years.

Appendix B. Liquidity of Treasury futures

Table A.2 shows the percentage of high-frequency intervals containing a Treasury futures price observation, and hence affording unique yield identification, at the 1, 5, and 10 minute sampling frequencies, by maturity. All three maturities, 5 and 10 years, and long term, are included in the table, because all three Treasury futures are used in curve fitting, although only implied yields at maturities up to $\tau = 7$ are used in the subsequent analysis (see Section 3.3 and Appendix A). From the table, around 86% to 95% of the 1 minute intervals contain an observation. This increases to around 97% and 99% for 5 and 10 minute intervals, respectively. The lower panel of the table shows the corresponding percentages when discarding trading days with no trading activity for more than an hour. All numbers increase by around 0.5-1.0%. This suggests that the discarded days do not contain many uniquely identified observations and, hence, it makes sense to discard them. Fig. A.1 shows the evolution over time in the daily percentage of intervals containing a futures price observation. There is a jump between 2003 and 2004, due to the introduction of the electronic trading pit in 2004, and clearly fewer intervals containing futures price observations at the 1 minute frequency than at 5 and 10 minutes, at least until 2008. This suggests that the high-frequency Treasury futures data are likely afflicted with market microstructure noise.

Appendix C. Models

C.1. The Vasicek model

The model of Vasicek (1977) is an $A_0(1)$ model in which the dynamics of the short rate $r_t = y_t^0$ are given by

\[
dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dW_t .
\]

We assume that the market price of risk is completely affine, i.e., $\lambda_t = \lambda$. Thus,

\[
\tilde{\theta} = \theta - \frac{\sigma\lambda}{\kappa} , \qquad \tilde{\kappa} = \kappa .
\]

The solution to the Riccati equations (6) is given by

\[
B(\tau) = \frac{1}{\tilde{\kappa}} \left( 1 - e^{-\tilde{\kappa}\tau} \right) ,
\]
\[
A(\tau) = \left( \tilde{\theta} - \frac{\sigma^2}{2\tilde{\kappa}^2} \right) (\tau - B(\tau)) + \frac{\sigma^2}{4\tilde{\kappa}} B(\tau)^2 .
\]

The conditional variance of the future short rate is

\[
Var_t(r_{t+h}) = \frac{\sigma^2}{2\kappa} \left( 1 - e^{-2\kappa h} \right) .
\]

C.2. The Cox-Ingersoll-Ross model

The model of Cox et al. (1985) is an $A_1(1)$ model with

\[
dr_t = \kappa(\theta - r_t)\,dt + \sigma \sqrt{r_t}\,dW_t .
\]

We adopt the completely affine market price of risk specification, $\lambda_t = \lambda \sqrt{r_t}/\sigma$. Thus,

\[
\tilde{\kappa} = \kappa + \lambda , \qquad \tilde{\theta} = \frac{\kappa\theta}{\kappa + \lambda} .
\]

The solution to the Riccati equations (6) is given by

\[
B(\tau) = \frac{2(e^{\gamma\tau} - 1)}{(\gamma + \tilde{\kappa})(e^{\gamma\tau} - 1) + 2\gamma} ,
\]
\[
A(\tau) = -\frac{2\tilde{\kappa}\tilde{\theta}}{\sigma^2} \left[ \log(2\gamma) + \frac{1}{2}(\tilde{\kappa} + \gamma)\tau - \log\left( (\gamma + \tilde{\kappa})(e^{\gamma\tau} - 1) + 2\gamma \right) \right] ,
\]

with $\gamma$ given by

\[
\gamma = \sqrt{\tilde{\kappa}^2 + 2\sigma^2} .
\]

The conditional variance of the short rate is

\[
Var_t(r_{t+h}) = \frac{\sigma^2 r_t}{\kappa} \left( e^{-\kappa h} - e^{-2\kappa h} \right) + \frac{\sigma^2\theta}{2\kappa} \left( 1 - e^{-\kappa h} \right)^2 .
\]

C.3. The arbitrage-free Nelson-Siegel model with deterministic volatility

The AFNS0 model of Christensen et al. (2011) is an $A_0(3)$ model. The real-world dynamics of the state variables are given by

\[
dX_t = \kappa(\theta - X_t)\,dt + \Sigma\,dW_t ,
\]

where $X_t$ and $\theta$ are $3 \times 1$ vectors, and $\kappa$ and $\Sigma$ are $3 \times 3$ matrices. The model is constructed to make the shape of the yield curve resemble the Nelson and Siegel (1987) parametrization by assuming that the market price of risk is essentially affine and restricting

\[
\tilde{\kappa} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix} , \qquad \tilde{\theta} = 0_{3 \times 1} .
\]

The yield curve is then

\[
y_t^{\tau} = X_t^1 + \frac{1 - e^{-\lambda\tau}}{\lambda\tau} X_t^2 + \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right) X_t^3 - \frac{A(\tau)}{\tau} . \tag{C.5}
\]

This corresponds to the Nelson and Siegel (1987) shape, except for the term $A(\tau)/\tau$, for which the closed form is given in Christensen et al. (2011).

Fig. A.1. Liquidity of Treasury futures over time.
This figure shows the evolution in the daily fraction of high-frequency intervals containing a Treasury futures price observation, by underlying maturity and sampling frequency. The sample spans the period from January 2, 2000, through December 31, 2020.
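As an illustration of Eq. (C.5), the following minimal Python sketch evaluates the level, slope, and curvature loadings and the implied yield. It is not the authors' implementation: the function names `ns_loadings` and `afns_yield`, the argument `a_tau` (a placeholder for the adjustment term A(τ), which is not computed here), and all numerical values are hypothetical.

```python
import math


def ns_loadings(tau, lam):
    """Loadings on (X1, X2, X3) in Eq. (C.5) for maturity tau > 0 and decay lam > 0."""
    if tau <= 0 or lam <= 0:
        raise ValueError("tau and lam must be positive")
    x = lam * tau
    slope = -math.expm1(-x) / x       # (1 - exp(-lam*tau)) / (lam*tau)
    curvature = slope - math.exp(-x)  # slope loading minus exp(-lam*tau)
    return 1.0, slope, curvature


def afns_yield(tau, lam, state, a_tau=0.0):
    """Yield from Eq. (C.5); a_tau stands in for A(tau) and must be supplied by the caller."""
    level, slope, curvature = ns_loadings(tau, lam)
    x1, x2, x3 = state
    return level * x1 + slope * x2 + curvature * x3 - a_tau / tau


if __name__ == "__main__":
    # Made-up state vector and decay parameter, evaluated at the four maturities used in the paper.
    state = (0.03, -0.01, 0.02)
    for tau in (0.5, 1.0, 5.0, 7.0):
        print(tau, ns_loadings(tau, 0.5), afns_yield(tau, 0.5, state))
```

The slope loading tends to one and the curvature loading to zero as the maturity shrinks, and both decay towards zero at long maturities, which is the Nelson and Siegel (1987) shape referred to above.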
Table A.3
Out-of-sample $R^2$ for two months ahead yield volatility forecasting.

                                  τ = 0.5    τ = 1     τ = 5     τ = 7
Term structure models
Vasicek                            17.09     19.25    -35.37   -131.46
                                   (0.25)    (0.21)    (0.93)    (1.00)
CIR                               -11.77    -10.38    -23.05   -143.81
                                   (0.66)    (0.65)    (0.84)    (0.95)
AFNS0                             -13.57    -33.48      6.01    -67.70
                                   (0.71)    (0.94)    (0.38)    (0.98)
AFNS3                               2.28     -4.91    -26.45   -135.69
                                   (0.45)    (0.56)    (0.86)    (1.00)
PCA based forecasters
PCA3                               -2.83    -12.78    -12.39    -95.14
                                   (0.57)    (0.69)    (0.75)    (1.00)
PCA4                               -5.65    -14.08    -16.98    -99.65
                                   (0.64)    (0.72)    (0.82)    (1.00)
PCA5                               -7.76    -14.21    -26.33   -128.23
                                   (0.69)    (0.72)    (0.85)    (1.00)
PCA6                              -19.98    -25.55    -28.44   -143.78
                                   (0.86)    (0.88)    (0.87)    (1.00)
Risk premium based forecasters
Forward spreads                     7.32     -1.87      3.04    -68.18
                                   (0.31)    (0.53)    (0.44)    (0.97)
Yield spreads                       7.52     -0.22    -28.13    -60.01
                                   (0.30)    (0.50)    (0.88)    (0.97)
Common factor based forecasters
Initial                             2.07     -9.70    -24.39   -115.71
                                   (0.45)    (0.64)    (0.84)    (1.00)
Recursive                          -1.86    -14.47    -17.27   -127.99
                                   (0.55)    (0.71)    (0.74)    (0.99)
Time series models
HAR                                26.48     22.79      6.13     -9.99
                                   (0.08)    (0.19)    (0.28)    (0.72)
Mean-reverting realized GARCH      11.87      4.89    -31.63    -88.66
                                   (0.22)    (0.42)    (0.92)    (1.00)

measures in percent relative to a RW for all forecasting methods and maturities. In parentheses asymptotic $p$-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). For common factor based forecasters, the label “Initial” indicates that the selection of PCA factors is based on the initial estimation period, and “Recursive” that it is updated every period. The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Fig. A.2. Number of coupon bonds over time.
This figure shows the evolution in the number $N_t$ of coupon bonds with maturity between 3 months and 10 years used in the daily cross-sectional estimations Eq. (33). Callable issues are excluded. The sample spans the period from January 2, 2000, through December 31, 2020.

C.4. The arbitrage-free Nelson-Siegel model with stochastic volatility

The AFNS3 stochastic (or state-dependent) volatility model of Christensen et al. (2010) is an $A_3(3)$ model. Under $\mathbb{Q}$, the dynamics of the state variables are restricted to

\[
dX_t = \begin{pmatrix} \varepsilon & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
\left[ \begin{pmatrix} \tilde{\theta}_1 \\ \tilde{\theta}_2 \\ \tilde{\theta}_3 \end{pmatrix} - X_t \right] dt
+ \begin{pmatrix} \sigma_{1,1} & 0 & 0 \\ 0 & \sigma_{2,2} & 0 \\ 0 & 0 & \sigma_{3,3} \end{pmatrix}
\begin{pmatrix} \sqrt{X_{1,t}} & 0 & 0 \\ 0 & \sqrt{X_{2,t}} & 0 \\ 0 & 0 & \sqrt{X_{3,t}} \end{pmatrix} dW_t^{\mathbb{Q}} .
\]

The Riccati equations (6) are then given by

\[
dB_1(\tau) = 1 - \varepsilon B_1(\tau) - \frac{1}{2}\sigma_{1,1}^2 ,
\]
\[
dB_2(\tau) = 1 - \lambda B_2(\tau) - \frac{1}{2}\sigma_{2,2}^2 ,
\]
\[
dB_3(\tau) = \lambda B_2(\tau) - \lambda B_3(\tau) - \frac{1}{2}\sigma_{3,3}^2 ,
\]
\[
dA(\tau) = B(\tau)' \tilde{\kappa}\tilde{\theta} .
\]

We solve the ODEs numerically for each trial parameter vector in the iterative estimation procedure. For $\varepsilon \to 0$, the yield curve converges to the Nelson and Siegel (1987) shape, except for the modification by $A(\tau)/\tau$, as in Eq. (C.5). Following Christensen et al. (2010), we set $\varepsilon = 10^{-6}$, restrict $\kappa$ to be diagonal, and adopt the extended affine specification for the market price of risk. To ensure that the factors stay positive, we impose the Feller condition under both probability measures.

Appendix D. Estimation procedure

The term structure models are estimated using the Kalman filter, following Duffee (2002), Christensen et al. (2011), and others. Let $y_t$, $t = 1, \dots, T$, be the $N$-vector of observed yields, with $t$ counting the time increments between observations of $\Delta = 1/250$ years, $\tilde{A} = \left( \frac{A(\tau_1)}{\tau_1}, \dots, \frac{A(\tau_N)}{\tau_N} \right)'$, and $\tilde{B}$ the $d \times N$ matrix with columns $\frac{B(\tau_i)}{\tau_i}$. By Eqs. (18) and (19), the state space system is thus given by the measurement and transition equations

\[
y_t = \tilde{A} + \tilde{B}' X_t + \varepsilon_t , \tag{D.6}
\]
\[
X_t = C_\Delta + D_\Delta' X_{t-1} + \eta_t , \tag{D.7}
\]

where

\[
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} \sim N \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix} , \begin{pmatrix} H_\Delta & 0 \\ 0 & Q_{t,\Delta} \end{pmatrix} \right] .
\]

The conditional expectation and variance of the state process are given by

\[
\mathbb{E}_{t-1}(X_t) = (I - \exp(-\kappa\Delta))\theta + \exp(-\kappa\Delta) X_{t-1} = C_\Delta + D_\Delta' X_{t-1} , \tag{D.8}
\]
\[
Var_{t-1}(X_t) = \int_{t-1}^{t} \exp(-\kappa(t-u)) \, \sigma(\mathbb{E}_{t-1}(X_u)) \, \sigma(\mathbb{E}_{t-1}(X_u))' \exp(-\kappa(t-u))' \, du = Q_{t,\Delta} , \tag{D.9}
\]
where $\exp(\cdot)$ is the matrix exponential, and $\sigma(\cdot) = \Sigma\sqrt{S(\cdot)}$, with $S(\cdot)$ from Eq. (3). For example, $\sigma(X_u) = \mathrm{diag}(\sigma_{i,i}\sqrt{X_{i,u}})$ in the model in Appendix C.4. By Eq. (D.8), the transition parameters in Eqs. (19) and (D.7) are $C_\Delta = (I_d - \exp(-\kappa\Delta))\theta$ and $D_\Delta = \exp(-\kappa\Delta)$. Write $Y_t = (y_1, \dots, y_t)$ for the data through $t$ and define $X_{t|t-1} = \mathbb{E}(X_t \mid Y_{t-1})$, $\Sigma_{t|t-1} = Var(X_t \mid Y_{t-1})$, $X_{t|t} = \mathbb{E}(X_t \mid Y_t)$, and $\Sigma_{t|t} = Var(X_t \mid Y_t)$. Suppressing $\Delta$ for notational ease, the prediction step is then

\[
X_{t|t-1} = C + D' X_{t-1|t-1} , \tag{D.10}
\]
\[
\Sigma_{t|t-1} = D' \Sigma_{t-1|t-1} D + Q_t , \tag{D.11}
\]

where $Q_t$ is the conditional variance of the state variables from Eq. (D.9), while $D' \Sigma_{t-1|t-1} D$ reflects conditional mean variation, cf. footnote 3. The update step is

\[
X_{t|t} = X_{t|t-1} + \Sigma_{t|t-1} \tilde{B} F_t^{-1} v_t , \tag{D.12}
\]
\[
\Sigma_{t|t} = \Sigma_{t|t-1} - \Sigma_{t|t-1} \tilde{B} F_t^{-1} \tilde{B}' \Sigma_{t|t-1} , \tag{D.13}
\]

with the one step ahead prediction error and its variance given by

\[
v_t = y_t - \tilde{A} - \tilde{B}' X_{t|t-1} , \tag{D.14}
\]
\[
F_t = Var(v_t) = \tilde{B}' \Sigma_{t|t-1} \tilde{B} + H . \tag{D.15}
\]

The filter is initiated by setting $X_{0|0}$ and $\Sigma_{0|0}$ equal to the unconditional expected value and variance. From Eq. (2), $X_{0|0} = \theta$, and from Eq. (D.7), $\Sigma_{0|0} = D' \Sigma_{0|0} D + Q$, with $Q$ from Eq. (D.9) at $\mathbb{E}_{t-1}(X_u) = \theta$. Thus, $vec(\Sigma_{0|0}) = \left( I_{d^2} - D' \otimes D' \right)^{-1} vec(Q)$.

Given the Gaussianity assumptions on measurement and transition equations, the parameters $\psi$ are estimated by maximizing the prediction error decomposition of the conditional log likelihood function,

\[
L(\psi) = \sum_{t=1}^{T} \left( -\frac{1}{2} N \log(2\pi) - \frac{1}{2} \log\det F_t - \frac{1}{2} v_t' F_t^{-1} v_t \right) . \tag{D.16}
\]

The states $X_t$ are Gaussian for the Vasicek and AFNS0 models. For the square root processes (the CIR and AFNS3 models), estimation based on Eq. (D.16) amounts to QML. Here, the Gaussian approximation implies that states are not restricted to be positive. In this case, following Chen and Scott (2003), we truncate states at 0. Given the large number of parameters in some of the models, we use the global optimizer differential evolution with several starting values. For every 250 observations, we reestimate parameters using differential evolution. In intermediate periods we use local optimizers.

D.1. Predictive regression correction

From Eq. (D.6), $var(y_t \mid X_{t-1}) = \tilde{B}' Var_{t-1}(X_t) \tilde{B} + H$, suppressing dependence on the time increment $\Delta$. Using this and Eqs. (D.9), (D.11) and (D.15),

\[
Var(y_t \mid Y_{t-1}) = Var(v_t) = \tilde{B}' \left( D' \Sigma_{t-1|t-1} D + Var_{t-1}(X_t) \right) \tilde{B} + H = \tilde{B}' D' \Sigma_{t-1|t-1} D \tilde{B} + Var(y_t \mid X_{t-1}) . \tag{D.17}
\]

Thus, the conditional variance of yields is given by that corresponding to perfect observation of state variables through $t - 1$, i.e., the second term in Eq. (D.17), with an adjustment for conditional mean variation due to imperfect state observation given by the first term in Eq. (D.17) (see footnote 3). When forecasting using the predictive regression in Eq. (1), with $Z_t$ given by Eq. (7), then $Z_t$ corresponds to the second term in Eq. (D.17). Forecasting therefore involves three modifications. First, the filtered state $X_{t|t}$ is used in place of $X_t$ in Eq. (7). Second, using Eq. (1), $\rho_0^{\tau,h}$ and $\rho_1^{\tau,h}$ provide empirical predictive regression corrections for the bias stemming from conditional mean variation, i.e., the first term in Eq. (D.17). Finally, the estimated coefficients in Eq. (1) reflect the history of realized volatilities, which are part of investor's information set, thereby conditioning on a larger information set than that based on daily yields in the Kalman filter.

D.2. Conditional moments of state process

By the results in Fackler (2000), also used in Jacobs and Karoui (2009), the first two conditional moments can be written as

\[
\mathbb{E}_t(X_{t+h}) = C_h + D_h' X_t , \tag{D.18}
\]
\[
vec(Var_t(X_{t+h})) = \zeta_{0,h} + \zeta_{1,h}' X_t , \tag{D.19}
\]

with $C_h = (I_d - \exp(-\kappa h))\theta$ and $D_h = \exp(-\kappa h)$, see Eqs. (D.8)-(D.9). The $d^2 \times 1$ vector of intercepts $\zeta_{0,h}$ in Eq. (D.19) is determined by

\[
\begin{pmatrix} C_h \\ \zeta_{0,h} \end{pmatrix} = \left( I_{d+d^2} - \exp(-\Pi h) \right) \Pi^{-1} \Theta , \tag{D.20}
\]

and the $d \times d^2$ matrix of slope terms $\zeta_{1,h}$ by

\[
\begin{pmatrix} D_h \\ \zeta_{1,h}' \end{pmatrix} = \exp(-\Pi h) \begin{bmatrix} I_d \\ 0 \end{bmatrix} . \tag{D.21}
\]

In Eq. (D.20), the $(d + d^2)$-vector $\Theta$ is

\[
\Theta = \begin{bmatrix} \kappa\theta \\ (\Sigma \otimes \Sigma)\mathrm{D}\alpha \end{bmatrix} ,
\]

with $\alpha = 0$ in Eq. (3) for the models we consider, and $\mathrm{D}$ given by

\[
\mathrm{D}_{i,j} = \begin{cases} 1 & \text{if } i = (j-1)d + j , \\ 0 & \text{otherwise} . \end{cases}
\]

Further,

\[
\Pi = \begin{bmatrix} \kappa & 0 \\ -(\Sigma \otimes \Sigma)\mathrm{D}\mathrm{B} & \kappa \otimes I_d + I_d \otimes \kappa \end{bmatrix} ,
\]

a $(d + d^2) \times (d + d^2)$ matrix, with the $i$th row of $\mathrm{B}$ given by $\beta_i'$ from Eq. (3), and rank $\mathrm{B} = m \le d$.

In the special case $m = 0$, i.e., an $A_0(d)$ model, we have $\mathrm{B} = 0$ and, hence,

\[
\Pi = \begin{bmatrix} \kappa & 0 \\ 0 & \kappa \otimes I_d + I_d \otimes \kappa \end{bmatrix} .
\]

From Eq. (D.21),

\[
\begin{pmatrix} D_h \\ \zeta_{1,h}' \end{pmatrix} = \begin{pmatrix} \exp(-\kappa h) \\ 0 \end{pmatrix} \tag{D.22}
\]

in this case. In particular,

\[
\zeta_{1,h} = 0 , \quad \text{for } m = 0 . \tag{D.23}
\]

D.3. Conditional moments of yields

By Eqs. (5) and (D.18), the conditional means of the yields are

\[
\mathbb{E}_t(y_{t+h}) = \tilde{A} + \tilde{B}' \left( C_h + D_h' X_t \right) . \tag{D.24}
\]

Using Eqs. (7) and (D.19), the conditional yield variances are

\[
Var_t(y^{\tau}_{t+h}) = \frac{1}{\tau^2} B(\tau)' Var_t(X_{t+h}) B(\tau) = \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' vec(Var_t(X_{t+h})) = b_0^{\tau,h} + b_1^{\tau,h\prime} X_t , \tag{D.25}
\]
\[
b_0^{\tau,h} = \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' \zeta_{0,h} , \tag{D.26}
\]
\[
b_1^{\tau,h\prime} = \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' \zeta_{1,h}' , \tag{D.27}
\]

where $b_0^{\tau,h}$ is a scalar, and $b_1^{\tau,h}$ a $d \times 1$ vector.
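The filtering recursion in Eqs. (D.10)-(D.16) of Appendix D can be sketched for the simplest case of one state and one observed yield (d = N = 1), where all matrices reduce to scalars. This is a minimal illustration under stated assumptions, not the authors' code: the function names `kalman_loglik` and `vasicek_transition`, the argument names, and the parameter values are hypothetical, and the transition variance assumes a constant volatility as in the Vasicek model.

```python
import math


def vasicek_transition(kappa, theta, sigma, delta):
    """Scalar transition parameters: C and D from Eq. (D.8), Q from Eq. (D.9) with constant sigma."""
    d = math.exp(-kappa * delta)
    c = (1.0 - d) * theta
    q = sigma ** 2 * (1.0 - d * d) / (2.0 * kappa)
    return c, d, q


def kalman_loglik(y, a, b, c, d, q, h, x0, p0):
    """Log likelihood (D.16) for y_t = a + b*x_t + eps_t, x_t = c + d*x_{t-1} + eta_t (scalars)."""
    x, p, loglik = x0, p0, 0.0
    filtered = []
    for obs in y:
        x_pred = c + d * x               # prediction step, Eq. (D.10)
        p_pred = d * p * d + q           # Eq. (D.11)
        v = obs - a - b * x_pred         # prediction error, Eq. (D.14)
        f = b * p_pred * b + h           # its variance, Eq. (D.15)
        gain = p_pred * b / f
        x = x_pred + gain * v            # update step, Eq. (D.12)
        p = p_pred - gain * b * p_pred   # Eq. (D.13)
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
        filtered.append(x)
    return loglik, filtered


if __name__ == "__main__":
    # Made-up parameters; the filter starts at the unconditional mean and variance.
    c, d, q = vasicek_transition(kappa=0.5, theta=0.03, sigma=0.01, delta=1.0 / 250.0)
    p0 = q / (1.0 - d * d)
    print(kalman_loglik([0.031, 0.029, 0.030], 0.0, 1.0, c, d, q, 1e-6, 0.03, p0))
```

In an estimation loop, the returned log likelihood would be maximized over the model parameters; the multivariate case replaces the scalar products by the matrix expressions in Eqs. (D.10)-(D.15).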
In the special case $m = 0$, i.e., an $A_0(d)$ (Gaussian) model, such as Vasicek or AFNS0, we have from Eq. (D.23) that $\zeta_{1,h} = 0$. Thus, by Eq. (D.27), it follows that $b_1^{\tau,h\prime} = 0$ and, by Eq. (D.25),

\[
Var_t(y^{\tau}_{t+h}) = b_0^{\tau,h} , \quad \text{for } m = 0 . \tag{D.28}
\]

References

Andersen, T.G., Benzoni, L., 2010. Do bonds span volatility risk in the US Treasury market? A specification test for affine term structure models. J. Finance 65 (2), 603–653.
Andersen, T.G., Bollerslev, T., Christoffersen, P.F., Diebold, F.X., 2006. Volatility and correlation forecasting. In: Elliot, G., Granger, C., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North-Holland, Amsterdam, pp. 778–878.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625.
Andreasen, M.M., Christensen, B.J., 2015. The SR approach: a new estimation procedure for non-linear and non-Gaussian dynamic term structure models. J. Econom. 184 (2), 420–451.
Andreasen, M.M., Meldrum, A., 2019. A shadow rate or a quadratic policy rule? The best way to enforce the zero lower bound in the United States. J. Financ. Quant. Anal. 54 (5), 2261–2292.
Andrews, D.W., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70 (1), 191–221.
Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. J. Finance 52 (5), 2003–2049.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76 (6), 1481–1536.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc., Ser. B, Stat. Methodol. 64 (2), 253–280.
Bikbov, R., Chernov, M., 2009. Unspanned stochastic volatility in affine models: evidence from Eurodollar futures and options. Manag. Sci. 55 (8), 1292–1305.
Black, F., 1995. Interest rates as options. J. Finance 50 (5), 1371–1376.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. J. Polit. Econ. 81 (3), 637–654.
Bollerslev, T., Hood, B., Huss, J., Pedersen, L.H., 2018. Risk everywhere: modeling and managing volatility. Rev. Financ. Stud. 31 (7), 2729–2773.
Bollerslev, T., Patton, A.J., Quaedvlieg, R., 2016. Exploiting the errors: a simple approach for improved volatility forecasting. J. Econom. 192 (1), 1–18.
Brown, S.J., Dybvig, P.H., 1986. The empirical implications of the Cox, Ingersoll, Ross theory of the term structure of interest rates. J. Finance 41 (3), 617–630.
Buccheri, G., Corsi, F., 2021. Hark the shark: realized volatility modeling with measurement errors and nonlinear dependencies. J. Financ. Econom. 19 (4), 614–649.
Buraschi, A., Corielli, F., 2005. Risk management implications of time-inconsistency: model updating and recalibration of no-arbitrage models. J. Bank. Finance 29 (11), 2883–2907.
Campbell, J.Y., Shiller, R.J., 1991. Yield spreads and interest rate movements: a bird’s eye view. Rev. Econ. Stud. 58 (3), 495–514.
Campbell, J.Y., Thompson, S.B., 2007. Predicting excess stock returns out of sample: can anything beat the historical average? Rev. Financ. Stud. 21 (4), 1509–1531.
Chan, K.C., Karolyi, G.A., Longstaff, F.A., Sanders, A.B., 1992. An empirical comparison of alternative models of the short-term interest rate. J. Finance 47 (3), 1209–1227.
Chapman, D.A., Long Jr, J.B., Pearson, N.D., 1999. Using proxies for the short rate: when are three months like an instant? Rev. Financ. Stud. 12 (4), 763–806.
Chen, R.-R., Scott, L., 2003. Multi-factor Cox-Ingersoll-Ross models of the term structure: estimates and tests from a Kalman filter model. J. Real Estate Finance Econ. 27 (2), 143–172.
Christensen, J.H., Diebold, F.X., Rudebusch, G.D., 2011. The affine arbitrage-free class of Nelson–Siegel term structure models. J. Econom. 164 (1), 4–20.
Christensen, J.H., Lopez, J.A., Rudebusch, G.D., 2010. Can spanned term structure factors drive stochastic yield volatility? Working paper. Federal Reserve Bank of San Francisco.
Christensen, J.H., Rudebusch, G.D., 2015. Estimating shadow-rate term structure models with near-zero yields. J. Financ. Econom. 13 (2), 226–259.
Christensen, K., Oomen, R.C., Podolskij, M., 2014. Fact or friction: jumps at ultra high frequency. J. Financ. Econ. 114 (3), 576–599.
Corsi, F., 2009. A simple approximate long-memory model of realized volatility. J. Financ. Econom. 7 (2), 174–196.
Corsi, F., Renò, R., 2012. Discrete-time volatility forecasting with persistent leverage effect and the link with continuous-time volatility modeling. J. Bus. Econ. Stat. 30 (3), 368–380.
Cox, J.C., Ingersoll Jr, J.E., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53 (2), 385–408.
Dai, Q., Singleton, K.J., 2000. Specification analysis of affine term structure models. J. Finance 55 (5), 1943–1978.
Dai, Q., Singleton, K.J., Yang, W., 2007. Regime shifts in a dynamic term structure model of US Treasury bond yields. Rev. Financ. Stud. 20 (5), 1669–1706.
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. J. Bus. Econ. Stat. 13, 253–263.
Duffee, G.R., 2002. Term premia and interest rate forecasts in affine models. J. Finance 57 (1), 405–443.
Duffie, D., Kan, R., 1996. A yield-factor model of interest rates. Math. Finance 6 (4), 379–406.
Dumas, B., Fleming, J., Whaley, R.E., 1998. Implied volatility functions: empirical tests. J. Finance 53 (6), 2059–2106.
Fackler, P., 2000. Moments of affine diffusions. Working paper. North Carolina State University.
Fama, E.F., Bliss, R.R., 1987. The information in long-maturity forward rates. Am. Econ. Rev. 77 (4), 680–692.
Faust, J., Rogers, J.H., Wang, S.-Y.B., Wright, J.H., 2007. The high-frequency response of exchange rates and interest rates to macroeconomic announcements. J. Monet. Econ. 54 (4), 1051–1068.
Feldhütter, P., Heyerdahl-Larsen, C., Illeditsch, P., 2016. Risk premia and volatilities in a nonlinear term structure model. Rev. Finance 22 (1), 337–380.
Filipović, D., Larsson, M., Trolle, A.B., 2017. Linear-rational term structure models. J. Finance 72 (2), 655–704.
Fisher, M., Nychka, D.W., Zervos, D., 1995. Fitting the term structure of interest rates with smoothing splines. Working paper. Federal Reserve Board.
Gargano, A., Pettenuzzo, D., Timmermann, A., 2017. Bond return predictability: economic value and links to the macroeconomy. Manag. Sci. 65 (2), 508–540.
Gürkaynak, R.S., Sack, B., Wright, J.H., 2007. The US Treasury yield curve: 1961 to the present. J. Monet. Econ. 54 (8), 2291–2304.
Hansen, P.R., Huang, Z., Shek, H.H., 2012. Realized GARCH: a joint model for returns and realized measures of volatility. J. Appl. Econom. 27 (6), 877–906.
Heath, D., Jarrow, R., Morton, A., 1992. Bond pricing and the term structure of interest rates: a new methodology for contingent claims valuation. Econometrica 60 (1), 77–105.
Jacobs, K., Karoui, L., 2009. Conditional volatility in affine term-structure models: evidence from Treasury and swap markets. J. Financ. Econ. 91 (3), 288–318.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stoch. Process. Appl. 119 (7), 2249–2276.
Joslin, S., 2017. Can unspanned stochastic volatility models explain the cross section of bond volatilities? Manag. Sci. 64 (4), 1707–1726.
Joslin, S., Konchitchki, Y., 2018. Interest rate volatility, the yield curve, and the macroeconomy. J. Financ. Econ. 128 (2), 344–362.
Litterman, R., Scheinkman, J., 1991. Common factors affecting bond returns. J. Fixed Income 1 (1), 54–61.
Ludvigson, S.C., Ng, S., 2009. Macro factors in bond risk premia. Rev. Financ. Stud. 22 (12), 5027–5067.
Mincer, J.A., Zarnowitz, V., 1969. The evaluation of economic forecasts. In: Mincer, J.A. (Ed.), Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance. NBER, Cambridge, pp. 3–46.
Nelson, C.R., Siegel, A.F., 1987. Parsimonious modeling of yield curves. J. Bus. 60, 473–489.
Newey, W.K., West, K.D., 1994. Automatic lag selection in covariance matrix estimation. Rev. Econ. Stud. 61 (4), 631–653.
Patton, A.J., Sheppard, K., 2015. Good volatility, bad volatility: signed jumps and the persistence of volatility. Rev. Econ. Stat. 97 (3), 683–697.
Sarno, L., Schneider, P., Wagner, C., 2016. The economic value of predicting bond risk premia. J. Empir. Finance 37, 247–267.
Svensson, L.E., 1994. Estimating and interpreting forward interest rates: Sweden 1992-1994. NBER Working Paper w4871.
Vasicek, O., 1977. An equilibrium characterization of the term structure. J. Financ. Econ. 5 (2), 177–188.
Waggoner, D., 1997. Spline methods for extracting interest rate curves from coupon bond prices. Working paper. Federal Reserve Bank Atlanta.
Cieslak, A., Povala, P., 2016. Information in the term structure of yield curve volatility.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining
J. Finance 71 (3), 1393–1436.
integrated volatility with noisy high-frequency data. J. Am. Stat. Assoc. 100 (472),
Cochrane, J.H., Piazzesi, M., 2005. Bond risk premia. Am. Econ. Rev. 95 (1), 138–160.
1394–1411.
Collin-Dufresne, P., Goldstein, R.S., 2002. Do bonds span the fixed income markets? The-
ory and evidence for unspanned stochastic volatility. J. Finance 57 (4), 1685–1730.
Collin-Dufresne, P., Goldstein, R.S., Jones, C.S., 2009. Can interest rate volatility be ex-
tracted from the cross section of bond yields? J. Financ. Econ. 94 (1), 47–66.