1 s2.0 S0378426623001711 Main
The incremental information in the yield curve about future interest rate
risk ✩
Bent Jesper Christensen ∗ , Mads Markvart Kjær, Bezirgen Veliyev
CREATES, Department of Economics and Business Economics, Aarhus University, and the Danish Finance Institute, Fuglesangs Allé 4, DK-8210 Aarhus V, Denmark
ARTICLE INFO

JEL classification: C58, E43, G12

Keywords: Term structure models; Volatility; Forecasting; Kalman filtering; Yield curve

ABSTRACT

Using high-frequency intraday futures prices to measure yield volatility at selected maturities, we find that daily yield curves carry incremental information about future interest rate risk at the long end, relative to that contained in the time series of historical volatilities. Some of the information in the yield curves is not captured by standard affine models. Our results point to the existence of an unspanned stochastic volatility factor. Both time series and yield curve based forecasts provide utility to a risk averse investor, relative to a random walk. Information from the two sources can be combined to enhance yield volatility forecasting performance.
1. Introduction

Affine term structure models have been the workhorse in the interest rate literature for decades and can successfully explain bond prices (see Duffee (2002), Christensen et al. (2011), and many others). However, it is questionable whether these models are able to explain yield volatilities, and hence reinvestment rate risk, which plays a crucial role in portfolio allocation, market timing, derivative pricing, and risk management. The literature has primarily focused on whether volatilities are spanned by interest rates, as implied by standard affine models. Collin-Dufresne et al. (2009) and Andersen and Benzoni (2010) conclude that they are not, while Jacobs and Karoui (2009) and Bikbov and Chernov (2009) reach mixed conclusions.

In this paper, we consider out-of-sample forecasting of yield volatility. Since future reinvestment rate risk matters for pricing and trading decisions, the yield curve should be sensitive to, and hence informative about, future yield volatility. Nevertheless, the forecasting perspective has so far received little attention in the yield volatility literature. The only exceptions are Collin-Dufresne et al. (2009) and Joslin and Konchitchki (2018), who consider fixed-window estimation, and do not compare the model based forecasts against time series benchmarks beyond a random walk (RW). In contrast, we update the parameters recursively according to investor's information set, and compare the model based volatility forecasts to leading time series benchmarks, in particular, the heterogeneous autoregressive (HAR) model of Corsi (2009) and the realized GARCH model of Hansen et al. (2012).

Our focus is on whether the yield curve contains incremental information about future yield volatility, beyond that available from the time series of historical volatilities. In addition, we examine whether the time series based forecasters contain incremental information about future volatility, beyond the information that can be read off the yield curve. Further, we supplement the existing static analysis of the spanning issue with a dynamic out-of-sample assessment.

As a separate contribution, we use high-frequency intraday yield curves in the analysis, thus adding precision to volatility measurements. The past two decades have seen important advances in high-frequency
✩
We are grateful to Thorsten Beck (the editor), the anonymous referees, Peter Feldhutter, Niels S. Gronborg, Allan Timmermann, Anders B. Trolle, and participants
at the Conference on Computational and Financial Econometrics (CFE) in London, 2019, the Vienna–Copenhagen Conference on Financial Econometrics, 2022, and
the Joint Econometrics-Finance Seminar in Aarhus for useful comments, and to Center for Research in Econometric Analysis of Time Series (CREATES, funded by
the Danish National Research Foundation, DNRF78), the Independent Research Fund Denmark (grant number 2033-00137B), and the Danish Finance Institute (DFI)
for research support.
* Corresponding author.
E-mail addresses: [email protected] (B.J. Christensen), [email protected] (M.M. Kjær), [email protected] (B. Veliyev).
https://doi.org/10.1016/j.jbankfin.2023.106973
Received 26 July 2021; Received in revised form 10 June 2023; Accepted 31 July 2023
Available online 9 August 2023
0378-4266/© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
B.J. Christensen, M.M. Kjær and B. Veliyev Journal of Banking and Finance 155 (2023) 106973
data analysis, primarily focusing on equities, e.g., Barndorff-Nielsen and Shephard (2002), Andersen et al. (2003), and Zhang et al. (2005). However, intraday data have seen little application in the interest rate literature. Notable exceptions are Faust et al. (2007), who use high-frequency data to study macroeconomic announcement effects, not volatility, and Andersen and Benzoni (2010) and Cieslak and Povala (2016), who use realized volatilities of interest rates based on a 10-minute sampling frequency, chosen to mitigate the effects of market microstructure noise. We overcome the latter issue in a more direct manner by relying on the pre-averaged realized variance estimator of Jacod et al. (2009), a noise-robust estimator of volatility with excellent empirical properties for various asset classes, as documented by Christensen et al. (2014). This choice allows us to use data at higher frequency. From US Treasury and Eurodollar futures prices, we construct yields with maturities 6 months, and 1, 5, and 7 years, at the 1-minute frequency. From these, we calculate a daily time series of realized volatility measures at each of the four maturities. We consider daily yield curve based forecasters of volatility over the subsequent month, constructed from either (i) affine term structure models, (ii) principal component analysis (PCA), or (iii) interest rate (yield and forward) spreads, estimated recursively in a standard daily yield panel. The alternative forecasters based on the time series of historical volatilities (RW, HAR, realized GARCH) serve as benchmarks.

We find that the HAR model provides a strong benchmark for yield volatility forecasting. It provides relatively accurate forecasts across all maturities considered. At the intermediate 5 year maturity, a simple forward spread, motivated by risk premium considerations, or a common factor approach provide equally accurate forecasts, and all three improve significantly over the naive random walk volatility forecast. However, we also find that the yield curve contains incremental information about future yield volatility relative to the time series of historical volatilities, including HAR, in terms of forecasting accuracy at the long end of the curve. Next, we carry out a specification test as in Cieslak and Povala (2016) and find that none of the affine or time series models considered subsumes all the relevant information about future volatility contained in the yield curve across all maturities. This indicates that the yield curve contains some important information about future reinvestment rate risk that neither the term structure models nor the volatility history capture. Extending this idea, we develop a test of whether past volatility forecast errors are informative about future forecast errors, as would be the case under unspanned stochastic volatility (USV) in the sense of Collin-Dufresne and Goldstein (2002). We perform the test by extracting a common factor from the volatility forecast errors across maturities and find that it provides significant information about future forecast errors, thus indicating the presence of USV features in the data.

We investigate the economic value of the forecasters to an investor in a portfolio allocation exercise, using a utility-based framework as in Bollerslev et al. (2018). From the results, the information about future interest rate risk contained in either the yield curve or historical volatility provides economic value to a risk averse investor. A robustness analysis shows that higher-order PCA factors beyond the first three usually explaining yields in fact carry information about volatility. The portion of future volatility information in the yield curve that is not

Jacobs and Karoui (2009) examine the ability of three-factor affine models to fit conditional volatilities in-sample, using estimates from an EGARCH model to proxy for true volatility. They reach mixed conclusions, depending on the sample considered. Correlations between model-implied volatility and the proxy range between 60% and 75% in US Treasury data, but are much lower in swap data, even negative for some maturities. Collin-Dufresne et al. (2009) consider affine models with and without a USV factor and conclude that volatility cannot be extracted from the cross section of interest rates based on swap data. Andersen and Benzoni (2010) find low $R^2$s for regressing future variances on PCA factors from the yield curve, and argue that they conduct a test of whether variances are spanned by bonds. The validity of this regression test of USV is questioned by Bikbov and Chernov (2009), who show that low $R^2$ is expected if yields are observed with error, a standard assumption when estimating term structure models. Bikbov and Chernov (2009), Joslin (2017), and Joslin and Konchitchki (2018) consider both swaps and swaptions and reject the restrictions needed for affine models to generate USV. We contribute to this literature by considering the spanning question in an out-of-sample framework, and providing evidence pointing to the existence of a USV factor.

The USV puzzle has recently led to the construction of more complex models, with a focus on the ability to fit volatilities and price swaptions. Feldhütter et al. (2016) propose a nonlinear model that is able to generate features consistent with USV. Cieslak and Povala (2016) construct a model with volatility factors driven by a Wishart process. Filipović et al. (2017) present a nonlinear model featuring USV that can price both bonds and swaptions successfully. None of these papers considers out-of-sample forecasting. It is beyond the scope of the present paper to include these more complex models, as they are very time-consuming to implement and reestimate recursively.

The paper is organized as follows. Section 2 describes the forecasting methods, including the term structure models, PCA and interest rate spread based forecasts, and time series models. Section 3 describes the high-frequency futures data and the construction of realized yield volatility measures. Section 4 presents the estimation method for the term structure models based on the Kalman filter. Section 5 contains the empirical analysis, and Section 6 the robustness analysis. Section 7 concludes. The Appendix contains additional material on data, models, estimation, and results.

2. Interest rate risk forecasters

Throughout, $y_t^{\tau}$ denotes the maturity $\tau$ yield at time $t$. In our empirical out-of-sample analysis, we consider daily forecasting of month-ahead interest rate risk by maturity. To assess the forecasts, we use the high-frequency data to calculate daily realized yield volatility, labeled $V_t^{\tau}$ (see Section 3.4; we consider the pre-averaged realized variance and henceforth use the terms realized volatility and realized variance interchangeably for this). In line with the volatility literature (e.g., Patton and Sheppard (2015) and Bollerslev et al. (2016)), the target for the forecast built at $t$, based on investor's information set, is not the realized volatility at $t+h$, but instead the aggregated realized measure $V^{\tau}_{t+1 \to t+h} = \sum_{i=1}^{h} V^{\tau}_{t+i}$, a proxy for integrated volatility from $t$ through
Table 1
Affine models.
This table presents the reference, notation of Dai and Singleton (2000), and market price of
risk specification for the affine term structure models considered.
2.1. Yield curve based volatility forecasters

We consider an $N \times T$ fixed-maturity panel, i.e., the yields at $t$ are $y_t = (y_t^{\tau_1}, \ldots, y_t^{\tau_N})'$, $t = 1, \ldots, T$, and yields of all maturities can be used to generate volatility forecasts at a given maturity $\tau$. For each candidate forecasting method, recursive volatility forecasts are constructed using the regression

$$V^{\tau}_{t+1 \to t+h} = \rho_0^{\tau,h} + (\rho_1^{\tau,h})' Z_t + u^{\tau,h}_{t+h}, \qquad (1)$$

for fixed $\tau$, $h$, with $\rho_1^{\tau,h}$ a $q \times 1$ vector of predictive coefficients, $\rho_0^{\tau,h}$ the scalar intercept, and $Z_t$ the relevant information variable extracted from the yield panel and conditioning the forecast as of $t$ for the given method, e.g., a $q$-vector of fitted PCA factors, or a conditional volatility forecast from an affine model. When constructing the forecast at $t'$ of $V^{\tau}_{t'+1 \to t'+h}$, only data through $t'$ are used to extract $Z_1, \ldots, Z_{t'}$ from the yield panel, then estimate Eq. (1) over $t = 1, \ldots, t'-h$, and the forecast is $\hat{\rho}_0^{\tau,h} + (\hat{\rho}_1^{\tau,h})' Z_{t'}$. We consider methods for extracting $Z_t$ based on affine term structure models, PCA, common factors, and risk premiums.

The recursive regressions in Eq. (1) are used in the construction of volatility forecasts, for two reasons. First, investor is not basing the forecast of $V^{\tau}_{t'+1 \to t'+h}$ exclusively on $Z_{t'}$ from the daily yields, having observed the history of realized volatilities through $t'$. Second, in the implementation, volatility forecasts based on $Z_t$ include estimation error, whereas realized yield variance used as forecasting target in the assessment includes measurement error, thus leading to potential biases. Therefore, the Mincer and Zarnowitz (1969) type predictive regressions in Eq. (1) are used to leverage any indication in investor's information set that the centering or scale of $Z_t$ from the given method do not line up well with subsequent realized volatilities. The main question is whether $Z_t$ serves as a useful information variable, i.e., whether variation in this provides incremental predictive power. If, for example, a combination of the most recent realized volatility observations $V^{\tau}_{t'}, V^{\tau}_{t'-1}, \ldots$ provides the best forecast at $t'$, as in the HAR model, then this feature will not be fully captured by the intercept and slope in Eq. (1), estimated over the full window $t = 1, 2, \ldots, t'-h$. It should be possible to improve on the resulting forecast using pure time series methods and historical volatilities, hence revealing that $Z_t$ from the panel does not carry incremental information. Thus, the regression approach secures a level playing field for comparison of yield curve and time series based yield volatility forecasts.

In our empirical work, we use $N = 8$ maturities to extract $Z_t$ from daily data, and volatility forecasts based on investor's information set are constructed using the recursive regressions in Eq. (1) for each of the four maturities $\tau$ for which we have high-frequency data.

2.1.1. Affine term structure models

In the affine class, the short rate $r_t = y_t^0$ is driven by some $d$-dimensional vector of state variables, $X_t$, such that $r_t = \delta_0 + \delta_1' X_t$, and the dynamics of $X_t$ are

$$dX_t = \kappa(\theta - X_t)\,dt + \Sigma \sqrt{S(X_t)}\,dW_t, \qquad (2)$$

where $W_t$ is a $d$-dimensional Brownian motion under the physical measure $\mathbb{P}$, $\kappa$ and $\Sigma$ are $d \times d$ matrices, $\delta_1$ and $\theta$ are $d \times 1$ vectors, and $\delta_0$ is a scalar. In addition, $S(X_t)$ is a diagonal $d \times d$ matrix with each element affine in $X_t$,

$$[S(X_t)]_{ii} = \alpha_i + \beta_i' X_t, \qquad (3)$$

with $\alpha_i$ a scalar, and $\beta_i$ a $d \times 1$ vector. Given a suitable market price of risk specification $\lambda_t$ (see Appendix C), the dynamics of $X_t$ are governed by an affine diffusion under the risk-neutral measure $\mathbb{Q}$, too,

$$dX_t = \tilde{\kappa}(\tilde{\theta} - X_t)\,dt + \Sigma \sqrt{S(X_t)}\,dW_t^{\mathbb{Q}}. \qquad (4)$$

Following Duffie and Kan (1996), the yields are given by

$$y_t^{\tau} = \frac{A(\tau)}{\tau} + \frac{B(\tau)'}{\tau} X_t, \qquad (5)$$

where $A$, $B$ are solutions to the system of ordinary differential equations

$$\frac{dA(\tau)}{d\tau} = \tilde{\theta}' \tilde{\kappa}' B(\tau) - \frac{1}{2} \sum_{i=1}^{d} \left[\Sigma' B(\tau)\right]_i^2 \alpha_i + \delta_0, \qquad A(0) = 0,$$
$$\frac{dB(\tau)}{d\tau} = -\tilde{\kappa}' B(\tau) - \frac{1}{2} \sum_{i=1}^{d} \left[\Sigma' B(\tau)\right]_i^2 \beta_i + \delta_1, \qquad B(0) = 0, \qquad (6)$$

which can be solved either analytically or numerically.

By Eq. (5), the $h$ periods ahead conditional variance of the maturity $\tau$ yield, given information through $t$, is

$$V^{\tau}_{t,h} = Var_t(y^{\tau}_{t+h}) = \frac{1}{\tau^2} B(\tau)' \, Var_t(X_{t+h}) \, B(\tau) = b_0^{\tau,h} + (b_1^{\tau,h})' X_t, \qquad (7)$$

i.e., affine in the latent state variables. The precise forms of $b_0^{\tau,h}$ and $b_1^{\tau,h}$ are given in Appendix D.3. We consider affine models with $d$, the number of factors, from 1 to 3. Table 1 summarizes the models considered, including reference, $A_m(d)$ classification as in Dai and Singleton (2000), $m$ being the number of factors conditioning variances, and market price of risk specification. In Appendix C, we provide a formal description of each model. We use Eq. (7) as the basis of our first yield curve based forecasters. Given the diversity of models considered, we are able to examine the impact of both the number of latent factors and stochastic (state dependent) volatility on the forecasting of future interest rate risk.

Implementation of Eq. (7) requires values for $b_0^{\tau,h}$, $b_1^{\tau,h}$, and $X_t$. We estimate the models recursively by quasi maximum likelihood (QML) using the Kalman filter and an expanding estimation window, allowing for additive measurement errors in yields (see Section 4 for details). To construct the forecast at $t'$ of subsequent realized yield volatility $V^{\tau}_{t'+1 \to t'+h}$, only information through $s$ is used to generate $b_0^{\tau,h}$, $b_1^{\tau,h}$, and $X_1, \ldots, X_s$, and hence the conditional volatility (or variance) estimate $\hat{V}^{\tau}_{s,h}$ from Eq. (7), for the given $\tau$ and $h$. This procedure is repeated for $s = 1, \ldots, t'$. Next, Eq. (1) is estimated with $Z_t = \hat{V}^{\tau}_{t,h}$, $t = 1, \ldots, t'-h$, i.e., $q = 1$ in this case, and the final forecast of $V^{\tau}_{t'+1 \to t'+h}$ is $\hat{\rho}_0^{\tau,h} + (\hat{\rho}_1^{\tau,h})' \hat{V}^{\tau}_{t',h}$.

Thus, even with correct model specification, the conditional variance itself is not an unbiased forecast of subsequent realized yield volatility. By the law of total variance, it includes the variance of the conditional mean yield, given both information through $t'$ and integrated volatility
through $t'+h$, thus inducing an upward bias.³ The recursive regressions in Eq. (1) provide a simple means of converting conditional variances into proper forecasts of future interest rate risk based on investor's information set.

2.1.2. PCA based forecasters

To examine whether or not the yield curve contains relevant information about future volatility and, if it does, whether the term structure

$\tau$ yield itself and the short rate, which we proxy by the CRSP 1-month T-bill rate, denoted $y_t^{1/12}$. Thus, volatility forecasts are constructed from Eq. (1), with $Z_t = y_t^{\tau} - y_t^{1/12}$, i.e., $q = 1$.

The final risk premium based forecaster is based on the forward spread of Fama and Bliss (1987). We use the difference between the maturity $\tau$ forward rate, $f_t^{\tau} = y_t^{\tau} + (\tau - h)\left(y_t^{\tau} - y_t^{\tau-h}\right)/h$, and the short rate as explanatory variable. Thus, maturity $\tau$ yield volatility forecasts are constructed from Eq. (1), with $Z_t = f_t^{\tau} - y_t^{1/12}$.
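As an illustration of how the recursive regressions in Eq. (1) might be run with a scalar information variable such as the forward spread, the following Python sketch builds the aggregated target $V^{\tau}_{t+1 \to t+h}$, computes $Z_t = f_t^{\tau} - y_t^{1/12}$, and produces expanding-window forecasts. The function names, the 0-based indexing, the use of ordinary least squares via NumPy, and the assumption that $\tau$ and $h$ in the forward rate are expressed in the same units are ours, not taken from the paper.

```python
import numpy as np

def aggregate_forward(v, h):
    """Target of Eq. (1): sum of v[t+1], ..., v[t+h].
    Entries whose window runs past the end of the sample are NaN."""
    v = np.asarray(v, dtype=float)
    out = np.full(v.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(v)))
    for t in range(len(v) - h):
        out[t] = csum[t + h + 1] - csum[t + 1]
    return out

def forward_spread(y_tau, y_tau_minus_h, y_short, tau, h):
    """Z_t = f_t^tau - y_t^{1/12}, with
    f_t^tau = y^tau + (tau - h) * (y^tau - y^{tau-h}) / h.
    Assumption: tau and h are measured in the same units."""
    f = y_tau + (tau - h) * (y_tau - y_tau_minus_h) / h
    return f - y_short

def recursive_forecasts(z, v, h, first):
    """Expanding-window forecasts for a scalar Z_t (q = 1).
    At each t' >= first, Eq. (1) is fitted by OLS on t = 0, ..., t'-h,
    so that every target used is observed by t', and the forecast is
    rho0_hat + rho1_hat * z[t']. Requires first >= h + 1."""
    z = np.asarray(z, dtype=float)
    target = aggregate_forward(v, h)
    preds = np.full(z.shape, np.nan)
    for tp in range(first, len(z)):
        idx = np.arange(0, tp - h + 1)
        X = np.column_stack([np.ones(idx.size), z[idx]])
        rho, *_ = np.linalg.lstsq(X, target[idx], rcond=None)
        preds[tp] = rho[0] + rho[1] * z[tp]
    return preds
```

With `h = 22` and a daily series of realized variances `v`, `recursive_forecasts(z, v, 22, first)` returns one month-ahead forecast per day from `first` onward; earlier entries are left as NaN.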
(2006)), but potentially important for our month-ahead forecasts. thors on request.
3. Data description and construction of realized measure

3.1. High-frequency data approach

Our high-frequency data set is based on the three Treasury futures (5 years, 10 years, and long term) for the long end of the yield curve, and 3-month Eurodollar futures for the short end. The Treasury futures are traded on the Chicago Board of Trade (CBOT), and the Eurodollar futures on the Chicago Mercantile Exchange (CME). Data are obtained from CME Group. For liquidity reasons, we start our sample on January 2, 2000, and data run through December 31, 2020. Due to the opening hours in the early part of our sample, we consider a trading day from 7:20 am to 2:00 pm Eastern Time. All weekends and non-working days are excluded. This leaves a raw data set covering 5,115 days.

The Treasury futures have delivery dates in four different months during the year: March, June, September, and December. For each underlying bond maturity, we include the futures contract with highest liquidity, in most cases that with shortest term to delivery. For the Eurodollar futures, we include the two contracts with closest to 3 and 9 months to delivery, and convert the prices of these into yields of 6 months and 1 year to maturity. Section 3.2 describes the details of the procedure.

We extract high-frequency yield curves from the five futures prices in a two-step approach. In the first step, futures prices are converted into coupon bond prices. In the second, yield curves are extracted from the coupon bond prices. Our approach differs from that of Faust et al. (2007) mainly in two respects. First, we include more information on the long end of the curve, using all three Treasury futures, rather than only 10-year contracts. Second, following, e.g., Cieslak and Povala (2016), we calibrate yield curves using the spline method of Fisher et al. (1995), instead of the Nelson and Siegel (1987) method.⁵ In the following, we briefly outline the approach.

3.2. Construction of intraday bond prices from futures data

Both Treasury and Eurodollar futures are considered in the first step. The underlying bond of a Treasury futures contract is hypothetical, and unknown until delivery. The seller chooses a bond from a delivery basket consisting of bonds meeting requirements from CME, implying that a Treasury futures involves two options: when to deliver, and which bond to deliver. We neglect the value of the first option by assuming that the delivery date is the first working day of the delivery month. Next, given the delivery date, we construct the delivery basket by combining the futures data with daily CRSP data on Treasuries, find the cheapest-to-deliver (CTD) bond in the basket thus constructed, and calculate the futures-implied coupon bond price.⁶ More details on this step are provided in Appendix A.1.

The value of the Eurodollar futures at maturity is determined by the 3-month LIBOR rate. From the observed futures prices, we derive high-frequency LIBOR rates with maturity date 3 months after the delivery date. LIBOR rates are afflicted with credit risk, and we deduce Eurodollar-implied government yields by assuming a constant credit spread through the day between LIBOR and government yields. Daily LIBOR rates with maturities of 6 months and 1 year are obtained from the St. Louis Fed and compared with daily Gürkaynak et al. (2007) yields (see Section 3.5 for the construction) to estimate the spread. Finally, Eurodollar-implied government yields are converted into zero-coupon bond prices, so that units match with the coupon bonds.

For each of the five contracts (three Treasury and two Eurodollar futures), we construct minute bars containing the first observation in each minute. If multiple observations on the same contract have the same time stamp, we use the median price. We apply a sanity check, discarding observations at a distance more than ten times the absolute mean from the median of the previous ten observations. Finally, we exclude days without any observation for a whole hour for at least one of the five futures contracts. This eliminates 96 trading days, leaving 5,019 observation days in the final analysis data set. Appendix B describes the liquidity of the Treasury contracts.

3.3. Construction of intraday yield curves from bond prices

In the second step, extracting high-frequency yield curves from the futures-implied bond prices using cubic splines, an implication of our approach is that the maturities of bonds considered vary over the sample period, and therefore knot points vary, too. Due to the small number of cross-sectional observations (five futures contracts), we choose not to interpolate the calibrated curves between points of principal payments. The outcome of the procedure is a complete set of consecutive yield curves at the 1-minute frequency. More details on the second step are provided in Appendix A.2.

Yields are read off the calibrated curves at maturities $\tau$ reflecting the underlying maturities of the futures contracts, to ensure a sufficient amount of market based information around the relevant points along the curves. Initially, we consider $\tau$ = 0.5, 1, 5, 7, and 15 years. The match to the underlying maturities of the corresponding futures is by construction for the first two, and holds to a reasonable degree for the third, while $\tau = 7$ is included as the closest among standard maturities to the approximately 7.5 year average maturity of the Treasury note underlying the 10-year contract. The underlying of the long term Treasury futures is a hypothetical 6% bond. At expiration, a bond with maturity between 15 and 25 years must be delivered. When the interest rate is below 6%, conversion rates favor bonds with short maturities, i.e., the underlying bond is actually of maturity 15 years for our out-of-sample window, and hence $\tau = 15$ is considered as a candidate fifth maturity for our construction. To verify the quality of the resulting high-frequency yields at the five candidate $\tau$-values, we match a daily frequency subsample of them with daily Gürkaynak et al. (2007) yields. For the latter, the number of bonds with maturity around 15 years outstanding at any given point in time in the CRSP data is rather limited, hence implying a low weight around this maturity in the calibration, and we find a relatively low correlation between the resulting yields and the daily subsample of ours at $\tau = 15$. Thus, as we cannot verify the quality of our fit at the fifth maturity, we henceforth restrict attention to the yields read off our high-frequency curves at $\tau$ = 0.5, 1, 5, and 7 years, although we continue to use all five futures contracts in the construction of the curves.

At the four maturities retained, correlations between the end-of-day subset of futures-implied yields extracted from our 1-minute data and the daily calibrated yields range between 99.37% and 99.99% (see Table A.1 in the Appendix). In effect, we have created a high-frequency intraday version of the Gürkaynak et al. (2007) dataset up to maturity $\tau = 7$. Table 2 shows descriptive statistics on our yield data, by maturity. On average over the sample period, the term structure of interest rates is mainly upward sloping, and the term structure of volatilities downward sloping, although both are relatively flat at the short end. Skewness and kurtosis are modest, especially at longer maturities. Fig. 1 displays the evolution through time in yields, by maturity, along with a three-dimensional view of the evolution of the high-frequency yield curves through calendar time. The transition to the zero lower bound (ZLB) regime around 2008 has a strong impact, especially at the shorter maturities.

⁵ A related cubic spline method due to Waggoner (1997) is adopted by Dai et al. (2007) and Andersen and Benzoni (2010).
⁶ Treasuries are non-convertible, and starting in 2000 avoids problems with the flower bonds issued until 1965, as these all matured by 1998. Callable bonds and notes were issued until 1985, but many of these subsequently repurchased by the Treasury and reissued as non-callable, although on a discretionary basis, without sinking fund provision. We drop the small number of remaining callable issues outstanding as of January 2, 2000.
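The tick cleaning described in Section 3.2 (median price for duplicate time stamps, a sanity filter, and first-observation minute bars) can be sketched as follows. This is one possible reading only: we interpret the sanity check as discarding a price whose distance from the median of the previous ten retained observations exceeds ten times their mean absolute deviation from that median, and we apply the filter before forming minute bars. Both choices, as well as the function names and the representation of ticks as `(timestamp_in_seconds, price)` pairs, are assumptions made for illustration.

```python
from statistics import median

def dedup_by_timestamp(ticks):
    """ticks: time-sorted list of (timestamp_seconds, price).
    Observations sharing a time stamp are replaced by their median price."""
    grouped = {}
    for ts, price in ticks:
        grouped.setdefault(ts, []).append(price)
    return [(ts, median(prices)) for ts, prices in grouped.items()]

def sanity_filter(obs, window=10, mult=10.0):
    """Assumed reading of the sanity check: drop a price if its distance
    from the median of the previous `window` retained prices exceeds
    `mult` times their mean absolute deviation from that median.
    The check is skipped until `window` prices are available, and when
    the mean absolute deviation is zero."""
    kept = []
    for ts, price in obs:
        prev = [q for _, q in kept[-window:]]
        if len(prev) == window:
            m = median(prev)
            mad = sum(abs(q - m) for q in prev) / window
            if mad > 0 and abs(price - m) > mult * mad:
                continue
        kept.append((ts, price))
    return kept

def minute_bars(obs):
    """First observation in each minute, keyed by minute index (ts // 60)."""
    bars = {}
    for ts, price in obs:
        bars.setdefault(ts // 60, price)
    return list(bars.items())
```

A typical call chain under these assumptions would be `minute_bars(sanity_filter(dedup_by_timestamp(ticks)))`, applied separately to each of the five contracts.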
Table 3 (volatility spikes are included in 22 measures rather than one). As ex-
Descriptive statistics on pre-averaged realized yield pected, other moments are smallest for the aggregated measures.
volatility. Fig. 2 displays the evolution through time in daily annualized pre-
𝜏 = 0.5 𝜏 =1 𝜏 =5 𝜏 =7
averaged realized one-month yield volatility in percent, by maturity,
√
corresponding to Table 3, Panel B, along with a three-dimensional view
Panel A: One-day volatility, 𝑉𝑡𝜏 of the volatility surface.
Mean 0.89 0.79 1.28 1.22
Volatilities rise quite dramatically for the shortest maturities during
the financial crisis and the transition to the ZLB regime, and drop to a
Std. 1.05 1.01 1.86 1.51
low level after the transition, only to rise again around 2016.
Skewness 4.33 4.79 2.13 1.78
Kurtosis 45.48 53.80 7.19 6.63
3.5. Daily yield data
√
𝜏
Panel B: Month-ahead volatility, 𝑉𝑡+1∣𝑡+22
To enable estimation of the affine term structure models, as well
Mean 1.16 1.01 2.14 1.87
as the PCA, common factor, and risk premium based forecasters, on a
Std. 0.81 0.84 0.92 0.69 daily yield panel with more than four observations in the cross section,8
Skewness 2.41 2.04 -0.09 0.60 we consider the estimated parameters provided by Gürkaynak et al.
Kurtosis 11.57 9.08 2.54 3.68 (2007). Their daily frequency dataset is extracted from a large set of
Panel C: One-day variance, 𝑉𝑡𝜏
coupon bonds, using the Svensson (1994) method. The continuously
compounded yield at maturity 𝜏 is written as
Mean 1.88 1.64 5.12 3.75
− 𝜃𝜏 ( − 𝜃𝜏 ) ( − 𝜃𝜏 )
Std. 8.62 8.48 12.98 8.50 1−𝑒 1 1−𝑒 1 − 𝜃𝜏 1−𝑒 2 − 𝜃𝜏
𝑦𝜏𝑡 = 𝛽0 +𝛽1 𝜏 +𝛽2 𝜏 −𝑒 1 +𝛽3 𝜏 −𝑒 2 , (17)
Skewness 31.07 33.72 3.84 6.17
𝜃1 𝜃1 𝜃2
Kurtosis 1462.90 1663.60 21.09 101.05
and estimates of (𝛽0 , 𝛽1 , 𝛽2 , 𝛽3 , 𝜃1 , 𝜃2 ) are provided at daily frequency.9
𝜏
Panel D: Month-ahead variance, 𝑉𝑡+1∣𝑡+22 We follow Christensen et al. (2010) and consider 𝑁 = 8 maturities in
Mean 1.99 1.73 5.42 3.96 the daily cross sections, namely, 3 and 6 months, and 1, 2, 3, 5, 7, and
Std. 3.74 3.43 4.00 2.97
10 years.
Skewness 5.49 5.24 0.89 1.83
Kurtosis 41.24 39.13 3.58 7.98
This table presents the mean, standard deviation, skewness, and kurtosis of the daily annualized pre-averaged realized yield volatilities, by maturity. Statistics for the square root (volatility) form are shown for one-day measures in Panel A, and for one-month ($h = 22$) measures in Panel B, both in percent. Statistics for the raw (variance) form are shown for one-day measures in Panel C, and for one-month ($h = 22$) measures in Panel D, both in basis points. The sample spans the period from January 2, 2000, through December 31, 2020.

and Benzoni (2010) and Bollerslev et al. (2018). The risk measure over the next month is simply $V^{\tau}_{t+1 \to t+h}$, $h = 22$, aggregating the pre-averaged realized variances over 22 trading days.
Table 3 shows descriptive statistics on the daily annualized pre-averaged realized yield volatilities over one day, $\sqrt{V^{\tau}_{t}}$, in Panel A, and one month, $\sqrt{V^{\tau}_{t+1 \to t+22}}$, in Panel B, in percent, by maturity. Although the volatility (square root) form of the measures facilitates interpretation, e.g., a unit mean corresponds to a one percent annual yield volatility, the raw (variance) form is used for forecasting, and descriptive statistics for this are reported in Panels C and D.
The term structure of volatilities (variances) exhibits a hump shape across the four maturities considered, with highest average at the $\tau = 5$ year maturity. The standard deviation (time series variation in volatility) shows a similar pattern. Skewness and excess kurtosis are higher for shorter maturities, $\tau = 0.5$ and 1, relative to longer, $\tau = 5$ and 7, where they essentially vanish for the one-month volatility measure.7 On average, volatilities over one month are slightly larger than over one day, possibly reflecting that some conditional mean variation remains in the realized measures due to finite sampling frequency, cf. footnote 3, as well as Jensen's inequality for the square root measures

4. Estimation

The affine term structure models from Section 2.1.1 are estimated on daily data, cf. Section 3.5, using the Kalman filter. The basic measurement and transition equations are obtained by allowing for measurement error $\varepsilon^{\tau}_{t+h}$ in the yields in Eq. (5), and discretizing the state dynamics in Eq. (2), i.e., the state space model is given by

$$ y^{\tau}_{t+h} = \frac{A(\tau)}{\tau} + \frac{B(\tau)'}{\tau} X_{t+h} + \varepsilon^{\tau}_{t+h}, \tag{18} $$
$$ X_{t+h} = C_h + D_h' X_t + \eta_{t+h}, \tag{19} $$

where $\varepsilon^{\tau}_{t+h} \sim N(0, H_{\tau,h})$, $\eta_{t+h} \sim N(0, Q_{t,h})$, $h = 1$ for daily data and daily time index $t$, with expressions for $A$ and $B$ in Appendix C, and for $C_h$, $D_h$, and $Q_{t,h}$ in Appendix D. For the two Gaussian models in Table 1, the standard linear filter applies. For the stochastic volatility models, Cox et al. (1985) (henceforth CIR) and AFNS3, we apply the extended Kalman filter, approximating transitions by Gaussian distributions.10
Upon estimation, conditional variance estimates are computed using Eq. (7), now with $h$ indicating the forecasting horizon ($h = 22$ for month-ahead forecasting), and corrected for the bias stemming from measurement error in yields, $\varepsilon^{\tau}_{t+h}$ in Eq. (18), producing $\tilde{V}^{\tau}_{t,h} = b_0^{\tau,h} + b_1^{\tau,h\prime} X_t + H_{\tau,h}$. Recursive volatility forecasts are constructed using $\tilde{V}^{\tau}_{t,h}$ for $Z_t$ in Eq. (1).

5. Empirical results

We consider the forecasting of yield volatility over the next month. The first set of estimates is based on the period January 2, 2000, through December 31, 2007, and the next 100 observations are used

7 Skewness and kurtosis are close to 0 and 3.
8 Our high-frequency yield curves are constructed using five futures contracts, and four yields are retained in the resulting high-frequency panel, cf. Section 3.3. The time series models are estimated using realized measures (Section 3.4) based on the high-frequency panel.
9 Available at http://www.federalreserve.gov/pubs/feds/2006/200628/200628abs.html and updated daily.
10 Appendix D provides further details on estimation, including the Kalman filter and the (quasi) log likelihood function.
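As an illustration of the state space form in Eqs. (18)-(19), the following is a minimal sketch of the linear Kalman filter for a one-factor Vasicek model, using the loadings and conditional variance stated in Appendix C.1. Several things here are assumptions rather than the authors' implementation: the function names, a single measurement error variance `meas_var` shared by all maturities, annualized parameters with a daily step of 1/252, and initialization at the unconditional moments. Because the measurement errors are assumed uncorrelated across maturities, the yields are processed one at a time, which gives the same filtered state and Gaussian log likelihood as a joint update.

```python
import math

def vasicek_loadings(tau, kappa_q, theta_q, sigma):
    """Intercept A(tau)/tau and slope B(tau)/tau of the yield equation, Eq. (18)."""
    b = (1.0 - math.exp(-kappa_q * tau)) / kappa_q
    a = ((theta_q - sigma**2 / (2.0 * kappa_q**2)) * (tau - b)
         + sigma**2 * b**2 / (4.0 * kappa_q))
    return a / tau, b / tau

def kalman_loglik(yields, taus, kappa, theta, sigma, lam, meas_var, dt=1.0 / 252.0):
    """Filtered short rates and Gaussian log likelihood (hypothetical helper).

    yields: list over days; each entry is a list of yields, one per maturity in taus.
    """
    # Completely affine market price of risk: kappa_q = kappa, theta_q = theta - sigma*lam/kappa.
    kappa_q, theta_q = kappa, theta - sigma * lam / kappa
    loadings = [vasicek_loadings(tau, kappa_q, theta_q, sigma) for tau in taus]

    phi = math.exp(-kappa * dt)        # scalar analogue of D_h in Eq. (19)
    const = theta * (1.0 - phi)        # scalar analogue of C_h
    q = sigma**2 * (1.0 - math.exp(-2.0 * kappa * dt)) / (2.0 * kappa)  # Var_t(r_{t+h})

    x, p = theta, sigma**2 / (2.0 * kappa)   # assumed start: unconditional mean and variance
    loglik, filtered = 0.0, []
    for y_t in yields:
        # Prediction step, Eq. (19).
        x, p = const + phi * x, phi * phi * p + q
        # Update step, Eq. (18), one maturity at a time.
        for (a, b), y in zip(loadings, y_t):
            v = y - (a + b * x)          # prediction error
            f = b * b * p + meas_var     # prediction error variance
            gain = p * b / f
            loglik -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
            x, p = x + gain * v, p * (1.0 - gain * b)
        filtered.append(x)
    return filtered, loglik
```

In an estimation loop, `kalman_loglik` would be maximized over the parameters on each recursive sample; the stochastic volatility models would need the extended filter, which this sketch does not cover.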
B.J. Christensen, M.M. Kjær and B. Veliyev Journal of Banking and Finance 155 (2023) 106973
for recursive estimation of the predictive regressions in Eq. (1). This leaves 2,976 observations for the out-of-sample period, covering June 7, 2008, through December 31, 2020. This way, the out-of-sample window starts just before the transition to the ZLB regime. For PCA, risk premium based forecasters, and the HAR model, forecasts are not restricted to be positive. To ensure meaningful volatility forecasts, we apply a sanity filter, such that we do not forecast below the 2.5 or above the 97.5 percentiles of the empirical distribution of observed realized variances.11

5.1. Statistical value of interest rate risk forecasts

To assess the yield volatility forecasts against a RW, we consider the $R^2_{OoS}$ measure of Campbell and Thompson (2007),

$$ R^2_{OoS} = 1 - \frac{\sum_{t=t_0+1}^{T-h} \left( V^{\tau}_{t+1 \to t+h} - \hat{V}^{\tau}_{t,h,\xi} \right)^2}{\sum_{t=t_0+1}^{T-h} \left( V^{\tau}_{t+1 \to t+h} - \hat{V}^{\tau}_{t,h,RW} \right)^2}, \tag{20} $$

where $\hat{V}^{\tau}_{t,h,\xi}$ is the forecast from model $\xi$, $t_0$ is the end of the initial estimation period, and $T$ is the end of the sample. A positive $R^2_{OoS}$ indicates more accurate volatility forecasts from model $\xi$ than from the RW. The null hypothesis $R^2_{OoS} \le 0$ is tested against the alternative $R^2_{OoS} > 0$ using a one-sided Diebold and Mariano (1995) test based on the Newey and West (1994) variance estimator with automatic lag selection of Andrews (1991).
Table 4 presents the resulting $R^2_{OoS}$ statistics in percent and Diebold-Mariano $p$-values for all forecasters and maturities considered. Only the HAR model generates positive $R^2_{OoS}$ statistics across all maturities. Furthermore, for each maturity, HAR generates the highest statistic (most accurate forecasts) across all models, except that the forward spread and the best common factor based forecaster get even higher statistics at $\tau = 5$. From the results, yield volatility forecasting is hardest at the short and long ends of the curve, with only HAR generating positive $R^2_{OoS}$ at $\tau = 0.5$, 1, and 7. In contrast, at the intermediate maturity $\tau = 5$, $R^2_{OoS}$ is positive for all models, except Vasicek.
A few comparisons within the first panel of Table 4, the results for the term structure models, are worth noting. Among the four models considered, Vasicek shows the strongest forecasting performance at the two shortest maturities, and the weakest at the two longest. Introducing stochastic volatility only leads to improved forecasts in half of the cases considered, i.e., at the two longest maturities for the one-factor model, and at the two shortest for the three-factor model.
Although constant-volatility models are typically rejected in-sample, parsimony can be rewarded in out-of-sample forecasting comparisons, as it reduces parameter uncertainty and the risk of overfitting, albeit at the expense of increased risk of model misspecification. In the present case, the results suggest that parsimony is rewarded at shorter maturities, with the single-factor homoskedastic model performing best, and

11 The empirical distribution of observed variances is updated recursively, and forecasts below the 2.5 and above the 97.5 percentiles replaced by these.
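The sanity filter of footnote 11 and the $R^2_{OoS}$ statistic of Eq. (20) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the percentile convention (linear interpolation between order statistics) is an assumption, since the text does not state which one is used. The `observed_variances` argument is meant to hold the realized variances available at the forecast date, so that the bounds are updated recursively.

```python
import math

def percentile(sorted_values, q):
    """Percentile q in [0, 100], linear interpolation between order statistics (assumed convention)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def sanity_filter(forecast, observed_variances, lower=2.5, upper=97.5):
    """Replace forecasts outside the empirical percentile band by the band limits."""
    ordered = sorted(observed_variances)
    return min(max(forecast, percentile(ordered, lower)), percentile(ordered, upper))

def r2_oos(realized, model_forecasts, rw_forecasts):
    """Out-of-sample R^2 of Eq. (20): one minus the ratio of summed squared forecast errors."""
    sse_model = sum((v - f) ** 2 for v, f in zip(realized, model_forecasts))
    sse_rw = sum((v - f) ** 2 for v, f in zip(realized, rw_forecasts))
    return 1.0 - sse_model / sse_rw
```

A positive return value from `r2_oos` corresponds to a model that beats the random walk in squared error terms; the Diebold-Mariano test of its significance is not covered by this sketch.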
$R^2_{OoS}$ statistic (most accurate forecasts) across all models considered, and at $\tau = 5$, the forward spread based forecaster and AFNS0 get close. Introducing stochastic volatility leads to improved forecasts at $\tau = 5$ for the one-factor model, and at the two shortest maturities for the three-factor model. Although AFNS3 now performs better than CIR at three maturities, Vasicek continues to dominate the other term structure models at the two shortest. Including more than three components in the maturity-specific PCA based forecaster makes forecasts deteriorate, and initial selection of factors generally generates more accurate forecasts than recursive updating for the common factor approach. The cross section based (PCA, risk premium, common factor) forecasters generally perform better than the term structure models, hence pointing to predictive information in the curve not captured by the affine models. One difference in results is that significance relative to RW is now attained at the shortest maturity for HAR, at level 10%. Henceforth, we focus on the one-month forecasting horizon ($h = 22$).
Fig. 3 shows yield volatility forecast errors over time, by maturity, for selected forecasting methods. Cumulative squared errors for a given forecasting method are subtracted from those for the RW, so that an increasing curve indicates better forecasting than by RW, and vice versa. In the common factor case, the improvement in the approach with initial selection relative to recursive updating occurs around 2009, consistent with the ZLB transition driving the phenomenon. For comparison, the figure shows results for Vasicek, CIR, and HAR, too. Most methods exhibit jumps in forecasting accuracy, moving into the ZLB regime, although the effect materializes more gradually at longer maturities. It could be expected that CIR (with state-dependent volatility) would handle the entire ZLB period better than Vasicek, but this is not confirmed. At the shorter maturities, Vasicek quickly adjusts to the low-volatility regime, whereas CIR performs worse than the RW for about two years. The results show that simpler is better around the transition to ZLB (initial selection as opposed to recursive, constant-volatility model as opposed to CIR), presumably in part because the artificial investor is not ignoring recent volatilities, cf. Eq. (1). Historical volatility (HAR) forecasts better than the curve-based methods, except that Vasicek initially does better at the short end. Improvements relative to the RW accumulate over calendar time at $\tau = 5$, except for the Vasicek model. At other maturities, results stabilize over the ZLB period, which runs through 2016 (cf. Fig. 1), well after the end of quantitative easing by 2014, then deteriorate for most models, HAR being the exception.
Overall, the results show that information from either the yield curve or the time series of volatilities can be used to improve volatility forecasts over the RW. Furthermore, the results indicate that the yield curve contains information about future volatility that is not captured by the term structure models.

5.2. Extracting incremental information

Here, we examine the possibility that the yield volatility forecasting equations based on the affine term structure models suffer from omitted variable bias, i.e., that the expression for future volatility in Eq. (7) should be expanded to

$$ V^{\tau}_{t,h} = b_0^{\tau,h} + b_1^{\tau,h\prime} X_t + b_2^{\tau,h\prime} \tilde{X}_t, \tag{21} $$

for forecasting purposes. We consider three ways of extracting the incremental information variable $\tilde{X}_t$:
(i) By PCA factors from the yield curve.
(ii) Combining (i) with a factor extracted from the past volatility forecast errors.
(iii) By the realized volatility measure based on high-frequency data.
Since PCA factors capture the shape of the yield curve nonparametrically, improved forecasting performance (here, significance of $b_2^{\tau,h}$) in case (i) indicates that the yield curve contains incremental information about future interest rate risk relative to that in the affine term structure model considered. In (ii), significance indicates the existence of a factor explaining yield volatilities, but not yields, i.e., a USV case. In
Table 5
The incremental information on future volatility in the yield curve.
                                 𝜏 = 0.5   𝜏 = 1    𝜏 = 5    𝜏 = 7
Vasicek                           7.31      1.14     6.39    18.20
                                 (0.06)    (0.77)   (0.09)   (0.00)
CIR                               2.92      4.93    11.85    18.34
                                 (0.40)    (0.18)   (0.01)   (0.00)
AFNS0                            31.18     17.14     0.75     8.49
                                 (0.00)    (0.00)   (0.86)   (0.04)
AFNS3                             4.16      6.97    24.56    82.27
                                 (0.24)    (0.07)   (0.00)   (0.00)
HAR                               5.08      5.41     4.48    18.57
                                 (0.17)    (0.14)   (0.21)   (0.00)
Mean-reverting realized GARCH     7.81      8.27     6.60    23.73
                                 (0.05)    (0.04)   (0.09)   (0.00)
This table shows results from regression Eq. (22), explaining yield volatility forecast errors from the specified models using three PCA factors fitted to past yield curves, based on Eq. (8). Reported values are $F$-statistics for joint tests of $\phi_1^{\tau,h} = 0$ (three coefficients). Asymptotic $p$-values in parentheses. The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

Table 6
Specification test for unspanned stochastic volatility factor.
                                 𝜏 = 0.5   𝜏 = 1    𝜏 = 5    𝜏 = 7
Vasicek                          -3.01     -1.65    -1.90    -2.55
                                 (0.00)    (0.10)   (0.06)   (0.01)
CIR                              -9.08     -4.49    -2.43    -3.36
                                 (0.00)    (0.00)   (0.02)   (0.00)
AFNS0                            -5.85     -2.91    -1.12    -2.31
                                 (0.00)    (0.00)   (0.26)   (0.02)
AFNS3                            -8.18     -4.23    -2.09    -1.57
                                 (0.00)    (0.00)   (0.04)   (0.12)
HAR                              -1.35     -0.79     2.48     2.49
                                 (0.18)    (0.43)   (0.01)   (0.01)
Mean-reverting realized GARCH    -5.54     -2.52    -1.43    -2.30
                                 (0.00)    (0.01)   (0.15)   (0.02)
This table shows results from regression Eq. (23), explaining yield volatility forecast errors from the specified models augmented with three yield curve PCA factors as in Eq. (22) using a PCA factor fitted to past yield volatility forecast errors. Reported values are $t$-statistics for testing $\psi_1^{\tau,h} = 0$. Asymptotic $p$-values in parentheses, based on the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000, through June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

other words, significance of $b_2^{\tau,h}$ shows in case (i) that the term structure models do not capture all relevant information about future volatility available from the yield curve, and in case (ii) that the yield curve itself does not capture all relevant information. In (iii), significance indicates that historical volatility contains incremental information not captured by the term structure models.
In addition, we subject the time series models to the same tests, i.e., we examine whether the similar information variable $\tilde{X}_t$ can be used to improve the volatility forecasts from the time series models. If so, this indicates in (i) that the yield curve contains incremental information about future volatility, relative to that contained in the volatility history itself, and in (iii) that the time series models do not fully exploit the information in the high-frequency data.

5.2.1. The incremental information in the yield curve

For (i), we regress volatility forecast errors from model $\xi$ on PCA factors from the yield curve,

$$ V^{\tau}_{t+1 \to t+h} - \hat{V}^{\tau}_{t,h,\xi} = \phi_0^{\tau,h} + \phi_1^{\tau,h\prime} \hat{F}_t + u^{\tau,h}_{t+h}, \tag{22} $$

for fixed $\tau$, $h$, with $\hat{F}_t$ the three leading fitted PCA factors at time $t$ from Eq. (8), based on data through $t$, and test for joint significance of $\phi_1^{\tau,h}$ ($k = 3$ coefficients). Under the null, the model generating the forecast $\hat{V}^{\tau}_{t,h,\xi}$ subsumes the information content on future volatility available in the PCA yield curve factors.
Table 5 presents results from the specification test. In general, the term structure models do not capture all relevant information in the PCA yield curve factors about future volatility. The null is rejected at level 10% or better at two or more maturities for each term structure model, and for two or more models at each maturity. It is rejected at 5% or better for all models at the longest maturity, and at 10% or better at $\tau = 5$, except that here, AFNS0 captures the information in the curve about future volatility ($p$-value 0.86), consistent with the relatively strong performance of this model at the intermediate maturity in Table 4. However, the PCA factors carry incremental information relative to AFNS0 at level 5% for all other maturities. Thus, the indication is that the yield curve contains incremental information about future volatility beyond that captured by the affine models, especially at the long end.
For the time series models, the evidence suggests that the yield curve contains incremental information about future volatility at the long end, relative to that contained in the historical volatility series. The null is rejected at 1% for $\tau = 7$ for both the HAR and realized GARCH models, and for the latter at 10% for the other maturities, too.
Taken together, the results indicate that the yield curve contains important incremental information about future volatility relative to the time series of historical volatilities at the long end, and that some of this information is not captured by standard affine models.

5.2.2. Can past forecast errors predict future forecast errors?

In (ii), we examine whether a factor extracted from past yield volatility forecast errors can predict future forecast errors. An implication of USV is that it should be possible to extract at least one factor from volatility which is not related to the yields. We investigate this possibility by recursively extracting a factor, say, $PC^{h}_{t,\xi}$, from the cross section of current and past forecast errors $V^{\tau'}_{t'+1 \to t'+h} - \hat{V}^{\tau'}_{t',h,\xi}$ by applying PCA to forecast errors at $t' = t - h$ and earlier, across maturities $\tau'$, then testing whether the extracted factor contains significant information about future forecast errors in the regression

$$ V^{\tau}_{t+1 \to t+h} - \hat{V}^{\tau}_{t,h,\xi} = \phi_0^{\tau,h} + \phi_1^{\tau,h\prime} \hat{F}_t + \psi_1^{\tau,h} PC^{h}_{t,\xi} + u^{\tau,h}_{t+h}, \tag{23} $$

for fixed $\tau$, $h$. Only one factor is included, due to the small number of volatilities in the cross section. The first 100 observations are used to initialize the factor. A significant coefficient $\psi_1^{\tau,h}$ indicates that a serially dependent USV factor is relevant for the forecast.
Table 6 shows results from estimation of Eq. (23). At 5%, evidence of an omitted factor arises at two or more maturities for each term structure model, and for two or more models at each maturity. The coefficients $\psi_1^{\tau,h}$ are negative in all cases, consistent with mean-reversion in the USV factor. The test is conservative, in that the specification in Eq. (23) controls for PCA factors fitted to past yield curves, so the results are indicative of the existence of a latent volatility factor which is
Table 8
Utility from yield volatility forecasts.
𝜏 = 0.5 𝜏 = 1 𝜏 = 5 𝜏 = 7

Table 9
Yield volatility forecasts using nonlinearities in yield curve factors.
𝜏 = 0.5 𝜏 = 1 𝜏 = 5 𝜏 = 7
volatility. Further, if volatility forecasts are indeed improved by using some other combination of factors, the question arises whether the optimal combination is common across maturities.
To address these issues, we consider a regression of the type in Eq. (1), with $Z_t$ representing a combination of three of the first six PCA factors, i.e., not necessarily the first three. We consider all 20 possible combinations.
Table 10 presents the $R^2_{OoS}$ results. Strongest overall forecasting performance is obtained at the intermediate maturity $\tau = 5$ years. This echoes the result from Table 4, where forecasting based on the first three factors, $\hat{F}_t = \text{PCA}_{123}$, say, improves significantly over the RW at level 5% for $\tau = 5$. From Table 10, nine of the 20 factor combinations improve significantly over the RW at $\tau = 5$, and of these, the first three factors (bottom row of table) generate the lowest $R^2_{OoS}$ statistic, along with $\text{PCA}_{126}$. The best combination (highest statistic) is $\text{PCA}_{245}$, i.e., combining the second, fourth, and fifth PCA factors from Eq. (8) for purposes of yield volatility forecasting. The second factor, slope, enters all of the nine significant combinations, and so proves important for

          (0.88)   (0.88)   (0.01)   (0.87)
PCA234   -27.48   -20.92    18.82   -12.21
          (0.90)   (0.90)   (0.02)   (0.86)
PCA156   -24.31   -17.03     6.14   -36.16
          (0.90)   (0.88)   (0.35)   (0.92)
PCA146   -24.20   -18.35     6.41   -26.83
          (0.91)   (0.91)   (0.32)   (0.93)
PCA145   -27.88   -20.19     6.59   -34.79
          (0.89)   (0.91)   (0.34)   (0.97)
PCA136   -29.18   -24.92    17.29    -0.53
          (0.93)   (0.95)   (0.08)   (0.51)
PCA135   -31.09   -23.02    16.80    -5.82
          (0.91)   (0.90)   (0.11)   (0.64)
PCA134   -31.68   -23.86    17.38    -2.42
          (0.92)   (0.91)   (0.08)   (0.58)
PCA126   -24.15   -19.05    15.91    -3.71
          (0.90)   (0.90)   (0.04)   (0.61)
Table 12
Time series extensions.
                    𝜏 = 0.5   𝜏 = 1    𝜏 = 5    𝜏 = 7
Panel A: HAR extensions
Log-HAR              4.05      6.69    17.08    14.38
                    (0.39)    (0.30)   (0.02)   (0.11)
Lev-HAR             -5.40     -0.64    22.01    14.35
                    (0.61)    (0.52)   (0.00)   (0.12)
Panel B: Interest rate HAR extensions
PC1                 -0.15      6.88    21.50    12.80
                    (0.50)    (0.31)   (0.01)   (0.09)
PC2                  0.29      7.08    25.31    11.58
                    (0.49)    (0.31)   (0.00)   (0.07)
PC3                 -0.86      5.33    24.66    18.04
                    (0.52)    (0.35)   (0.00)   (0.02)
PC1 interaction      0.12      9.35    21.93    12.84
                    (0.50)    (0.23)   (0.00)   (0.07)
PC2 interaction     -4.09     10.03    26.54    11.96
                    (0.62)    (0.22)   (0.00)   (0.05)
PC3 interaction    -10.13      0.29    23.86    18.91
                    (0.69)    (0.49)   (0.00)   (0.01)
This table displays $R^2_{OoS}$ measures in percent relative to a RW for the Log-HAR model, the Lev-HAR model, and the HAR model with time-varying parameters, with either intercept or both intercept and slopes depending in an affine manner on one of the three leading principal components from the daily yield panel. In parentheses asymptotic $p$-values for a one-sided Diebold-Mariano test using the Newey-West variance estimator with automatic lag selection of Andrews (1991). The initial estimation period ranges from January 2, 2000 to June 6, 2008, and the out-of-sample period from June 7, 2008, through December 31, 2020.

is replaced by its logarithm, for estimation purposes, and a Jensen's inequality correction applied when constructing level forecasts (exponentiating the relevant fitted value plus one half times the residual variance). We label this the Log-HAR model. Second, stocks are affected by the so-called leverage effect. We examine whether this is also the case for interest rates and, in particular, whether it matters for yield volatility forecasting, by considering the leverage HAR (or Lev-HAR) model of Corsi and Renò (2012). The model allows for a leverage effect by extending Eq. (10) with the sum of past negative returns. It is given as

$$ V^{\tau}_{t+1 \to t+h} = \beta_0^{\tau,h} + \beta_D^{\tau,h} V^{\tau}_{t} + \beta_W^{\tau,h} V^{\tau}_{t-4 \to t} + \beta_M^{\tau,h} V^{\tau}_{t-21 \to t} + \gamma_D^{\tau,h} |r^{-}_{t,\tau,1}| + \gamma_W^{\tau,h} |r^{-}_{t,\tau,5}| + \gamma_M^{\tau,h} |r^{-}_{t,\tau,22}| + u^{\tau,h}_{t+h}, \tag{35} $$

where $r^{-}_{t,\tau,h}$ aggregates any negative changes in the maturity $\tau$ yield over the past $h$ days.
Panel A in Table 12 presents the $R^2_{OoS}$ results. Comparing with the standard HAR model in Table 4, each extension generates higher $R^2_{OoS}$ at two maturities ($\tau = 0.5$ and 7 for Log-HAR, $\tau = 5$ and 7 for Lev-HAR), and lower than HAR at the other two. All three models improve significantly over the RW at maturity $\tau = 5$ and level 5%, and get $p$-values around 11% for the longest maturity. Thus, the evidence on the extensions is mixed, and the standard HAR model does not appear to be a bad choice. Regarding the leverage effect, a potential explanation is that compared to returns, yields tend to have a truncation around zero, as in shadow rate models (Black, 1995; Christensen and Rudebusch, 2015; Andreasen and Meldrum, 2019). The truncation lowers yield volatility for yields near zero by construction. Thus, we do not necessarily expect a leverage effect in interest rates.

Overall, our results show that either information in the yield curve or in the volatility time series can be used to improve yield volatility forecasting. The question arises whether forecasting performance can be further enhanced by combining the two sources of information. We address this issue in the HAR framework via a time-varying parameter extension. In particular, we allow for the HAR coefficients to depend on information from the yield curve. This notion seems natural, as interest rates are known to carry important economic signals. Thus, the yield spread is widely considered to be a recession indicator, and the interest rate level measures the distance to the ZLB. Furthermore, we have shown that interest rate factors can forecast the volatility forecasting errors from the HAR model (Section 5.2.1). This suggests that interest rate information can be incorporated in the HAR framework in a productive manner. For a simple specification, we consider in turn whether each of the three first PCA factors from the yield curve drives variation in the intercept in the HAR model, or in both intercept and slopes, for an interactive effect between yield curve and volatility information. Here, the first PCA factor, level, should capture distance to the ZLB, and the next two, slope and curvature, the effect of the spread. Thus, in the generalized models, when Eq. (10) is estimated based on information through $t'$, the extended parameter specification

$$ \beta_{i,t}^{\tau,h} = \alpha_i^{\tau,h} + \gamma_i^{\tau,h} \hat{F}_{t,j} \tag{36} $$

is used, with $\hat{F}_{t,j}$, $j = 1, 2, 3$ the fitted yield curve factors at $t \le t'$ from estimation of Eq. (8) with $k = 3$, using data through $t'$. In the time-varying intercept extensions, Eq. (36) is only used for $i = 0$. In the interactive extensions, it is used for $i = 0, D, W, M$.
Other approaches to time-varying parameters in the HAR framework have been proposed in the literature, e.g., the HARQ model of Bollerslev et al. (2016), and the model of Buccheri and Corsi (2021). However, these models rely on the basic realized volatility framework and, hence, are not naturally transferred to our pre-averaging estimator setting. Therefore, we do not pursue these alternative approaches further.
Panel B in Table 12 shows the $R^2_{OoS}$ results from using Eq. (36) with Eq. (10). At the longest maturities, extending the HAR model with information from the yield curve enhances volatility forecasting performance, relative to the standard model from Table 4. Allowing for interactions using level or slope, performance is improved at $\tau = 1$, too. For given maturity, the best volatility forecasts considered are those using the original HAR at the short end, the interactive HAR extension with slope at the two intermediate maturities, and that with curvature at the long end of the curve. The results suggest that recession risk is important at longer maturities. All extensions retain significance at level 1% of the improvement over the RW at $\tau = 5$ and, in contrast to the standard HAR, deliver significant improvements at the longest maturity, too, at 10%. The time-varying intercept model using curvature and the interactive model using slope provide significant improvements at 5%, and the interactive model using curvature at 1%. Obviously, further research could investigate combinations using multiple factors, but the results confirm that cross-sectional yield curve information and historical time series information on volatility can be fruitfully combined for volatility forecasting purposes.

7. Concluding remarks

Our results show that the assessment of future interest rate risk is a complicated affair. A strong benchmark can be formed by basing forecasts on historical volatility, with a focus on recent periods, as in the formal time-varying volatility models. A good alternative at intermediate maturities can be based on the risk premium, in particular, the forward spread, or a common factor approach. For agents with a primary interest in the long end of the curve, e.g., for immunization of assets and liabilities, valuation of capital assets, or market timing, incremental information on future yield volatility relative to that in the
time series is available by looking across maturities along the curve, and more so than implied by formal term structure models. Some of the curve-based information about future volatility is distinct from that explaining the levels of yields, and is contained in higher-order factors rather than nonlinearities. Furthermore, our results point to the existence of a latent stochastic volatility factor, unspanned by yields.
Based on a simple portfolio exercise, the information about future reinvestment rate risk in either the yield curve or the volatility history is of economic value to a risk averse investor, both in short and long term instruments. A simple yield spread, affine models, and time series methods are all useful, on utility grounds, relative to a random walk. Finally, we demonstrate that the cross-sectional yield curve information and the historical time series information on volatility can be fruitfully combined to enhance volatility forecasting performance.
We have emphasized from the beginning that when leveraging volatility information drawn from the yield curve, investor should not ignore historical volatility in the information set. That the resulting picture is diverse is perhaps not surprising. Investor is, after all, gazing into the crystal ball. Clearly, mimicking this situation requires a recursive, out-of-sample approach, to avoid admitting the artificial investor the benefit of hindsight.

CRediT authorship contribution statement

All authors have contributed equally to all parts of the article. No other authors have been involved.

Declaration of competing interest

None.

Data availability

Data will be made available on request.

Appendix A. From futures prices to yield curves

A.1. From futures prices to coupon bond prices

The seller (the short) has two options included in the Treasury futures. The first regards which bond to deliver, and the second the delivery date within the delivery month. For the first option, the underlying assets of the Treasury futures (see Section 3) are hypothetical bonds with a notional yield of 6% throughout our sample period. The seller must deliver an actual bond, selected from a delivery basket constructed for each futures contract according to CME requirements, with a conversion factor for each bond. The conversion factor converts the bond price into ". . . the approximately decimal price at which $1 par of the security would trade as if it had a 6% yield-to-maturity."16 The formula is

$$ f = a \left( \frac{coupon}{2} + c + d \right) - b, \tag{A.1} $$
$$ a = \left( \frac{1}{1.03} \right)^{v/6}, $$
$$ b = \frac{coupon}{2} \cdot \frac{6 - v}{6}, $$
$$ c = \begin{cases} \left( \frac{1}{1.03} \right)^{2n} & \text{if } z < 7 \\ \left( \frac{1}{1.03} \right)^{2n+1} & \text{otherwise} \end{cases} $$
$$ d = \frac{coupon}{0.06} (1 - c), $$
$$ v = \begin{cases} z & \text{if } z < 7 \\ 3 & \text{if } z \ge 7 \text{ for 10-year note, long term bond} \\ z - 6 & \text{if } z \ge 7 \text{ for 5-year note} \end{cases} $$

with $coupon$ the annualized coupon, $n$ the number of (whole) years from the first day of the delivery month to maturity, and $z$ the number of months between $n$ and the maturity date, rounded to nearest quarter for the 10-year note and the long term Treasury bond, and nearest month for the 5-year note. The delivery bond (or note) must have a maturity between 4.17 and 5.25 years for the 5-year futures, between 6.5 and 10 years for the 10-year futures, and between 15 and 25 years for the long term Treasury bond futures.
Given the conversion factor $f_t$ from Eq. (A.1), the invoice amount is

$$ I_t = f_t \cdot F_t + a_t, \tag{A.2} $$

with $F_t$ the futures price, and $a_t$ accrued interest. On delivery, the seller pays the basis

$$ \pi_t = S_t - I_t, \tag{A.3} $$

with $S_t$ the cash bond price (including accrued interest), and therefore selects the cheapest-to-deliver (CTD) bond fulfilling the futures contract specifications and minimizing Eq. (A.3) (maximizing the implied repo rate).
The second option regards the delivery date. For the purpose of backing out the futures-implied bond price, we assume that both the CTD bond and delivery date are known. We set the delivery date to the first working day of the delivery month, and use the CRSP data on Treasuries to construct the delivery basket. The price of the futures is then given as

$$ I_t = (S_t - C_t) e^{r_t T}, \tag{A.4} $$

with $S_t$ the spot price of the CTD bond (including accrued interest), $C_t$ the present value of coupon payments before delivery, $T$ the time to delivery, and $r_t$ the risk-free rate. For the latter, we use the 3-month yield from the daily panel, $r_t = y_t^{1/4}$ (Section 3.5). The 3-month rate should be more market-based than, say, a 1-month rate, and a good short rate proxy, cf. Chapman et al. (1999).
In brief, the recipe for constructing the futures-implied bond price $S_t$ is as follows:

(i) Construct the delivery basket from the CRSP data on the first trading day of the delivery month.
(ii) Find the cheapest-to-deliver bond minimizing the basis $\pi_t$ from Eq. (A.3) within the delivery basket from (i).
(iii) Use Eq. (A.2) to calculate the invoice amount $I_t$ for the CTD bond from (ii).
(iv) Substitute the invoice amount from (iii) for $I_t$ in Eq. (A.4), and back out the futures-implied bond price by isolating $S_t$.

A.2. From coupon bond prices to yield curves

A variety of methods is available for extracting the yield curve from the cross section of coupon bond prices. In the literature on yield curves at high frequency, some version of cubic spline is usually employed, e.g., Andersen and Benzoni (2010) and Cieslak and Povala (2016), due to the small number of observations in the cross section, and the need for flexibility to fit the curve reasonably. As a consequence, interpolation between points of principal payments is difficult. Following Cieslak and Povala (2016), we use the method of Fisher et al. (1995). This assumes that the discount function $B_t^{\tau} = \exp(-\tau y_t^{\tau})$ is determined by a cubic spline $h(\cdot, \Psi)$, i.e., $\tau y_t^{\tau} = h(\tau, \Psi)$, with $\Psi$ a set of parameters (suppressing dependence on $t$). The yield curve is split along the maturity axis using $K$ knot points, $0 < \tau_1 < \ldots < \tau_K$, with $\tau_K$ the maximum maturity of the bonds considered. The cubic spline is restricted such that $h(\cdot, \Psi)$ and its two first derivatives are continuous. This implies that one

16 https://www.cmegroup.com/trading/interest-rates/calculating-us-treasury-futures-conversion-factors.html.
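The conversion factor in Eq. (A.1), the invoice amount in Eq. (A.2), and step (iv) of the recipe (solving Eq. (A.4) for $S_t$) translate directly into code. The sketch below is illustrative only: the function and argument names are hypothetical, the coupon is entered as a decimal (0.06 for 6%), and the flag `five_year` selects the third branch of the definition of $v$. Construction of the delivery basket and the CTD search are not covered.

```python
import math

def conversion_factor(coupon, n, z, five_year=False):
    """Conversion factor f from Eq. (A.1).

    coupon: annualized coupon as a decimal; n: whole years to maturity;
    z: remaining months (rounded as described in the text).
    """
    if z < 7:
        v = z
    elif five_year:
        v = z - 6
    else:
        v = 3
    a = (1.0 / 1.03) ** (v / 6.0)
    b = coupon / 2.0 * (6.0 - v) / 6.0
    c = (1.0 / 1.03) ** (2 * n) if z < 7 else (1.0 / 1.03) ** (2 * n + 1)
    d = coupon / 0.06 * (1.0 - c)
    return a * (coupon / 2.0 + c + d) - b

def invoice_amount(factor, futures_price, accrued_interest):
    """Invoice amount I_t = f_t * F_t + a_t, Eq. (A.2)."""
    return factor * futures_price + accrued_interest

def implied_bond_price(invoice, pv_coupons, rate, time_to_delivery):
    """Solve Eq. (A.4), I_t = (S_t - C_t) exp(r_t T), for the bond price S_t."""
    return invoice * math.exp(-rate * time_to_delivery) + pv_coupons
```

As a sanity check on the formula, a bond with a 6% coupon and `z = 0` gets a conversion factor of one, in line with the 6% notional yield of the underlying hypothetical bond.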
17
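As a rough illustration of Eq. (A.1), the conversion factor can be computed as in the following Python sketch. The function and argument names are hypothetical. The maturity inputs n, z, and v are taken as given, since their definitions are set out in the CME documentation cited in footnote 16 rather than here, and the coupon is assumed to be expressed as a decimal rate.

```python
def conversion_factor(coupon: float, n: int, z: int, v: int) -> float:
    """Conversion factor f from Eq. (A.1).

    coupon is the annual coupon rate in decimals (0.06 for 6%).
    n, z, v are the maturity-related inputs of the CME formula;
    they are passed in directly (assumption: the caller has
    already derived them from the bond's maturity date).
    """
    a = (1.0 / 1.03) ** (v / 6.0)
    b = (coupon / 2.0) * (6.0 - v) / 6.0
    # c discounts at 3% per half-year over 2n (or 2n + 1) periods
    c = (1.0 / 1.03) ** (2 * n if z < 7 else 2 * n + 1)
    d = (coupon / 0.06) * (1.0 - c)
    return a * (coupon / 2.0 + c + d) - b


if __name__ == "__main__":
    # Illustrative inputs only, not bonds from the paper's sample
    print(conversion_factor(0.06, n=10, z=0, v=0))
    print(conversion_factor(0.03, n=10, z=0, v=0))
```

With a 6% coupon and v = 0, the terms in c cancel and the factor equals one, in line with the notional 6% yield. Coupons below 6% give factors below one, and coupons above 6% give factors above one.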
B.J. Christensen, M.M. Kjær and B. Veliyev Journal of Banking and Finance 155 (2023) 106973
with $J$ the number of payments (coupons and principal) over the remaining lifetime of the bond, and $C_j$ the $j$th payment, due $t_j$ periods hence. The yield curve is estimated using penalized nonlinear least squares (PNLS),

$$
\hat{\Psi} = \arg\min_{\Psi} \sum_{i=1}^{K} \left(P_i - \hat{P}_i(\Psi)\right)^2 + \lambda \int_0^{\tau_K} h''(s)^2 \, ds ,
$$

where the penalty term with coefficient $\lambda$ induces smoothness of the calibrated yield curve. We find that this procedure results in a better fit to bond prices than the Nelson and Siegel (1987) approach used in Faust et al. (2007).

Upon estimation, four points are read off the resulting high-frequency (1 minute) yield curves, at $\tau = 0.5$, 1, 5, and 7 years.

Appendix B. Liquidity of Treasury futures

Table A.2 shows the percentage of high-frequency intervals containing a Treasury futures price observation, and hence affording unique yield identification, at the 1, 5, and 10 minute sampling frequencies, by maturity. All three maturities, 5 and 10 years, and long term, are included in the table, because all three Treasury futures are used in curve fitting, although only implied yields at maturities up to $\tau = 7$ are used in the subsequent analysis (see Section 3.3 and Appendix A). From the table, around 86% to 95% of the 1 minute intervals contain an observation. This increases to around 97% and 99% for 5 and 10 minute intervals, respectively. The lower panel of the table shows the corresponding percentages when discarding trading days with no trading activity for more than an hour. All numbers increase by around 0.5-1.0%. This suggests that the discarded days do not contain many uniquely identified observations and, hence, it makes sense to discard them. Fig. A.1 shows the evolution over time in the daily percentage

The model of Vasicek (1977) is an $A_0(1)$ model in which the dynamics of the short rate $r_t = y_t^0$ are given by

$$
dr_t = \kappa(\theta - r_t)\,dt + \sigma\, dW_t .
$$

We assume that the market price of risk is completely affine, i.e., $\lambda_t = \lambda$. Thus,

$$
\tilde{\theta} = \theta - \frac{\sigma\lambda}{\kappa}, \qquad \tilde{\kappa} = \kappa .
$$

The solution to the Riccati equations (6) is given by

$$
B(\tau) = \frac{1}{\tilde{\kappa}}\left(1 - e^{-\tilde{\kappa}\tau}\right),
$$

$$
A(\tau) = \left(\tilde{\theta} - \frac{\sigma^2}{2\tilde{\kappa}^2}\right)\left(\tau - B(\tau)\right) + \frac{\sigma^2}{4\tilde{\kappa}} B(\tau)^2 .
$$

The conditional variance of the future short rate is

$$
Var_t(r_{t+h}) = \frac{\sigma^2}{2\kappa}\left(1 - e^{-2\kappa h}\right) .
$$

C.2. The Cox-Ingersoll-Ross model

The model of Cox et al. (1985) is an $A_1(1)$ model with

$$
dr_t = \kappa(\theta - r_t)\,dt + \sigma\sqrt{r_t}\, dW_t .
$$

We adopt the completely affine market price of risk specification, $\lambda_t = \lambda\sqrt{r_t}/\sigma$. Thus,

$$
\tilde{\kappa} = \kappa + \lambda, \qquad \tilde{\theta} = \frac{\kappa\theta}{\kappa + \lambda} .
$$

The solution to the Riccati equations (6) is given by

$$
B(\tau) = \frac{2\left(e^{\gamma\tau} - 1\right)}{(\gamma + \tilde{\kappa})\left(e^{\gamma\tau} - 1\right) + 2\gamma},
$$
$$
A(\tau) = -\frac{2\tilde{\kappa}\tilde{\theta}}{\sigma^2}\left[\log(2\gamma) + \frac{1}{2}(\tilde{\kappa} + \gamma)\tau - \log\left((\gamma + \tilde{\kappa})\left(e^{\gamma\tau} - 1\right) + 2\gamma\right)\right],
$$

with $\gamma$ given by

$$
\gamma = \sqrt{\tilde{\kappa}^2 + 2\sigma^2} .
$$

The conditional variance of the short rate is

$$
Var_t(r_{t+h}) = \frac{\sigma^2 r_t}{\kappa}\left(e^{-\kappa h} - e^{-2\kappa h}\right) + \frac{\sigma^2\theta}{2\kappa}\left(1 - e^{-\kappa h}\right)^2 .
$$

C.3. The arbitrage-free Nelson-Siegel model with deterministic volatility

Nelson and Siegel (1987) parametrization by assuming that the market price of risk is essentially affine and restricting

$$
\tilde{\kappa} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & -\lambda & 0 \\ 0 & \lambda & \lambda \end{pmatrix}, \qquad
\tilde{\theta} = 0_{3\times 1} .
$$

The yield curve is then
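Table A.3
Out-of-sample $R^2$ for two months ahead yield volatility forecasting.
$\tau = 0.5$   $\tau = 1$   $\tau = 5$   $\tau = 7$

C.4. The arbitrage-free Nelson-Siegel model with stochastic volatility

The AFNS$_3$ stochastic (or state-dependent) volatility model of Christensen et al. (2010) is an $A_3(3)$ model. Under $\mathbb{Q}$, the dynamics of the state variables are restricted to

$$
dX_t = \begin{pmatrix} \varepsilon & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & \lambda \end{pmatrix}
\left[\begin{pmatrix} \tilde{\theta}_1 \\ \tilde{\theta}_2 \\ \tilde{\theta}_3 \end{pmatrix} - X_t\right] dt
+ \begin{pmatrix} \sigma_{1,1} & 0 & 0 \\ 0 & \sigma_{2,2} & 0 \\ 0 & 0 & \sigma_{3,3} \end{pmatrix}
\begin{pmatrix} \sqrt{X_{1,t}} & 0 & 0 \\ 0 & \sqrt{X_{2,t}} & 0 \\ 0 & 0 & \sqrt{X_{3,t}} \end{pmatrix} dW_t^{\mathbb{Q}} .
$$

The Riccati equations (6) are then given by

$$
X_t = C_\Delta + D_\Delta' X_{t-1} + \eta_t , \tag{D.7}
$$

where

$$
\begin{pmatrix} \varepsilon_t \\ \eta_t \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} H_\Delta & 0 \\ 0 & Q_{t,\Delta} \end{pmatrix}\right] .
$$

The conditional expectation and variance of the state process are given by

$$
\mathbb{E}_{t-1}(X_t) = (I - \exp(-\kappa\Delta))\theta + \exp(-\kappa\Delta) X_{t-1} = C_\Delta + D_\Delta' X_{t-1} , \tag{D.8}
$$

$$
Var_{t-1}(X_t) = \int_{t-1}^{t} \exp(-\kappa(t-u))\,\sigma(\mathbb{E}_{t-1}(X_u))\,\sigma(\mathbb{E}_{t-1}(X_u))'\,\exp(-\kappa(t-u))'\, du = Q_{t,\Delta} , \tag{D.9}
$$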
where $\exp(\cdot)$ is the matrix exponential, and $\sigma(\cdot) = \Sigma\sqrt{S(\cdot)}$, with $S(\cdot)$ from Eq. (3). For example, $\sigma(X_u) = \mathrm{diag}(\sigma_{i,i}\sqrt{X_{i,u}})$ in the model in Appendix C.4. By Eq. (D.8), the transition parameters in Eqs. (19) and (D.7) are $C_\Delta = (I_d - \exp(-\kappa\Delta))\theta$ and $D_\Delta = \exp(-\kappa\Delta)$. Write $Y_t = (y_1, \dots, y_t)$ for the data through $t$ and define $X_{t|t-1} = \mathbb{E}(X_t \mid Y_{t-1})$, $\Sigma_{t|t-1} = Var(X_t \mid Y_{t-1})$, $X_{t|t} = \mathbb{E}(X_t \mid Y_t)$, and $\Sigma_{t|t} = Var(X_t \mid Y_t)$. Suppressing $\Delta$ for notational ease, the prediction step is then

with the one step ahead prediction error and its variance given by

$$
v_t = y_t - \tilde{A} - \tilde{B}' X_{t|t-1} , \tag{D.14}
$$

The states $X_t$ are Gaussian for the Vasicek and AFNS$_0$ models. For the square root processes (the CIR and AFNS$_3$ models), estimation based on Eq. (D.16) amounts to QML. Here, the Gaussian approximation implies that states are not restricted to be positive. In this case, following Chen and Scott (2003), we truncate states at 0. Given the large number of parameters in some of the models, we use the global optimizer differential evolution with several starting values. For every 250 observations, we reestimate parameters using differential evolution. In intermediate periods we use local optimizers.

D.1. Predictive regression correction

From Eq. (D.6), $Var(y_t \mid X_{t-1}) = \tilde{B}' Var_{t-1}(X_t)\tilde{B} + H$, suppressing dependence on the time increment $\Delta$. Using this and Eqs. (D.9), (D.11) and (D.15),

$$
\begin{aligned}
Var(y_t \mid Y_{t-1}) &= Var(v_t) \\
&= \tilde{B}'\left(D'\Sigma_{t-1|t-1}D + Var_{t-1}(X_t)\right)\tilde{B} + H \\
&= \tilde{B}'D'\Sigma_{t-1|t-1}D\tilde{B} + Var(y_t \mid X_{t-1}) .
\end{aligned} \tag{D.17}
$$

Thus, the conditional variance of yields is given by that corresponding to perfect observation of state variables through $t - 1$, i.e., the second term in Eq. (D.17), with an adjustment for conditional mean variation due to imperfect state observation given by the first term in Eq. (D.17) (see footnote 3). When forecasting using the predictive regression in Eq. (1), with $Z_t$ given by Eq. (7), then $Z_t$ corresponds to the second term in Eq. (D.17). Forecasting therefore involves three modifications. First, the filtered state $X_{t|t}$ is used in place of $X_t$ in Eq. (7). Second, using Eq. (1), $\rho_0^{\tau,h}$ and $\rho_1^{\tau,h}$ provide empirical predictive regression corrections for the bias stemming from conditional mean variation, i.e., the first term in Eq. (D.17). Finally, the estimated coefficients in Eq. (1) reflect the history of realized volatilities, which are part of the investor’s information set, thereby conditioning on a larger information set than that based on daily yields in the Kalman filter.

D.2. Conditional moments of state process

By the results in Fackler (2000), also used in Jacobs and Karoui (2009), the first two conditional moments can be written as

$$
\begin{pmatrix} D_h \\ \zeta_{1,h}' \end{pmatrix} = \exp(-\Pi h)\begin{bmatrix} I_d \\ 0 \end{bmatrix} . \tag{D.21}
$$

a $(d + d^2) \times (d + d^2)$ matrix, with the $i$th row of B given by $\beta_i'$ from Eq. (3), and rank B $= m \le d$.

In the special case $m = 0$, i.e., an $A_0(d)$ model, we have B $= 0$ and, hence,

$$
\Pi = \begin{bmatrix} \kappa & 0 \\ 0 & \kappa \otimes I_d + I_d \otimes \kappa \end{bmatrix} .
$$

From Eq. (D.21),

$$
\begin{pmatrix} D_h \\ \zeta_{1,h}' \end{pmatrix} = \begin{pmatrix} \exp(-\kappa h) \\ 0 \end{pmatrix} \tag{D.22}
$$

in this case. In particular,

$$
\zeta_{1,h} = 0 , \quad \text{for } m = 0 . \tag{D.23}
$$

D.3. Conditional moments of yields

By Eqs. (5) and (D.18), the conditional means of the yields are

$$
\mathbb{E}_t(y_{t+h}) = \tilde{A} + \tilde{B}'\left(C_h + D_h' X_t\right) . \tag{D.24}
$$

Using Eqs. (7) and (D.19), the conditional yield variances are

$$
\begin{aligned}
Var_t(y_{t+h}^{\tau}) &= \frac{1}{\tau^2} B(\tau)' Var_t(X_{t+h}) B(\tau) \\
&= \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' vec(Var_t(X_{t+h})) \\
&= b_0^{\tau,h} + b_1^{\tau,h\prime} X_t ,
\end{aligned} \tag{D.25}
$$

$$
b_0^{\tau,h} = \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' \zeta_{0,h} , \tag{D.26}
$$

$$
b_1^{\tau,h\prime} = \frac{1}{\tau^2} (B(\tau) \otimes B(\tau))' \zeta_{1,h}' , \tag{D.27}
$$

where $b_0^{\tau,h}$ is a scalar, and $b_1^{\tau,h}$ a $d \times 1$ vector.
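The step from the first to the second line of Eq. (D.25) rests on the identity $B'VB = (B \otimes B)' vec(V)$. A minimal Python sketch can check this numerically on a small example. The helper names and the numbers are illustrative assumptions, not values from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def kron_self(b: Vector) -> Vector:
    """B (x) B for a column vector B, stacked as a d^2 vector."""
    return [bi * bj for bi in b for bj in b]


def vec(V: Matrix) -> Vector:
    """Column-stacking vec operator."""
    d = len(V)
    return [V[row][col] for col in range(d) for row in range(d)]


def yield_var_quadratic(b: Vector, V: Matrix, tau: float) -> float:
    """(1 / tau^2) B' V B, the first line of Eq. (D.25)."""
    d = len(b)
    quad = sum(b[i] * V[i][j] * b[j] for i in range(d) for j in range(d))
    return quad / tau**2


def yield_var_kron(b: Vector, V: Matrix, tau: float) -> float:
    """(1 / tau^2) (B (x) B)' vec(V), the second line of Eq. (D.25)."""
    return sum(k * v for k, v in zip(kron_self(b), vec(V))) / tau**2


if __name__ == "__main__":
    b = [1.0, 2.0]                      # hypothetical loadings B(tau)
    V = [[0.04, 0.01], [0.01, 0.09]]    # hypothetical Var_t(X_{t+h})
    print(yield_var_quadratic(b, V, tau=5.0))
    print(yield_var_kron(b, V, tau=5.0))
```

Both functions return the same number up to floating-point rounding. The Kronecker form is the convenient one here because $vec(Var_t(X_{t+h}))$ is affine in $X_t$, which is what yields the coefficients in Eqs. (D.26) and (D.27).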
In the special case $m = 0$, i.e., an $A_0(d)$ (Gaussian) model, such as Vasicek or AFNS$_0$, we have from Eq. (D.23) that $\zeta_{1,h} = 0$. Thus, by Eq. (D.27), it follows that $b_1^{\tau,h\prime} = 0$ and, by Eq. (D.25),

$$
Var_t(y_{t+h}^{\tau}) = b_0^{\tau,h} , \quad \text{for } m = 0 . \tag{D.28}
$$

References

Andersen, T.G., Benzoni, L., 2010. Do bonds span volatility risk in the US Treasury market? A specification test for affine term structure models. J. Finance 65 (2), 603–653.
Andersen, T.G., Bollerslev, T., Christoffersen, P.F., Diebold, F.X., 2006. Volatility and correlation forecasting. In: Elliot, G., Granger, C., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North-Holland, Amsterdam, pp. 778–878.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625.
Andreasen, M.M., Christensen, B.J., 2015. The SR approach: a new estimation procedure for non-linear and non-Gaussian dynamic term structure models. J. Econom. 184 (2), 420–451.
Andreasen, M.M., Meldrum, A., 2019. A shadow rate or a quadratic policy rule? The best way to enforce the zero lower bound in the United States. J. Financ. Quant. Anal. 54 (5), 2261–2292.
Andrews, D.W., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 (3), 817–858.
Bai, J., Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica 70 (1), 191–221.
Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. J. Finance 52 (5), 2003–2049.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76 (6), 1481–1536.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc., Ser. B, Stat. Methodol. 64 (2), 253–280.
Bikbov, R., Chernov, M., 2009. Unspanned stochastic volatility in affine models: evidence from Eurodollar futures and options. Manag. Sci. 55 (8), 1292–1305.
Black, F., 1995. Interest rates as options. J. Finance 50 (5), 1371–1376.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. J. Polit. Econ. 81 (3), 637–654.
Bollerslev, T., Hood, B., Huss, J., Pedersen, L.H., 2018. Risk everywhere: modeling and managing volatility. Rev. Financ. Stud. 31 (7), 2729–2773.
Bollerslev, T., Patton, A.J., Quaedvlieg, R., 2016. Exploiting the errors: a simple approach for improved volatility forecasting. J. Econom. 192 (1), 1–18.
Brown, S.J., Dybvig, P.H., 1986. The empirical implications of the Cox, Ingersoll, Ross theory of the term structure of interest rates. J. Finance 41 (3), 617–630.
Buccheri, G., Corsi, F., 2021. Hark the shark: realized volatility modeling with measurement errors and nonlinear dependencies. J. Financ. Econom. 19 (4), 614–649.
Buraschi, A., Corielli, F., 2005. Risk management implications of time-inconsistency: model updating and recalibration of no-arbitrage models. J. Bank. Finance 29 (11), 2883–2907.
Campbell, J.Y., Shiller, R.J., 1991. Yield spreads and interest rate movements: a bird’s eye view. Rev. Econ. Stud. 58 (3), 495–514.
Campbell, J.Y., Thompson, S.B., 2007. Predicting excess stock returns out of sample: can anything beat the historical average? Rev. Financ. Stud. 21 (4), 1509–1531.
Chan, K.C., Karolyi, G.A., Longstaff, F.A., Sanders, A.B., 1992. An empirical comparison of alternative models of the short-term interest rate. J. Finance 47 (3), 1209–1227.
Chapman, D.A., Long Jr, J.B., Pearson, N.D., 1999. Using proxies for the short rate: when are three months like an instant? Rev. Financ. Stud. 12 (4), 763–806.
Chen, R.-R., Scott, L., 2003. Multi-factor Cox-Ingersoll-Ross models of the term structure: estimates and tests from a Kalman filter model. J. Real Estate Finance Econ. 27 (2), 143–172.
Christensen, J.H., Diebold, F.X., Rudebusch, G.D., 2011. The affine arbitrage-free class of Nelson–Siegel term structure models. J. Econom. 164 (1), 4–20.
Christensen, J.H., Lopez, J.A., Rudebusch, G.D., 2010. Can spanned term structure factors drive stochastic yield volatility? Working paper. Federal Reserve Bank of San Francisco.
Christensen, J.H., Rudebusch, G.D., 2015. Estimating shadow-rate term structure models with near-zero yields. J. Financ. Econom. 13 (2), 226–259.
Christensen, K., Oomen, R.C., Podolskij, M., 2014. Fact or friction: jumps at ultra high frequency. J. Financ. Econ. 114 (3), 576–599.
Cieslak, A., Povala, P., 2016. Information in the term structure of yield curve volatility. J. Finance 71 (3), 1393–1436.
Cochrane, J.H., Piazzesi, M., 2005. Bond risk premia. Am. Econ. Rev. 95 (1), 138–160.
Collin-Dufresne, P., Goldstein, R.S., 2002. Do bonds span the fixed income markets? Theory and evidence for unspanned stochastic volatility. J. Finance 57 (4), 1685–1730.
Collin-Dufresne, P., Goldstein, R.S., Jones, C.S., 2009. Can interest rate volatility be extracted from the cross section of bond yields? J. Financ. Econ. 94 (1), 47–66.
Corsi, F., 2009. A simple approximate long-memory model of realized volatility. J. Financ. Econom. 7 (2), 174–196.
Corsi, F., Renò, R., 2012. Discrete-time volatility forecasting with persistent leverage effect and the link with continuous-time volatility modeling. J. Bus. Econ. Stat. 30 (3), 368–380.
Cox, J.C., Ingersoll Jr, J.E., Ross, S.A., 1985. A theory of the term structure of interest rates. Econometrica 53 (2), 385–408.
Dai, Q., Singleton, K.J., 2000. Specification analysis of affine term structure models. J. Finance 55 (5), 1943–1978.
Dai, Q., Singleton, K.J., Yang, W., 2007. Regime shifts in a dynamic term structure model of US Treasury bond yields. Rev. Financ. Stud. 20 (5), 1669–1706.
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. J. Bus. Econ. Stat. 13, 253–263.
Duffee, G.R., 2002. Term premia and interest rate forecasts in affine models. J. Finance 57 (1), 405–443.
Duffie, D., Kan, R., 1996. A yield-factor model of interest rates. Math. Finance 6 (4), 379–406.
Dumas, B., Fleming, J., Whaley, R.E., 1998. Implied volatility functions: empirical tests. J. Finance 53 (6), 2059–2106.
Fackler, P., 2000. Moments of affine diffusions. Working paper. North Carolina State University.
Fama, E.F., Bliss, R.R., 1987. The information in long-maturity forward rates. Am. Econ. Rev. 77 (4), 680–692.
Faust, J., Rogers, J.H., Wang, S.-Y.B., Wright, J.H., 2007. The high-frequency response of exchange rates and interest rates to macroeconomic announcements. J. Monet. Econ. 54 (4), 1051–1068.
Feldhütter, P., Heyerdahl-Larsen, C., Illeditsch, P., 2016. Risk premia and volatilities in a nonlinear term structure model. Rev. Finance 22 (1), 337–380.
Filipović, D., Larsson, M., Trolle, A.B., 2017. Linear-rational term structure models. J. Finance 72 (2), 655–704.
Fisher, M., Nychka, D.W., Zervos, D., 1995. Fitting the term structure of interest rates with smoothing splines. Working paper. Federal Reserve Board.
Gargano, A., Pettenuzzo, D., Timmermann, A., 2017. Bond return predictability: economic value and links to the macroeconomy. Manag. Sci. 65 (2), 508–540.
Gürkaynak, R.S., Sack, B., Wright, J.H., 2007. The US Treasury yield curve: 1961 to the present. J. Monet. Econ. 54 (8), 2291–2304.
Hansen, P.R., Huang, Z., Shek, H.H., 2012. Realized GARCH: a joint model for returns and realized measures of volatility. J. Appl. Econom. 27 (6), 877–906.
Heath, D., Jarrow, R., Morton, A., 1992. Bond pricing and the term structure of interest rates: a new methodology for contingent claims valuation. Econometrica 60 (1), 77–105.
Jacobs, K., Karoui, L., 2009. Conditional volatility in affine term-structure models: evidence from Treasury and swap markets. J. Financ. Econ. 91 (3), 288–318.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stoch. Process. Appl. 119 (7), 2249–2276.
Joslin, S., 2017. Can unspanned stochastic volatility models explain the cross section of bond volatilities? Manag. Sci. 64 (4), 1707–1726.
Joslin, S., Konchitchki, Y., 2018. Interest rate volatility, the yield curve, and the macroeconomy. J. Financ. Econ. 128 (2), 344–362.
Litterman, R., Scheinkman, J., 1991. Common factors affecting bond returns. J. Fixed Income 1 (1), 54–61.
Ludvigson, S.C., Ng, S., 2009. Macro factors in bond risk premia. Rev. Financ. Stud. 22 (12), 5027–5067.
Mincer, J.A., Zarnowitz, V., 1969. The evaluation of economic forecasts. In: Mincer, J.A. (Ed.), Economic Forecasts and Expectations: Analysis of Forecasting Behavior and Performance. NBER, Cambridge, pp. 3–46.
Nelson, C.R., Siegel, A.F., 1987. Parsimonious modeling of yield curves. J. Bus. 60, 473–489.
Newey, W.K., West, K.D., 1994. Automatic lag selection in covariance matrix estimation. Rev. Econ. Stud. 61 (4), 631–653.
Patton, A.J., Sheppard, K., 2015. Good volatility, bad volatility: signed jumps and the persistence of volatility. Rev. Econ. Stat. 97 (3), 683–697.
Sarno, L., Schneider, P., Wagner, C., 2016. The economic value of predicting bond risk premia. J. Empir. Finance 37, 247–267.
Svensson, L.E., 1994. Estimating and interpreting forward interest rates: Sweden 1992-1994. NBER Working Paper w4871.
Vasicek, O., 1977. An equilibrium characterization of the term structure. J. Financ. Econ. 5 (2), 177–188.
Waggoner, D., 1997. Spline methods for extracting interest rate curves from coupon bond prices. Working paper. Federal Reserve Bank Atlanta.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. J. Am. Stat. Assoc. 100 (472), 1394–1411.