
Callable Swaps, Snowballs and Videogames


Claudio Albanese, Independent Consultant

Munich Personal RePEc Archive, MPRA Paper No. 5229, posted 09 Oct 2007 UTC. Online at https://mpra.ub.uni-muenchen.de/5229/

First version: September 9th, 2007. Revised October 1, 2007. I am deeply indebted to Manlio Trovato for the many discussions and work in collaboration on interest rate modeling.

Abstract. Although economically more meaningful than the alternatives, short rate models have been dismissed for financial engineering applications in favor of market models, as the latter are more flexible and better suited to cluster computing implementations. In this paper, we argue that the paradigm shift toward GPU architectures currently taking place in the high performance computing world can potentially change the situation and tilt the balance back in favor of a new generation of short rate models. We find that operator methods provide a natural mathematical framework for the implementation of realistic short rate models that match features of the historical process such as stochastic monetary policy, calibrate well to liquid derivatives and provide new insights on complex structures. In this paper, we show that callable swaps, callable range accruals, target redemption notes (TARNs) and various flavors of snowballs and snowblades can be priced with methods numerically as precise, fast and stable as the ones based on analytic closed form solutions, by means of BLAS level-3 methods on massively parallel GPU architectures.

1. Introduction

Fixed income derivatives have a history spanning three decades and represent the playground where many of the innovations in Mathematical Finance were first introduced. One-factor short rate models are described in (Vasicek 1977), (Ho and Lee 1986), (Hull and White 1993), (Cox et al. 1985) and (Black and Karasinski 1991). Multi-factor short rate models are in (Cheyette 1992), (Longstaff and Schwartz 2001) and (Chen 1996), and general affine models are in (Duffie and Kan 1996) and (Dai and Singleton 2003). All these models are characterized by at least partial analytic solvability for the discount curve and caplets. The constraint of analytic solvability derives from engineering considerations and applicability to pricing purposes. However, requiring solvability lessens the main advantage of such models, which is the potential proximity between the pricing measure, as revealed by derivative prices, and real world econometric estimates based on historical time series. Market models based on an over-complete set of stochastic processes constrained by drift restrictions are in (Heath et al. 1992) and in (Brace et al. 1997), (Jamshidian 1997) and (Sandmann and Sondermann 1997). Market models are more flexible and calibrate better to swaption prices than the traditional short rate models. The implied dynamics for swaption prices and the term structure itself, however, do not reflect econometric evidence. This is often compensated by using a market model to decompose an exotic into elementary contracts and then applying a model such as SABR in (Hagan et al. 2002) to risk manage at the book level. The Markov functional model in (Hunt et al. 2000) can be regarded as a light version of a BGM model which offers a number of computational advantages, as it is implementable on lattices, see (Bennett and Kennedy 2005).
The technique of recombining bushy trees discussed in (Tang and Lange 2001) also offers a systematic way of building a recombining lattice model out of a BGM specification. These lattice models however have limitations, as they face the challenge of having to substantially prune the number of nodes to control dimensional explosion. The industry standard for path dependent options such as target redemption notes (TARNs), callable swaps such as CMS spread range accruals, callable snowballs and TARN snowblades is currently to use market models and Monte Carlo methods, combined with the variance reduction techniques in (Longstaff and Schwartz 2001). Let us recall that a CMS is a swap paying LIBOR or a fixed rate against a floating rate given by a fixed-tenor swap rate prevailing at the time the coupon is paid. A TARN is a structure which terminates whenever either a maturity is reached or the total amount paid on the structured coupon leg exceeds a certain threshold. A snowball is a structured swap with a funding leg and a coupon stream whereby the coupon paid on a given date is given by the sum of a fraction of the coupon paid in the previous period plus an amount determined by the realization of the rate process in the coupon period itself. A snowblade is similar to a snowball except that it is not callable and termination is triggered by a TARN condition.

From a computational viewpoint, short rate lattice models typically execute on single-threaded CPUs. Monte Carlo schemes for market models are naturally parallelizable and are well suited to cluster computing implementations. The predominance of cluster computing in the last decade was one of the leading factors sustaining the predominance of market models. In this paper, we introduce a new and flexible modeling framework for short rate models which does not rely on analytic solvability in any sense at all. As opposed to evaluating hypergeometric functions, as our main numerical engine we make use of matrix-matrix multiplication routines implemented on the massively parallel multi-core architectures typical of emerging graphics processing units (GPUs). GPUs are particularly interesting from an engineering viewpoint as they are mass-produced and low cost thanks to their applications to video games and computer aided design. We show that consideration of this more general class of models allows one to overcome the limitations of traditional analytically solvable models: the models can be made to resemble the econometric evidence to a greater degree, can be calibrated well and can be used as a basis for efficient pricing of exotic structures such as callable swaps, TARNs, snowballs and snowblades.

GPU chipsets are based on arrays of single-instruction-multiple-data (SIMD) processors running at lower frequency than current CPUs but capable of processing hundreds of threads per clock cycle. Unlike CPU clusters, GPU multi-core chipsets have large blocks of fast shared memory and fast interconnects. Also unlike CPUs, GPUs are optimized for single precision arithmetic, while double precision is implemented in software and typically comes with a performance impact of about a factor 10. These architectures are particularly well suited to executing tasks that can leverage a high degree of parallelism, fast shared memory and broad communication bandwidth, such as BLAS level-3 methods and in particular large matrix-matrix multiplications.
The level-3 methods we use involve time discretization steps as small as one hour and for this reason exhibit internal smoothing mechanisms which are not present in sparse BLAS level-2 methods or Monte Carlo methods. Thanks to the internal smoothing properties, calculations can be safely carried out in single precision in their entirety. One can even safely differentiate probability kernels numerically to find sensitivities and price path-dependent options within single precision floating point arithmetic. In this paper, we show how to structure pricing algorithms based on level-3 BLAS which execute particularly efficiently on GPUs, are robust in the native single precision and allow one to price a variety of fixed income exotics including European and Bermuda swaptions, callable constant maturity swaps (CMSs), callable CMS spread range accruals, CMS and CMS spread target redemption notes (TARNs), and callable and TARN snowballs. The model can be freely specified as long as it is a Markov model for the short rate and possibly other factors under the risk neutral measure. Specification strategies can vary; one could proceed non-parametrically as well as semi-parametrically. The numerical scheme we use is based on a lattice which on current hardware can realistically admit up to 1500 sites (we use 672 in our example) and an elementary time discretization step not longer than a few hours at most. Partial explanations for the smoothing phenomenon in terms of rigorous results are in (Albanese and Mijatovic 2006) and (Albanese 2006). We find that the smallness of the time discretization step is crucial for the existence of internal smoothing and the usability of a single precision engine. Padé approximants and implicit methods can be implemented on such architectures using mixed precision iterative methods, as discussed in (Buttari et al. 2007). From the mathematical viewpoint, instead of analytic closed form solutions we make use of operator methods, functional calculus and algebraic manipulations, see (Albanese 2006).

The paper is structured as follows. In Section 2, we describe our working model, a 2-factor stochastic monetary policy model. This model could easily be extended by introducing a stochastic volatility term, but for simplicity's sake we do not pursue this possibility. In Section 3, we review the basic concepts behind operator methods, such as path-ordered exponentials, the fast-exponentiation algorithm, the Feynman path-integral expansion, Dyson's formula, pointwise kernel convergence and diffusion smoothing. Section 4 is about the theory of Abelian processes and how this applies to path-dependent structured products. Section 5 covers moment methods. Section 6 contains a discussion of scaling laws and numerical complexity on massively parallel GPU architectures. We present strategies to price portfolios of exotics collectively and valuation strategies for callable and TARN snowball structures. Lattice models for TARNs are discussed in (Albanese and Trovato 2006); that method is based on the theory of Abelian processes and was first implemented on a CPU architecture. To our knowledge, this is the first article in the public literature where snowballs are priced by means of a lattice model implemented on a massively parallel GPU architecture.
2. A working example: a stochastic monetary policy short rate model

The numerical methods described in this paper apply in great generality to all multi-factor interest rate models based on a Markov process where one of the factors is the short rate and the representation corresponds to the risk neutral measure, whereby one uses the money market account as the numeraire asset. No analytic solvability assumption is made, not even for the term structure of interest rates. The only limitation is given by the number of sites in the discretization lattice for the short rate itself and the additional factors. To illustrate the methodology, we discuss in detail a relatively simple example with 84 discretization points for the short rate and 8 regimes, for a total of 672 lattice sites. However, lattices with up to 1500 sites are still quite realistic from the engineering viewpoint given current hardware. Although short rate models are within range of our approach, the lattice size limitation makes it impractical to implement market models describing an overcomplete set of risk factors subject to drift restrictions. The problem is not so much with the drift restriction, rather with the sheer number of forward rates in a market model which would have to be discretized and fit on a lattice. In this case, the technique of non-exploding bushy trees in (Tang and Lange 2001) provides an effective strategy for lattice discretization, which is however quite distinct from our approach.

To take advantage of GPU platforms, we use operator methods from the start and express our model by means of a Markov generator given by a time dependent matrix of the form

(1)   \mathcal{L}(x, a; x', a'; t) = D(x, x' | a, t)\,\delta_{aa'} + T_{mp}(a, a' | x, t)\,\delta_{xx'}.

A state variable is given by a couple y = (x, a), whereby x is related to the short rate by a function of the form

(2)   \bar r(x, t) \equiv r(x)\,\lambda(t).

The function r(x) is specified initially as a regular grid extending from 0 to 50%. The variable a is a monetary policy regime indicator. In our example, x = 0, ..., 83. The function \lambda(t) is piecewise constant in time and is determined iteratively, starting from current time t = 0, in such a way as to fit the term structure of interest rates, as explained below. The function \lambda(t) that applies to our example is plotted in Figure 2. Notice that this function does not deviate from 1 by more than 5%. This is by construction: we aim at explaining the general shape of the yield curve by specifying the stochastic monetary policy dynamics, as opposed to doing so by tweaking the function \lambda(t).

The mean reversion term D(x, x' | a, t) is a tri-diagonal matrix such that D(x, x' | a, t) = 0 if |x - x'| \ge 2 and its first and second moments are given by

(3)   \kappa_1\big(\theta_1 - \bar r(x, t)\big) + \mu_a(x) \equiv \sum_{x'} D(x, x' | a, t)\,\big(r(x') - r(x)\big),

(4)   \sigma_a^2\, r(x)^{2\beta(x)} \equiv \sum_{x'} D(x, x' | a, t)\,\big(r(x') - r(x)\big)^2.

Here, \kappa_1 = 6.25% and \theta_1 = 13%. The variable a takes eight values and distinguishes between 8 different monetary policies: a = 0 corresponds to a deflation scenario and the other states for a > 0 correspond to a normal monetary policy regime with rates falling, stable or rising at various rates.
More precisely, the diffusion term D(x, x' | a, t) is a tri-diagonal matrix such that D(x, x' | a, t) = 0 if |x - x'| \ge 2 and its first and second moments are given by

(5)   \mu_a(x) \equiv \sum_{x'} D(x, x' | a, t)\,\big(r(x') - r(x)\big),

(6)   \sigma(t)^2\, r(x)^{2\beta(x)} \equiv \sum_{x'} D(x, x' | a, t)\,\big(r(x') - r(x)\big)^2.

Here, \sigma(t) = 11.5% + 6.5%\,(1 - e^{-t}) and the functions \mu_a(x) are given by the following table:

  a          |  0                          |  1     |  2     |  3     |  4    |  5    |  6    |  7
  \mu_a(x)   |  \kappa_0(\theta_0 - r(x))  |  -75bp |  -50bp |  -25bp |  0bp  |  25bp |  50bp |  100bp

Table 1. Regime dependent drifts \mu_a(x).

Here, \kappa_0 = 38% and \theta_0 = 0.25%. The monetary policy dynamics is chosen in such a way as to be in qualitative agreement with the historical process in Figure 1. A more refined, quantitative econometric analysis would be useful and possible here, but venturing in this direction is not the objective of this article. See (Dai and Singleton 2003) and (Ait-Sahalia et al. 2005) and references therein for a review of econometric studies. Finally, the function \beta(x) is calibrated to the implied volatility backbone, as explained below.

The time dependent matrix T_{mp}(a, a' | x, t) gives the transition probability rates between monetary policy regimes. This matrix is parametrized in terms of a transition probability intensity

(7)   p(t) = p_0 + (p_1 - p_0)\,e^{-\tau_p t},

with p_0 = 50%, p_1 = 40%, \tau_p = 3Y; a mean reversion rate

(8)   k(t) = k_0 + (k_1 - k_0)\,e^{-\tau_k t},

with k_0 = 12%, k_1 = 4%, \tau_k = 15Y; and a monetary policy drift term

(9)   m(t) = m_0 + (m_1 - m_0)\,e^{-\tau_m t},

with m_0 = -15%, m_1 = 1% and \tau_m = 0.5Y. If a > 0, we set

(10)   T_{mp}(a, a+1 | x, t) = \frac{1}{2}\, p(t) + (4 - a)\,k(t) + m(t).

If a - 1 > 0, then

(11)   T_{mp}(a, a-1 | x, t) = \frac{1}{3}\, p(t) - \frac{2}{3}\,(4 - a)\,k(t) - \frac{2}{3}\, m(t),

and if a - 2 > 0,

(12)   T_{mp}(a, a-2 | x, t) = \frac{1}{6}\, p(t) - \frac{1}{3}\,(4 - a)\,k(t) - \frac{1}{3}\, m(t).

The deflation regime is a bit special and for it we set

(13)   T_{mp}(0, 3 | x, t) = T_{mp}(0, 4 | x, t) = 5%,

and, if a = 1, 2, 3, we set

(14)   T_{mp}(a, 0 | x, t) = 10%\,(4 - a)\left(1 - \frac{r(x)}{r_0}\right)_+.

Here r_0 = 5%. All other matrix elements of T_{mp}(a, a' | x, t) are zero. We stress that this is just a working example, chosen here to illustrate the methods and a calibration exercise. The mathematics and numerical analysis in the following sections apply regardless of the specific model specification. The yield curves produced by this model are in Figures 3, 4, 5, 6 and 7. The projected monetary policy scenarios are given in Figure 10 and the state dependent correlations are plotted in Figures 8 and 9.

Figure 1. History of short rates (funds rates) for the US dollar, the Euro and the Japanese Yen.

Figure 2. Time dependent adjustment function \lambda(t) used to fit the term structure of rates. Notice that this function is very close to 1. This is achieved by calibrating the drift of the monetary policy process.
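Before moving on to the numerical methods, the following minimal NumPy sketch indicates how a generator of the form (1) can be assembled on the product lattice y = (x, a). It is an illustrative CPU prototype under simplifying assumptions, not the calibrated production specification: the elasticity beta, the volatility level and the regime switching intensities below are placeholder constants, time dependence is omitted, and the tri-diagonal rates are simply chosen to match the first two moments as in equations (5)-(6).

```python
import numpy as np

NX, NA = 84, 8                      # short-rate grid points and monetary policy regimes
r = np.linspace(0.0, 0.50, NX)      # regular short-rate grid from 0 to 50%
dr = r[1] - r[0]
beta = 0.8                          # placeholder constant elasticity (beta(x) is calibrated in the text)
sigma = 0.115                       # placeholder local volatility level
mu_a = np.array([0.0, -0.0075, -0.0050, -0.0025, 0.0, 0.0025, 0.0050, 0.0100])  # Table 1 drifts, per year
kappa0, theta0 = 0.38, 0.0025       # deflation regime mean reversion

def drift(a, x):
    # regime dependent drift of the short rate; a = 0 mean reverts, other regimes use Table 1
    return kappa0 * (theta0 - r[x]) if a == 0 else mu_a[a]

def idx(x, a):
    return a * NX + x

L = np.zeros((NX * NA, NX * NA))    # generator on the product lattice y = (x, a)

# diffusion block D(x, x'|a): tri-diagonal rates matching the first two moments, eqs. (5)-(6)
for a in range(NA):
    for x in range(1, NX - 1):
        var = (sigma * r[x] ** beta) ** 2
        up = 0.5 * (var / dr**2 + drift(a, x) / dr)
        dn = 0.5 * (var / dr**2 - drift(a, x) / dr)
        up, dn = max(up, 0.0), max(dn, 0.0)        # keep transition rates non-negative
        L[idx(x, a), idx(x + 1, a)] += up
        L[idx(x, a), idx(x - 1, a)] += dn
        L[idx(x, a), idx(x, a)] -= up + dn

# regime switching block T_mp(a, a'|x): a hypothetical constant intensity matrix for illustration
T = 0.05 * (np.ones((NA, NA)) - NA * np.eye(NA)) / (NA - 1)
for x in range(NX):
    for a in range(NA):
        for a2 in range(NA):
            L[idx(x, a), idx(x, a2)] += T[a, a2]

assert np.allclose(L.sum(axis=1), 0.0)  # rows of a Markov generator sum to zero
```

On the GPU, a matrix of this size (672 x 672) is simply staged in device memory and handed to the sgemm-based fast exponentiation described in the next section.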
3. Numerical methods

To price interest rate derivatives under a short rate model we need to accomplish a few basic tasks. The first is to evaluate numerically the discounted propagator satisfying the equation

(15)   \frac{d}{dt_1} U(t_1, t_2) + \big(\mathcal{L}(t_1) - \lambda(t_1)\, r\big)\, U(t_1, t_2) = 0,

where r is the operator of multiplication by the short rate grid function r(x), with final time condition U(t_2, t_2) = I, the identity operator, for all time ordered pairs t_1 \le t_2. This is the propagator which is used to discount cash flows under the risk neutral measure. Having done this, we need to evaluate more general propagators carrying path information for path-dependent options.

The single-name propagator satisfying equation (15) is given by the path-ordered exponential

(16)   U(t_1, t_2) = \mathcal{P} \exp\left( \int_{t_1}^{t_2} \big( \mathcal{L}(s) - \lambda(s)\, r \big)\, ds \right).

There are two useful methods to express a path-ordered exponential. One is the Feynman path-integral expansion

(17)   \mathcal{P} \exp\left( \int_{t_1}^{t_2} \mathcal{L}(s)\, ds \right) = \lim_{N \to \infty} \big( I + \delta t_N \mathcal{L}(t_1) \big)\big( I + \delta t_N \mathcal{L}(t_1 + \delta t_N) \big) \cdots \big( I + \delta t_N \mathcal{L}(t_N) \big),

where \delta t_N = \frac{t_2 - t_1}{N}. The second is Dyson's formula

(18)   \mathcal{P} \exp\left( \int_{t_1}^{t_2} \mathcal{L}(s)\, ds \right) = \sum_{n=0}^{\infty} \frac{1}{n!}\, \mathcal{P} \left( \int_{t_1}^{t_2} \mathcal{L}(s)\, ds \right)^n,

where

(19)   \mathcal{P} \left( \int_{t_1}^{t_2} \mathcal{L}(s)\, ds \right)^n = n! \int_{t_1}^{t_2} ds_1 \int_{s_1}^{t_2} ds_2 \cdots \int_{s_{n-1}}^{t_2} ds_n\; \mathcal{L}(s_1)\, \mathcal{L}(s_2) \cdots \mathcal{L}(s_n).

Dyson's formula is useful from a theoretical viewpoint. In the next section we provide examples of its use by deriving a numerical moment method to price a number of Abelian path-dependent options, ranging from risky annuities to volatility swaps and cliquets. The Feynman path integral representation is interesting in this context because this formula can be implemented numerically very efficiently on GPU architectures by the method of fast exponentiation.

The method works as follows. Assume that the dynamic generators \mathcal{L}(t) are piecewise constant as a function of time, and suppose \mathcal{L}(t) = \mathcal{L}_i in the time interval [t_i, t_i + (\Delta t)_i]. Assume \delta t is chosen so small that the following two conditions hold:

(FE1)   \min_{y \in \Lambda} \big( 1 + \delta t\, \mathcal{L}_i(y, y) \big) \ge 1/2,

(FE2)   \log_2 \frac{(\Delta t)_i}{\delta t} = n \in \mathbb{N}.

The first condition leads to intervals \delta t of the order of one hour of calendar time, and this is indeed the choice we make. To compute e^{(\Delta t)_i \mathcal{L}_i}(x, y), we first define the elementary propagator

(20)   u_{\delta t}(x, y) = \delta_{xy} + \delta t\, \mathcal{L}_i(x, y)

and then evaluate in sequence u_{2\delta t} = u_{\delta t} \cdot u_{\delta t}, u_{4\delta t} = u_{2\delta t} \cdot u_{2\delta t}, ..., u_{2^n \delta t} = u_{2^{n-1} \delta t} \cdot u_{2^{n-1} \delta t}. Matrix multiplication is accomplished numerically by invoking either the single precision routine sgemm or the double precision routine dgemm. Since GPU floating point units are natively single precision and double precision is slower by a factor 10, our advice is to use sgemm. Interestingly enough, we find that single precision is robust for all practical applications we considered, including the working example in this paper. In fact, the fast-exponentiation algorithm as described above, with a \delta t satisfying the above bound, has self-smoothing properties which keep the impact of floating point errors far below what one would naively expect from a back-of-the-envelope worst case estimate. The mathematical reasons behind this phenomenon appear to be deep and not well understood; see (Albanese and Mijatovic 2006) and (Albanese 2006) for some results in this direction.
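A minimal sketch of the fast-exponentiation scheme (FE1), (FE2) and (20) follows, written in NumPy single precision to mimic the native GPU arithmetic; in the production setting each squaring would be one cublas sgemm call on the device. The target step and the toy generator in the usage example are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fast_exponentiate(L, Delta_t, dt_target=1.0 / (24.0 * 365.0)):
    """Approximate expm(Delta_t * L) by repeated squaring of the elementary
    propagator u = I + dt * L, with dt chosen so that (FE1) and (FE2) hold."""
    L = np.asarray(L, dtype=np.float32)               # single precision, as on the GPU
    # (FE2): pick n so that dt = Delta_t / 2**n is no larger than the target elementary step
    n = max(0, int(np.ceil(np.log2(Delta_t / dt_target))))
    dt = Delta_t / 2.0 ** n
    # (FE1): 1 + dt * L(y, y) >= 1/2 for all lattice sites y
    assert (1.0 + dt * np.diag(L)).min() >= 0.5, "choose a smaller dt_target"
    u = np.eye(L.shape[0], dtype=np.float32) + dt * L  # elementary propagator, eq. (20)
    for _ in range(n):
        u = u @ u            # on the GPU this is a single sgemm call per squaring
    return u

# usage sketch: one-week kernel for a toy 3-state generator (rows sum to zero)
L_toy = np.array([[-2.0, 2.0, 0.0],
                  [1.0, -3.0, 2.0],
                  [0.0, 1.0, -1.0]])
kernel = fast_exponentiate(L_toy, Delta_t=7.0 / 365.0)
print(kernel.sum(axis=1))    # rows of a probability kernel sum to (approximately) one
```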
Fast exponentiation per se suffices to evaluate all discount functions, to price European and Bermudan swaptions and to find the CMS convexity corrections. We find that calibrating model parameters against at-the-money options is quite feasible by using a Levenberg-Marquardt algorithm with box bounds, as the sensitivities are sharp and stable even if single precision arithmetic is used. The calibration results for at-the-money European swaptions are shown in Figures 11, 12 and 13. The skews we obtain are in Figures 14, 15 and 16. Notice that the persistent skews are controlled by the occupancy probability of the deflation state, as the other time dependencies in the model are concentrated on short maturities. The function \beta(r) is chosen in a time homogeneous way so as to fit the implied volatility backbone in Figures 17, 18 and 19. Notice that, being time homogeneous, the function \beta(r) does not help to create a persistent implied volatility skew. The Bermuda backbone in Figures 20 and 21 is also an important calibration target. Finally, the CMS convexity corrections, i.e. the difference between an equilibrium CMS swap spread and the plain vanilla swap of matching maturity, are given in Figures 22 and 23.

4. Abelian Processes

In (Albanese 2006) and (Albanese and Vidler 2007), we introduce the notion of Abelian processes as an extension of the notion of stochastic integral over a diffusion process, see also (Albanese and Trovato 2006) and (Albanese et al. 2006). Abelian processes are characterized by an algebraic commutativity condition which allows one to extend the key results of stochastic calculus such as Girsanov's theorem, Ito's lemma, the Cameron-Martin theorem and the Feynman-Kac formula.

To explain the notion of Abelian process, consider a lattice process y_t = (x_t, a_t) as the one introduced in the previous section, and consider a discretely valued Markov chain process m_t dependent on the path followed by y_t, such that the joint process (y_t, m_t) is Markov and its generator is given by the lifted operator

(21)   \tilde{\mathcal{L}}(y, m; y', m'; t) = \mathcal{L}(y, y'; t)\,\delta_{mm'} + A(m; m' | y)\,\delta_{yy'}.

Consider the algebra spanned by sums and products of the operators A(y) of matrix elements A(y)_{mm'} = A(m; m' | y). If this algebra is commutative, i.e. if the commutators [A(y), A(y')] \equiv A(y) A(y') - A(y') A(y) vanish for all pairs y, y', then the process m_t is called Abelian.

It is simple to prove that if two matrices A and B commute and A has distinct eigenvalues, then they can be diagonalized simultaneously. Let u be an eigenvector of A of eigenvalue \lambda. Then we have that

(22)   B A u = \lambda B u = A B u.

Hence Bu is also an eigenvector of A of eigenvalue \lambda. Since the eigenvalues of A are assumed to be mutually distinct, Bu needs to be proportional to u, i.e. Bu = \mu u for some \mu. This implies that u is also an eigenvector of B, of eigenvalue \mu.

The set of matrices with at least two coinciding eigenvalues has zero measure in the space of all matrices: it is a codimension-1 surface, and any neighborhood around each point on this surface contains a matrix whose eigenvalues are all distinct. Hence, if a matrix has degenerate eigenvalues, there are arbitrarily small perturbations which lift the degeneracy, and one would conclude that degeneracies are a fairly unusual occurrence. However, from the numerical viewpoint one has to pay attention to this phenomenon, as near degeneracies can translate into ill-conditioning. In this case, floating point errors can be amplified and give rise to numerical instabilities.

If the Abelian property is satisfied and unless near-degeneracies and ill-conditioning manifest themselves, then there exists a numerically usable non-singular linear transformation S(m; i) which diagonalizes all the operators A(m; m' | y) simultaneously. Hence, the transformed lifted generator S^{-1} \tilde{\mathcal{L}} S is a complex, block-diagonal matrix which can be fast-exponentiated by means of the BLAS routine cgemm, which implements a complex matrix-matrix multiplication in single precision. This exponential gives the joint distribution of the pairs (y_t, m_t) for all time horizons t. Range accruals, possibly capped and floored, provide an example for the application of this theory: the time a given rate spends within a certain range is in fact an Abelian process. The maximum achieved by a given rate over a certain period of time is another example of an Abelian process, and lookback options can thus be priced by block-diagonalization.
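The simultaneous diagonalization argument can be illustrated with a small, purely hypothetical NumPy experiment: circulant matrices stand in for a commuting family A(y), and the eigenvector basis of one generic member diagonalizes them all. In the paper's setting the resulting transformation S would be applied to the lifted generator (21), after which the block-diagonal operator is fast-exponentiated with cgemm.

```python
import numpy as np

rng = np.random.default_rng(0)

def circulant(c):
    # circulant matrices all commute with each other: a hypothetical stand-in for the family A(y)
    n = len(c)
    return np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

family = [circulant(rng.standard_normal(5)) for _ in range(4)]

# check that the commutators [A(y), A(y')] vanish
for A in family:
    for B in family:
        assert np.allclose(A @ B - B @ A, 0.0, atol=1e-10)

# eigenvectors of a generic linear combination give the common diagonalizing transformation S
S = np.linalg.eig(sum(rng.standard_normal() * A for A in family))[1]
S_inv = np.linalg.inv(S)

for A in family:
    D = S_inv @ A @ S
    off = D - np.diag(np.diag(D))
    assert np.allclose(off, 0.0, atol=1e-8)   # every member is diagonal in the common basis
print("family diagonalized simultaneously")
```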
A discrete version of the notion of Abelian process is particularly useful to price interest rate derivatives such as target redemption notes (TARNs). Consider a coupon period \Delta T and a state dependent payoff which is paid at times T_i = i \Delta T and is of the following general form:

(23)   C_s(y_{i-1}, y_i, T_i) - C_f(y_{i-1}, y_i, T_i),

where C_s is the coupon paid on the so-called structured leg of the TARN and C_f is the coupon paid instead on the so-called funding leg. A TARN is characterized by a maturity T = T_n for some integer n > 0 and a target H. One says that the target is reached on the k-th coupon date T_k if

(24)   \sum_{i=1}^{k} C_s(y_{i-1}, y_i, T_i) > H.

A TARN is structured so that cash flows continue either until maturity or until the first coupon date at which the target is reached, whichever comes first.

To price a TARN, one can consider the following lifted propagator:

(25)   U_i(y_{i-1}, m_{i-1}; y_i, m_i) = \left[ \mathcal{P} \exp\left( \int_{T_{i-1}}^{T_i} \mathcal{L}(t)\, dt \right) \right](y_{i-1}; y_i)\; K(y_{i-1}, m_{i-1}; y_i, m_i),

where

(26)   K(y_{i-1}, m_{i-1}; y_i, m_i) \equiv p(y_{i-1}, y_i, T_i)\, \delta\!\left( m_i - m_{i-1} - \left[ \frac{C_s(y_{i-1}, y_i, T_i)}{\Delta C_s} \right]_- \right) + \big( 1 - p(y_{i-1}, y_i, T_i) \big)\, \delta\!\left( m_i - m_{i-1} - \left[ \frac{C_s(y_{i-1}, y_i, T_i)}{\Delta C_s} \right]_+ \right).

Here, \Delta C_s is a discretization interval for the structured coupon dimension. If \xi \in \mathbb{R}, we denote with [\xi]_+ the smallest integer larger than \xi and with [\xi]_- the largest integer smaller than \xi. Furthermore, p(y_{i-1}, y_i, T_i) is chosen so that

(27)   p(y_{i-1}, y_i, T_i) \left[ \frac{C_s(y_{i-1}, y_i, T_i)}{\Delta C_s} \right]_- + \big( 1 - p(y_{i-1}, y_i, T_i) \big) \left[ \frac{C_s(y_{i-1}, y_i, T_i)}{\Delta C_s} \right]_+ = \frac{C_s(y_{i-1}, y_i, T_i)}{\Delta C_s}.

Let K(y_{i-1}, y_i) be the operators of matrix elements

(28)   K(y_{i-1}, y_i)_{m_{i-1} m_i} = K(y_{i-1}, m_{i-1}; y_i, m_i).

The operators K(y_{i-1}, y_i) commute with each other for all choices of pairs (y_{i-1}, y_i). Hence, in most cases and assuming no ill-conditioning situations arise, they can be simultaneously diagonalized. To complete the definition of the operator K(y_{i-1}, m_{i-1}; y_i, m_i) in a usable way, we need to limit the range of variability of the variable m to a finite interval m = 0, 1, ..., M for some M > 0 and to specify boundary conditions for the process as it reaches the site m = M. A useful strategy is to impose periodic boundary conditions, so that upon surpassing M the process winds back and re-enters from the point m = 0, as if the interval [0, M] were wrapped around a circle. With this choice the situation is simple, as the block-diagonalization can be accomplished in all cases by means of an explicit partial Fourier transform whose implementation does not give rise to ill-conditioning. Let \mathcal{F} be the operator such that

(29)   \hat u(y, p) = (\mathcal{F} u)(y, p) = \sum_{m=0}^{M} e^{-i p m}\, u(y, m),

where

(30)   p = \frac{2 \pi j}{M + 1}, \qquad j = 0, 1, ..., M.

Then we have that the transformed operator \hat U_i = \mathcal{F} U_i \mathcal{F}^{-1} is block-diagonal and has matrix elements

(31)   \hat U_i(y_{i-1}, p_{i-1}; y_i, p_i) = \delta_{p_{i-1} p_i} \left[ \mathcal{P} \exp\left( \int_{T_{i-1}}^{T_i} \mathcal{L}(t)\, dt \right) \right](y_{i-1}; y_i) \sum_{m=0}^{M} K(y_{i-1}, 0; y_i, m)\, e^{i p_i m}.

We conclude that the joint distribution of the discrete time process m_{T_i} and the underlying process y_{T_i} at time T_i, conditioned to starting at y_0 and m_0 = 0 at time 0, is given by

(32)   U_i(y_0, m_0; y_{T_i}, m_i) = \frac{1}{M + 1} \sum_{j=0}^{M} e^{\frac{2 \pi i\, j\, (m_i - m_0)}{M + 1}} \left( \prod_{k=1}^{i} \hat U_k(p_j) \right)(y_0, y_{T_i}),

where \hat U_k(p) is the operator of matrix elements \hat U_k(p)(y, y') = \hat U_k(y, p; y', p).
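As a small illustration of the coupon discretization entering (26) and (27), the sketch below (hypothetical helper name, NumPy) splits a structured coupon between the two neighboring multiples of \Delta C_s with the probability p that matches its expectation, which is exactly the condition equation (27) imposes.

```python
import numpy as np

def split_coupon(coupon, dC):
    """Split a coupon between floor and ceiling multiples of the grid dC so that
    the expected accrued amount is matched exactly, as in eqs. (26)-(27)."""
    xi = coupon / dC
    lo, hi = int(np.floor(xi)), int(np.ceil(xi))
    if lo == hi:                       # coupon already on the grid
        return [(lo, 1.0)]
    p = (hi - xi) / (hi - lo)          # solves p*lo + (1 - p)*hi = xi
    return [(lo, p), (hi, 1.0 - p)]

# usage: a 37 bp coupon on a 10 bp accrual grid
for m_step, prob in split_coupon(0.0037, 0.0010):
    print(m_step, prob)                # 3 with probability 0.3 and 4 with probability 0.7
```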
Notice here one of the differences between probability theory expressed via stochastic calculus and our formalism based on operator methods: in the former one focuses on measure changes, which are given by one-parameter families of positive linear transformations, while in our case we can make full use of general complex valued linear transformations, as the applicability of operator methods does not hinge on the existence of a probabilistic interpretation for the dynamic generators.

5. Moment Methods

In addition to applying block-diagonalization techniques, for most Abelian payoffs one can also use a moment method based on Dyson's expansion. The method applies whenever one needs to evaluate moments of stochastic integrals over a bridge, such as

(33)   J(y, y') \equiv E_0\!\left[ \left( \int_0^T \phi(y_t, t)\, dt \right)^m \delta(y_T - y') \,\Big|\, y_0 = y \right].

Here m is an integer and [0, T] is a given interval. In this case we have that

(34)   J(y, y') = \frac{d^m}{d\epsilon^m}\Big|_{\epsilon = 0} \left[ \mathcal{P} \exp\left( \int_0^T \mathcal{L}_\epsilon(t)\, dt \right) \right](y, y'),

where

(35)   \mathcal{L}_\epsilon(t)(y, y') = \mathcal{L}(y, y'; t) + \epsilon\, \phi(y; t)\, \delta_{yy'}.

For instance, to evaluate a range accrual where each coupon payment is proportional to the time a given swap spread s^1_t - s^2_t is in the interval [L, U], one can use the formula above with

(36)   \phi(y; t) = 1\big( s^1(y; t) - s^2(y; t) \in [L, U] \big).

If the range accrual is capped and/or floored, the problem is complicated by the fact that in general knowledge of the first moment is not sufficient, and all moments are required to reconstruct the probability distribution function for the time elapsed in the given range. However, also in this case one can take advantage of the simplicity of the moment method by using an ansatz to extrapolate the form of the distribution from the knowledge of its first two or four moments. As an ansatz distribution one can use, for instance, either the log-normal or the chi-squared distribution. In general, it is possible to verify the quality of the ansatz by using the block-diagonalization method in the previous section, as range accruals are based on a continuous time Abelian process.

Similarly, discrete payoffs such as TARNs can also be priced approximately by means of moment methods. Consider a moment of a stochastic sum over a bridge of the form

(37)   F(y, y') \equiv E_0\!\left[ \left( \sum_{j=1}^{i} \phi_j(y_{T_{j-1}}, y_{T_j}) \right)^m \delta(y_{T_i} - y') \,\Big|\, y_0 = y \right],

where m is an integer and T_1, ..., T_i is a sequence of time points. In this case we have that

(38)   F(y, y') = \frac{d^m}{d\epsilon^m}\Big|_{\epsilon = 0} \left[ \prod_{j=1}^{i} U_\epsilon(T_{j-1}, T_j) \right](y, y'),

where the matrix elements of the operators U_\epsilon(T_{j-1}, T_j) are given by

(39)   U_\epsilon(y, T_{j-1}; y', T_j) = \left[ \mathcal{P} \exp\left( \int_{T_{j-1}}^{T_j} \mathcal{L}(t)\, dt \right) \right](y, y')\; e^{\epsilon\, \phi_j(y, y')}.

When pricing a TARN instead, one needs to proceed by iteration going forward in time. In this case, at every step one needs to apply the kernel to a probability distribution function \psi(y, T_{j-1}) which is initialized to have unit mass at the initial point y = y_0. The function \phi_j(y_{T_{j-1}}, y_{T_j}) in the kernel is the j-th coupon function. At each date, one then numerically differentiates with respect to \epsilon at least twice and evaluates at \epsilon = 0. This provides the first two moments of the distribution of the total accrued by the structured leg on a given bridge originating from the spot and allows one to reconstruct the probability distribution function and the survival probability conditioned on a bridge. This survival probability needs to be applied to each coupon amount, which then has to be discounted to current time using the discounted transition probability kernel.
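A minimal sketch of this moment method follows, with hypothetical toy inputs: the epsilon-tilted product of kernels of equations (38)-(39) is differentiated numerically at epsilon = 0 by central finite differences to obtain the first two moments of the accrued structured coupon jointly with the arrival state. In the paper the kernels would be the fast-exponentiated lattice propagators; here tiny dense double precision matrices stand in for them.

```python
import numpy as np

def accrual_moments(kernels, coupons, eps=1e-4):
    """First two moments of sum_j phi_j(y_{T_{j-1}}, y_{T_j}) on each bridge y_0 -> y_{T_i},
    via numerical differentiation in epsilon of the product of tilted kernels, eqs. (37)-(39).
    kernels[j] and coupons[j] are (n, n) arrays: transition kernel and coupon phi_j per transition."""
    n = kernels[0].shape[0]

    def product(e):
        P = np.eye(n)
        for U, phi in zip(kernels, coupons):
            P = P @ (U * np.exp(e * phi))      # eq. (39): kernel element times exp(eps * phi)
        return P

    P_plus, P_0, P_minus = product(eps), product(0.0), product(-eps)
    m1 = (P_plus - P_minus) / (2 * eps)            # d/d eps at 0: first moment, jointly with arrival state
    m2 = (P_plus - 2 * P_0 + P_minus) / eps**2     # d^2/d eps^2 at 0: second moment
    return m1, m2, P_0

# usage with hypothetical two-period, two-state inputs
U = np.array([[0.7, 0.3], [0.4, 0.6]])
phi = np.array([[0.01, 0.02], [0.00, 0.03]])       # coupon paid on each transition
m1, m2, P0 = accrual_moments([U, U], [phi, phi])
print(m1[0].sum() / P0[0].sum())                   # expected total structured coupon starting from state 0
```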
An application of moment methods is given in Figures 25, 26, 27, 28, 29, 30 and 31, where we show the pricing functions of callable CMS spread range accruals.

6. Scaling of algorithmic complexity, portfolios and snowballs

When using GPUs it is important to familiarize oneself with fairly counterintuitive scaling properties for the run time required by various tasks, as run time is not proportional to the traditional concept of algorithmic complexity intended as the total number of floating point operations. Since matrix-matrix multiplications are distributed across a large number of cores, the larger the matrices one multiplies, the better the hardware performance. In this section we show how these concepts apply to portfolio valuations and to the pricing of snowballs and snowblades.

Consider pricing a large portfolio of European or callable instruments and suppose one has available in memory all the required discounted transition probability kernels and the data structures that describe cash flows. In this case the best strategy is to organize the pricing functions for all the instruments in the portfolio in a large matrix, attributing one column vector to each instrument, and to perform the backward induction collectively on the entire portfolio. The alternative strategy of iterating on each instrument separately is far more expensive in time.

Task                                                                             GPU-O    GPU-D    Host-O    Ratio
Initialization                                                                    4.79     5.16      3.77     0.79
Calibration to term structure of rates                                            4.76     6.14     61.88    13.00
585 European swaptions                                                            7.62    10.17     85.30    11.19
CMS spread range accrual portfolio (9 callable swaps, 3 callable snowballs)      21.03    23.74    134.59     6.40
30240 ATM European swaptions                                                     11.34    16.16    138.68    12.23
13725 ATM European swaptions and 13725 ATM Bermuda swaptions                     56.56    59.02    272.19     4.81

Table 2. Execution times in seconds for various portfolios under various configurations. GPU-O: using the GPU with host side optimized code. GPU-D: using the GPU with host side debug code. Host-O: using the host only with optimized code and Intel MKL libraries. Ratio: column Host-O versus column GPU-O.

Table 2 illustrates performance with examples of portfolios, also comparing the performance of optimized code versus debug code. The code for this example is written almost entirely in VB.NET, the CPU is a 2 GHz Xeon and the GPU is an Nvidia 8800 GTX card originally intended for the games market, the API to the GPU being given by the CUDA-BLAS library. One can notice that the debug code runs only 5% slower than the optimized code. The code is entirely managed, but by extrapolation one would conclude that the speed-up achievable by using native C code as opposed to VB.NET would also be negligible. The computational bottleneck is in fact captured by the calls to sgemm executing on the GPU. One can notice that while a portfolio of 585 European swaptions takes 7.62 seconds to price, a portfolio with 30,240 swaptions takes only 11.34 seconds. Execution time thus scales in a very non-linear way with respect to the number of floating point operations. This is achieved by organizing all pricing functions of the portfolio as column vectors in a matrix and applying backward induction to the portfolio in aggregate. The alternative strategy of pricing each swaption individually would give rise to linear scaling and be much more expensive in time.
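The collective backward induction just described can be sketched as follows (NumPy, hypothetical sizes): the pricing functions of all instruments form the columns of one matrix, so each induction step over a coupon period is a single kernel-times-matrix product, which is exactly the large sgemm the GPU executes efficiently; a Bermudan exercise feature is applied column-wise.

```python
import numpy as np

def price_portfolio(kernels, terminal_values, exercise_values=None):
    """Backward induction on a whole portfolio at once.
    kernels[j]: (n, n) discounted transition kernel for coupon period j.
    terminal_values: (n, k) matrix, one column of payoff per instrument.
    exercise_values: optional (n, k) intrinsic values checked at each date for callable deals."""
    V = terminal_values.copy()
    for U in reversed(kernels):
        V = U @ V                                  # one matrix-matrix product for the entire portfolio
        if exercise_values is not None:
            V = np.maximum(V, exercise_values)     # Bermudan exercise by the holder, applied column-wise
    return V

# usage with hypothetical sizes: 672 lattice sites, 5 coupon periods, 1000 instruments
n, periods, k = 672, 5, 1000
rng = np.random.default_rng(1)
U = 0.999 * np.eye(n)                              # placeholder discounted kernel for the sketch
payoffs = rng.standard_normal((n, k))
prices = price_portfolio([U] * periods, payoffs)
print(prices.shape)                                # (672, 1000): one pricing function per instrument
```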
However, by multiplying a large kernel matrix with a large payoff matrix, the load is partitioned more evenly across the cores, each one handling the task of multiplying sub-blocks, and as a consequence the performance is greatly enhanced. Notice that the performance gain with respect to an application running on the host varies. The initialization is slightly slower when using a GPU, due to memory allocation on the card. The calibration to the term structure of interest rates shows a gain by a factor of about 13. The pricing of European swaptions shows speed-ups by a factor 11-12, with a slightly better performance for the larger portfolio. The speed-up for the portfolios containing callables is not as great and is only 4.81 for the portfolio with 13,725 Bermuda swaptions. The reason for this is that the implementation was designed in such a way that the exercise condition was checked on the host, and at each exercise date a portion of the portfolio was moved across the bus. This can easily be avoided by checking the exercise condition with a special function written in C and executing on the GPU side. By eliminating the bus overhead, the performance gain is again in the range 11-12.

Another instance where counter-intuitive scaling is revealed and can be taken advantage of is the pricing of snowballs and snowblades. A snowball is a callable swap whose structured coupon at time T_i has the following form:

(40)   C_{T_i} = k\, C_{T_{i-1}} + \Phi_i(y_{T_{i-1}}, y_{T_i}),

where k is a fixed parameter. The difficulty with snowballs is that the coupon process is not Abelian; in fact it is genuinely path-dependent. Hence, the methods in the two previous sections are not applicable to reduce dimensionality. However, using multi-core scaling it is possible to set up a very efficient pricing scheme for snowballs. Callable snowballs are priced by backward induction, while snowblades are priced by forward induction using either block-diagonalizations or, for greater efficiency, the moment method.

Consider first callable snowballs. In this case, one needs to discretize the coupon dimension in intervals \Delta C, so that a generic coupon can be approximated as C_{T_i} = (\Delta C)\, n_{T_i}, where n_{T_i} = 0, 1, ..., N-1 is a discrete-time, integer-valued process. In the working example we consider here, we choose N = 100. A strategy to implement the backward induction scheme for a single callable snowball is to organize the payoff function at maturity in a matrix with N columns, each one indexed by the state variable y and representing the price conditional on the discretized value of the previous coupon paid. One can then iterate backward by applying the pricing kernel to this matrix of conditional pricing functions. After each step in the iteration, one needs to reshuffle the pricing functions in such a way that the conditioning relation on each column is satisfied. When pricing a portfolio of European or callable snowballs there is obviously a great advantage in pricing them all together. Pricing functions for callable snowballs are given in Figures 32, 33 and 34.
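A sketch of this callable snowball backward induction follows, with hypothetical inputs: the value matrix carries one column per discretized previous-coupon level n, each step is one kernel-times-matrix product followed by the reshuffling that re-imposes the conditioning C_{T_i} = (\Delta C)\, n, in line with equation (40). The coupon rule Phi and the call convention below are simplified placeholders rather than the paper's exact contract.

```python
import numpy as np

def price_callable_snowball(kernels, k, dC, N, phi, issuer_callable=True):
    """Backward induction for a callable snowball on a lattice of n_y sites.
    kernels[i]: (n_y, n_y) discounted kernel for the period (T_i, T_{i+1}).
    phi(i, y): simplified state dependent coupon add-on at T_{i+1}, depending only on the arrival state.
    Returns V[y, n]: value to the structured coupon receiver, conditional on the previous coupon index n."""
    n_y = kernels[0].shape[0]
    V = np.zeros((n_y, N))                           # value after the last coupon is zero
    for i in reversed(range(len(kernels))):
        # coupon paid at T_{i+1} in arrival state y', given previous coupon index n, eq. (40) simplified
        coupon = k * dC * np.arange(N)[None, :] + phi(i, np.arange(n_y))[:, None]
        n_new = np.clip(np.rint(coupon / dC).astype(int), 0, N - 1)
        # reshuffle: continuation value is read in the column of the newly paid coupon index
        cont = np.take_along_axis(V, n_new, axis=1)
        V = kernels[i] @ (coupon + cont)             # one matrix-matrix product per period
        if issuer_callable:
            # simplified call convention: the issuer terminates whenever the remaining
            # structure has positive value to the receiver
            V = np.minimum(V, 0.0)
    return V

# hypothetical usage: 672 sites, 8 coupon periods, 10 bp coupon grid
n_y, N = 672, 100
U = 0.98 * np.eye(n_y)                               # placeholder discounted kernel
phi = lambda i, y: 0.0025 + 0.0 * y                  # placeholder structured coupon add-on
V = price_callable_snowball([U] * 8, k=0.5, dC=0.001, N=N, phi=phi)
print(V.shape)                                       # (672, 100)
```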
When pricing a TARN snowball instead, one needs to go forward in time and obtain the survival probability distributions \psi_j(y, n) on a bridge starting from the spot and ending at y at time T_j. This requires obtaining the joint probability that at time T_j the site y is occupied and the coupon paid at time T_{j-1} is approximately (\Delta C)\, n. In this case, at every step one needs to multiply the n-th column by e^{\epsilon k n (\Delta C)} and then apply the kernel to all columns collectively, where the function \phi_j(y_{T_{j-1}}, y_{T_j}) is the j-th coupon function. At each date, one then numerically differentiates with respect to \epsilon at least twice and, based on the outcome, reconstructs the probability distribution function for the total accrued on the structured leg and the survival probability conditioned on a bridge. This survival probability needs to be applied to the coupon amount, which then has to be discounted to current time. Also in this case, on massively parallel multi-core architectures one obtains major speed-up ratios by pricing portfolios of TARN snowballs in one single iteration.

7. Conclusions

We have shown that short rate models specified semi-parametrically or even non-parametrically are viable from the engineering viewpoint as frameworks within which to price exotic single name interest rate derivatives. We find that the recently introduced GPU platforms, which are being mass-produced because of applications to the video-game and computer-aided-design markets, are an ideal hardware environment for such modeling approaches. We find that the BLAS level-3 methods, for which the speed-up ratios are most significant, are stable under single precision arithmetic and that sensitivities are also smooth. The mathematical framework that supports such an analysis is based on operator methods, fast exponentiation and the theory of Abelian processes. It is also based on techniques to exploit the highly non-linear scaling behavior observed on massively parallel multi-core architectures between the nominal algorithmic complexity and the execution time. The methods presented here can possibly be extended to cross-currency derivatives and hybrids by modeling correlations by means of dynamic conditioning, as we have already demonstrated in the case of unmanaged CDOs in (Albanese and Vidler 2007), but we reserve these extensions for a future article.

References

Ait-Sahalia, Y., L.P. Hansen and J.A. Scheinkman (2005). Operator Methods for Continuous Time Markov Processes. Journal of Economic Theory.
Albanese, C. (2006). Operator Methods, Abelian Processes and Dynamic Conditioning. Preprint, available at www.level3finance.com.
Albanese, C. and A. Mijatovic (2006). Convergence Rates for Diffusions on Continuous-Time Lattices. Preprint, available at www.level3finance.com.
Albanese, C. and A. Vidler (2007). A Structural Model for Bespoke CDOs. Wilmott Magazine.
Albanese, C. and M. Trovato (2006). A Stochastic Volatility Model for Callable CMS Swaps and Translation Invariant Path Dependent Derivatives. Preprint, available at www.level3finance.com.
Albanese, C., H. Lo and A. Mijatovic (2006). Spectral Methods for Volatility Derivatives. Preprint.
Bennett, M.N. and J.E. Kennedy (2005). Common Interests. Finance and Stochastics.
Black, F. and P. Karasinski (1991). Bond and Option Pricing when Short Rates are Lognormal. Financial Analysts Journal, pp. 52-59.
Brace, A., D. Gatarek and M. Musiela (1997). The Market Model of Interest Rate Dynamics. Mathematical Finance 7, 127-154.
Buttari, A., J. Dongarra, J. Kurzak, J. Langou, P. Lusczek and S. Tomov (2007). Exploiting Mixed Precision Floating Point Hardware in Scientific Computations. Preprint, available at www.netlib.org.
Chen, Lin (1996). Interest Rate Dynamics, Derivatives Pricing, and Risk Management. Springer.
Cheyette, O. (1992). Term Structure Dynamics and Mortgage Valuation. Journal of Fixed Income 1, 28-41.
Cox, J.C., J.E. Ingersoll and S.A. Ross (1985). A Theory of the Term Structure of Interest Rates. Econometrica 53, 385-407.
Dai, Q. and K. Singleton (2003). Term Structure Dynamics in Theory and Reality. Review of Financial Studies 16, 631-678.
Duffie, D. and R. Kan (1996). A Yield-Factor Model of Interest Rates. Mathematical Finance 6, 379-406.
Hagan, P., D. Kumar, A. Lesniewski and D. Woodward (2002). Managing Smile Risk. Wilmott Magazine, pp. 84-108.
Heath, D., R. Jarrow and A. Morton (1992). Bond Pricing and the Term Structure of Interest Rates: A New Methodology. Econometrica 60, 77-105.
Ho, T.S.Y. and S.B. Lee (1986). Term Structure Movements and Pricing Interest Rate Contingent Claims. Journal of Finance.
Hull, J. and A. White (1993). One-factor Interest Rate Models and the Valuation of Interest Rate Derivative Securities. Journal of Financial and Quantitative Analysis 28, 235-254.
Hunt, P.J., J.E. Kennedy and A. Pelsser (2000). Markov Functional Interest Rate Models. Finance and Stochastics 4, 391-408.
Jamshidian, F. (1997). Libor and Swap Market Models and Measures. Finance and Stochastics 1, 293-330.
Longstaff, F.A. and E.S. Schwartz (2001). Valuing American Options by Simulation: A Simple Least Squares Approach. Review of Financial Studies 14, 113-148.
Sandmann, K. and D. Sondermann (1997). A Note on the Stability of Lognormal Interest Rate Models and the Pricing of Eurodollar Futures. Mathematical Finance 7, 119-125.
Tang, Y. and J. Lange (2001). A Non-Exploding Bushy Tree Technique and its Application to the Multifactor Interest Rate Market Model. Computational Finance.
Vasicek, O.A. (1977). An Equilibrium Characterization of the Term Structure. Journal of Financial Economics 5, 177-188.

Claudio Albanese. E-mail address: [email protected]

Figure 3. Zero curves in the deflation regime.
Figure 4. Zero curves in the regime with drift -75 bp/year.
Figure 5. Zero curves in the regime with drift -25 bp/year.
Figure 6. Zero curves in the regime with drift +25 bp/year.
Figure 7. Zero curves in the regime with drift +100 bp/year.
Figure 8. Backbone of 2Y-10Y correlation, i.e. the correlation between daily returns of the 2Y swap rate versus the 10Y swap rate as a function of the short rate.
Figure 9. Backbone of 1Y-20Y correlation, i.e. the correlation between daily returns of the 1Y swap rate versus the 20Y swap rate as a function of the short rate.
Figure 10. Projected occupancy probabilities of monetary policy regimes.
Figure 11. Implied volatilities for at-the-money European swaptions of tenor 2Y compared to market data.
Figure 12. Implied volatilities for at-the-money European swaptions of tenor 5Y compared to market data.
Figure 13. Implied volatilities for at-the-money European swaptions of tenor 20Y compared to market data.
Figure 14. Implied volatility skews for European swaptions of tenor 2Y compared to market data.
Figure 15. Implied volatility skews for European swaptions of tenor 5Y compared to market data.
Figure 16. Implied volatility skews for European swaptions of tenor 20Y compared to market data. Only implied volatilities for extreme strikes 16% over the forward failed to compute, as the graph shows.
Figure 17. Implied volatility backbone, i.e. the scatterplot of the implied at-the-money volatility of 2Y into 2Y European swaptions versus the corresponding forward rate.
Figure 18. Implied volatility backbone, i.e. the scatterplot of the implied at-the-money volatility of 4Y into 5Y European swaptions versus the corresponding forward rate.
Figure 19. Implied volatility backbone, i.e. the scatterplot of the implied at-the-money volatility of 10Y into 20Y European swaptions versus the corresponding forward rate.
Figure 20. Bermuda premium backbone, i.e. the Bermuda premium of 4Y into 2Y at-the-money swaptions plotted against the corresponding forward rate.
Figure 21. Bermuda premium backbone, i.e. the Bermuda premium of 5Y into 10Y at-the-money swaptions plotted against the corresponding forward rate.
Figure 22. 2Y into 5Y convexity backbone, i.e. the convexity correction of 2Y into 5Y European constant-maturity-swaps (CMS) with respect to European swaptions.
Figure 23. 3Y into 2Y convexity backbone, i.e. the convexity correction of 3Y into 2Y European constant-maturity-swaps (CMS) with respect to European swaptions.
Figure 24. 10Y into 20Y convexity backbone, i.e. the convexity correction of 10Y into 20Y European constant-maturity-swaps (CMS) with respect to European swaptions.
Figure 25. Pricing functions of callable CMS spread range accruals.
Figure 26. Pricing functions of callable CMS spread range accruals.
Figure 27. Pricing functions of callable CMS spread range accruals.
Figure 28. Pricing functions of callable CMS spread range accruals.
Figure 29. Pricing functions of callable CMS spread range accruals.
Figure 30. Pricing functions of callable CMS spread range accruals.
Figure 31. Pricing functions of callable CMS spread range accruals.
Figure 32. Pricing functions of callable snowball CMS spread range accruals.
Figure 33. Pricing functions of callable snowball CMS spread range accruals.
Figure 34. Pricing functions of callable snowball CMS spread range accruals.