Handout_ALM


HECTOR SCHOOL OF ENGINEERING

AND MANAGEMENT

UNIVERSITY FRIDERICIANA KARLSRUHE

Module S-5/1

Insurance, Risk Analysis and Asset Liability Management

Asset Liability Management

Svetlozar T. Rachev
Literature Recommendations

1) Jitka Dupačová, Jan Hurt and Josef Štěpán, Stochastic Modeling in Economics and Finance, Kluwer Academic Publishers, 2002

2) Stavros A. Zenios, Lecture Notes: Mathematical Modeling and its Application in Finance, http://www.hermes.ucy.ac.cy/zenios/teaching/399.001/index.html

3) F. Fabozzi and A. Konishi, The Handbook of Asset/Liability Management: State-of-the-Art Investment Strategies, Risk Controls and Regulatory Requirements, Wiley

4) Svetlozar Rachev and Stefan Mittnik, Stable Paretian Models in Finance, John Wiley & Sons Ltd., 2000
Contents

1 Introduction

2 Risk and Optimization
  2.1 Risk Measures
  2.2 Coherence
  2.3 Risk-Return Optimization
  2.4 CVaR Optimization
    2.4.1 Uryasev’s Optimization Shortcut

3 Portfolio Optimization and Stochastic Programming
  3.1 1-stage Portfolio Optimization
  3.2 Single-stage versus Multistage Optimization
  3.3 Formulation of the Stochastic Program
  3.4 Scenario Generation
  3.5 Deterministic Equivalent Forms
  3.6 The T-stage ALM problem

4 Modeling of the Risk Factors
  4.1 Stable Distributions
    4.1.1 Definition of Stable Random Variables
    4.1.2 Parameters and Special Cases of the Stable Distribution
    4.1.3 Properties of Stable Random Variables
    4.1.4 Truncated α-Stable Distributions
  4.2 Stable Modeling of Risk Factors
    4.2.1 Modeling Financial Returns with Stable Distributions
  4.3 Univariate and Multivariate Distributions
  4.4 Fitting a Multivariate Distribution
  4.5 Dependence Modeling and Copulas

5 ALM Implementation
  5.1 Finding an adequate model
    5.1.1 Exponentially Weighted Moving Average Models
    5.1.2 VaR Backtesting
  5.2 Solving the Optimization Problem
  5.3 Efficient Frontiers
  5.4 Postoptimality Analysis and Backtesting
    5.4.1 Postoptimality Analysis
    5.4.2 Portfolio Backtesting

6 Appendix - Tables for Empirical Analysis


Chapter 1

Introduction

Asset liability management (ALM) attempts to find the optimal investment strat-
egy under uncertainty in both the asset and liability streams. In the past, the two
sides of the balance sheet have usually been separated, but simultaneous consid-
eration of assets and liabilities can be very advantageous when they have common
risk factors. If assets are allocated such that they are highly correlated with the
liabilities, it is possible to reduce the risk of the entire portfolio.
Traditionally, banks and insurance companies used accrual accounting for es-
sentially all their assets and liabilities. They would take on liabilities, such as
deposits, life insurance policies or annuities. They would invest the proceeds from
these liabilities in assets such as loans, bonds or real estate. All assets and liabil-
ities were held at book value. Doing so disguised possible risks arising from how
the assets and liabilities were structured.
Two of the earlier ALM frameworks for constructing portfolios of fixed-income
securities are dedication and immunization. Basic dedication assumes that the
future liability payments are deterministic and finds an allocation such that bond
income is sufficient to cover the liability payments in each time period. Achieving
this type of cashflow matching in every period is likely to be costly, so traditional
immunization models match cashflows on average, providing a cheaper, but usually
riskier, portfolio. The immunized portfolio is constructed by matching the present
values and interest rate sensitivities of the assets and liabilities, and it results in
an allocation that hedges against a small parallel shift in the term structure of
interest rates.
Consider the following simple example (adapted from riskglossary.com): A bank
borrows USD 100 Mio at 3.00% for a year and lends the same money at 3.20%
to a highly-rated borrower for 5 years. For simplicity, we assume that all interest
rates are annually compounded and all interest accumulates to the maturity of the
respective obligations. The net transaction appears profitable, since the bank is
earning a 20 basis point spread. However, the transaction also entails considerable
risk:
At the end of a year, the bank will have to find new financing for the loan,
which will have 4 more years before it matures. If interest rates have risen, the
bank may have to pay a higher rate of interest on the new financing than the fixed
3.20% it is earning on its loan. Suppose, for example, that at the end of a year, an
applicable 4-year interest rate is 6.00%. The bank is in serious trouble. It is going
to be earning 3.20% on its loan and paying 6.00% on its financing.
Accrual accounting does not recognize the problem. The book value of the
loan (the bank’s asset) is:

$$100\,\text{Mio} \cdot 1.032 = 103.2\,\text{Mio}$$

The book value of the financing (the bank’s liability) is:


$$100\,\text{Mio} \cdot 1.030 = 103.0\,\text{Mio}$$

Based upon accrual accounting, the bank earned USD 200,000 in the first year.
However, market value accounting recognizes the bank’s predicament. The
respective market values of the bank’s asset and liability are:

$$100\,\text{Mio} \cdot \frac{1.032^5}{1.060^4} = 92.72\,\text{Mio} \qquad \text{and} \qquad 100\,\text{Mio} \cdot 1.030 = 103.0\,\text{Mio}.$$

Hence, from a market-value accounting standpoint, the bank has lost USD
10.28 Mio. So which result offers a better portrayal of the bank’s situation, the
accrual accounting profit or the market-value accounting loss? The bank is in
trouble, and the market-value loss reflects this. Ultimately, accrual accounting
will recognize a similar loss. The bank will have to secure financing for the loan
at the new higher rate, so it will accrue the as-yet unrecognized loss over the 4
remaining years of the position.
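The arithmetic of this example is easy to reproduce. The following short Python sketch recomputes the book and market values above (all figures are those of the example):

```python
# Book values under accrual accounting, one year into the transaction.
principal = 100.0                                # USD Mio
loan_rate, funding_rate = 0.032, 0.030
book_asset = principal * (1 + loan_rate)         # 103.2 Mio
book_liability = principal * (1 + funding_rate)  # 103.0 Mio
print("accrual P&L:", round(book_asset - book_liability, 2))          # 0.2 Mio

# Market value of the loan: its maturity value after 5 years at 3.20%,
# discounted over the 4 remaining years at the new 6.00% rate.
new_rate, remaining = 0.060, 4
market_asset = principal * (1 + loan_rate) ** 5 / (1 + new_rate) ** remaining
print("market value of loan:", round(market_asset, 2))                # 92.72 Mio
print("market-value P&L:", round(market_asset - book_liability, 2))   # -10.28 Mio
```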
The problem in this example was caused by a mismatch between assets
and liabilities. Prior to the 1970s, such mismatches tended not to be a sig-
nificant problem. Interest rates in developed countries experienced only modest
fluctuations, so losses due to asset-liability mismatches were small or trivial. Many
firms intentionally mismatched their balance sheets. Because yield curves were
generally upward sloping, banks could earn a spread by borrowing short and lend-
ing long. But things started to change in the 1970s, which ushered in a period
of volatile interest rates that continued into the early 1980s. US regulation, which
had capped the interest rates that banks could pay depositors, was abandoned
to stem a migration overseas of the market for USD deposits. Managers of many
firms, who were accustomed to thinking in terms of accrual accounting, were slow
to recognize the emerging risk. Some firms suffered staggering losses. Because
the firms used accrual accounting, the result was not so much bankruptcies as
crippled balance sheets. Firms gradually accrued the losses over the subsequent 5
or 10 years.
One of the victims of the changing conditions was the US mutual life insurance
company The Equitable. During the early 1980s, the USD yield curve was inverted,
with short-term interest rates spiking into the high teens. The Equitable sold a
number of long-term guaranteed interest contracts (GICs) guaranteeing rates of
around 16% for periods up to 10 years. During this period, GICs were routinely
written for principal of USD 100 Mio or more. Equitable invested the assets short-term
to earn the high interest rates guaranteed on the contracts. Short-term interest
rates soon came down. When the Equitable had to reinvest, it couldn’t get nearly
the interest rates it was paying on the GICs. The firm was crippled. Eventually,
it had to demutualize and was acquired by the Axa Group.
We conclude that the earlier framework of accrual accounting is inadequate
for ALM because it misses the stochastic nature of interest rates and liabilities
and the dynamic nature of investing. The two main tools that help to capture the
dynamic and stochastic characteristics are stochastic control and stochastic
programming. Stochastic control methods model uncertainty in a continuous-
time setting through Itô processes, but a drawback is that only a few driving
variables, or states, can be handled. Applications of stochastic control in ALM
include, for example, surplus optimization for pension funds and life insurance.
This lecture is organized as follows:

• Chapter two reviews major issues on risk and optimization. Properties of the
Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) risk measures
are covered.

• Chapter three provides information on single- and multistage optimization
problems, stochastic programs and scenario generation.

• Chapter four discusses the choice of an adequate distribution for modeling
the risk factors, with focus on the stable distribution, which is able to model
features like heavy tails, skewness and excess kurtosis in the underlying risk
factors. We further investigate how multivariate data can be modeled ade-
quately.

• Chapter five provides an example of an empirical application of ALM tech-
niques to a pension fund. Additional issues, like portfolio backtesting and
comparison of the results based on the underlying risk factors, are treated as
well.
Chapter 2

Risk and Optimization

The goal of risk-return optimization is to optimize a tradeoff between risk
and return. This chapter reviews a few risk measures and discusses how they
can be implemented in simple single-stage portfolio optimization problems. The
techniques for optimizing CVaR presented in this chapter will later be used in a
multistage problem.

2.1 Risk Measures

The standard measure of risk for a portfolio of equities suggested by Markowitz
is the variance of the return. A portfolio consists of weights $\omega = (\omega_1, \ldots, \omega_n)'$,
such that $\omega_i \geq 0$ and $\sum_{i=1}^n \omega_i = 1$, in $n$ assets with corresponding risky returns
$R = (r_1, \ldots, r_n)'$. The risk associated with the portfolio return $r_p = \omega' R$ is given
by $\sigma_p^2 = \omega' \Sigma \omega$, where $\Sigma$ is the covariance matrix of $R$. While the variance of
the investment return is the most traditional risk measure, a common criticism is
that the variance penalizes both large gains and large losses. A modification to an

asymmetric risk measure that accounts only for large losses is the semivariance:

$$E\left[\left(\left[\omega' E(R) - \omega' R\right]^+\right)^2\right].$$

However, numerical optimization of semivariance is difficult. Another modification


is the downside formula, which measures the degree to which the returns are distributed
below some target return $r^*$:

$$E\left[\left(\left[r^* - \omega' R\right]^+\right)^2\right].$$

A second criticism of variance is that financial returns are typically heavy-tailed,


and in that case, the variance does not even exist. A logical argument can then
be made for using the mean absolute deviation (MAD) of the portfolio:

$$m_p = E\left|\omega' R - \omega' E(R)\right|,$$

or alternatively, the scale parameter of a stable distribution can be used in place


of the variance. Stable distributions will be discussed in more detail in chapter 4.
Some other risk measures rely only on the tail of the distribution, in which
case the modeling of the probability of extreme events becomes more important.
The following, VaR and CVaR, are two such measures. Value at Risk (VaR) is
a frequently used measure of risk for financial institutions and regulators. For
a given confidence level β ∈ (0, 1), VaR is the minimum value of the loss, or
negative return, that is exceeded no more than 100(1-β)% of the time. Its ease of
understanding helps to make it a popular risk measure.
The following notations and definitions of VaR and CVaR resemble mostly
those in [33]. For a given decision x ∈ Rn , let the random variable L(x) ∈ R
represent a loss, or negative return, for each x, and let ΨL (x, ζ) be the distribution
function for L(x):
ΨL (x, ζ) = P (L(x) ≤ ζ) .

For a given decision x, the Value at Risk at confidence level β is given by

VaRβ (x) = inf {ζ|ΨL (x, ζ) ≥ β} .

Discussion of several other tail risk measures, including the Conditional Value
at Risk (CVaR), can be found in [1] and [2]. While CVaR is not as widely used in
finance as VaR, it has properties that make it a very logical alternative to VaR.
These properties are referred to as coherence and will be described shortly.
Define a random variable Tβ (x) on the β-tail of the loss L(x) through the
distribution function:

$$\Psi_{T_\beta}(x, \zeta) = \begin{cases} 0, & \zeta < \mathrm{VaR}_\beta(x) \\[4pt] \dfrac{\Psi_L(x,\zeta) - \beta}{1 - \beta}, & \zeta \geq \mathrm{VaR}_\beta(x) \end{cases} \tag{2.1}$$

For a given decision x, the Conditional Value at Risk at confidence level β is the
mean of the tail random variable Tβ (x) with distribution function (2.1):

CVaRβ (x) = E (Tβ (x)) .

As is implied by its name, CVaR is closely related to the conditional expectation


beyond VaR. In general, CVaR satisfies the inequalities

$$E\left(L(x) \mid L(x) \geq \mathrm{VaR}_\beta(x)\right) \leq \mathrm{CVaR}_\beta(x) \leq E\left(L(x) \mid L(x) > \mathrm{VaR}_\beta(x)\right). \tag{2.2}$$

If there is no discontinuity in the distribution function of L(x) at VaRβ (x), then


equality holds in equation (2.2). For this reason, CVaR is also sometimes called
the Expected Tail Loss (ETL). When there is a discontinuity, as illustrated in [33],
CVaRβ (x) splits the probability atom at VaRβ (x) in a certain way. CVaR is
defined in this manner because it has an equivalent representation that is easily
optimized in the case of a discrete distribution such as in a scenario tree. This
representation will be referred to as Uryasev’s formula and is reviewed shortly.
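For a sample of losses, the two definitions above reduce to an empirical quantile and a tail average. A minimal Python sketch (the heavy-tailed loss sample is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
losses = rng.standard_t(df=4, size=100_000)    # illustrative heavy-tailed losses L(x)
beta = 0.95

var_beta = np.quantile(losses, beta)           # smallest zeta with Psi_L(x, zeta) >= beta
cvar_beta = losses[losses >= var_beta].mean()  # mean of the beta-tail variable T_beta(x)
print(f"VaR: {var_beta:.3f}  CVaR: {cvar_beta:.3f}")
```

Since the simulated distribution is continuous, the tail average coincides with CVaR, i.e. equality holds in (2.2).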

2.2 Coherence

To help define a sensible risk measure, [3] introduces properties that are required of
a coherent risk measure; however, VaR does not satisfy these properties in general.
As is well known, VaR is not sub-additive: Examples have been constructed where
the VaR of the sum of two portfolios is greater than the sum of individual VaRs.
Lack of subadditivity is very undesirable because diversification is not promoted.
However, for the special class of elliptical distributions, VaR is sub-additive and
coherent [5].
The following properties of coherence are stated adhering to the axiomatic
definition in [1]. If V is the space of real-valued random variables, a risk measure
is a functional ρ : V −→ R. If the random variables v, v ′ ∈ V are thought of as
losses, then ρ is coherent if it is

i. sub-additive: ρ(v + v ′ ) ≤ ρ(v) + ρ(v ′ ),

ii. positive homogeneous: ρ(λv) = λρ(v), ∀λ ≥ 0,

iii. translation invariant: ρ(v + c) = ρ(v) + c, ∀c ∈ R, and

iv. monotonous: ρ(v) ≥ 0, ∀v ≥ 0.


Proof of the coherence of CVaR can be found in [2, 29, 33]. The coherence of the
set of random variables {L(x)} can be stated as a function of x when L(x) is
linear:
L(x) = x1 Y1 + ... + xn Yn .

In this situation, Yi might be a random variable representing an individual asset


loss, and L(x) is a random variable representing the total portfolio loss. Coherence
of CVaRβ (x) in this framework means

i. CVaRβ (x) is sublinear in x,

ii. CVaRβ (x) = c when L(x) = c ∈ R, and

iii. CVaRβ (x) ≤ CVaRβ (x′ ) when L(x) ≤ L(x′ ).

See [33] for the proof.


Note that sub-additivity and positive homogeneity guarantee that a coherent
risk measure is convex, which is advantageous in portfolio optimization. A lack
of convexity of VaR contributes to numerical difficulties in optimization. VaR is
easy to work with when normality of distributions is assumed, but financial data
is typically heavy-tailed. We will also consider optimization under uncertainty
where discrete probability distributions arise from scenario trees. In addition to
coherence, CVaR has a representation that is practical in minimization problems
with scenarios generated from any distributional assumption.

2.3 Risk-Return Optimization

If the risky returns R are assumed to follow a multivariate normal distribution


N (µ, Σ), the portfolio return rp = ω ′ R is also normally distributed with mean
µp = ω ′ µ and variance σp2 = ω ′ Σω. The classical mean-variance optimization
problem is to minimize the risk of the portfolio for a minimum level of expected
return:
$$\min_\omega \; \omega' \Sigma \omega \quad \text{s.t.} \quad \omega' \mu = \mu_0, \quad \sum_{i=1}^n \omega_i = 1. \tag{2.3}$$

The above problem is easily solved with Lagrangian techniques; the solution
can be found in [6]. As $\mu_0$ is varied, the set of portfolios traces out the mean-
variance efficient frontier. If no short-selling is allowed, the restriction ωi ≥ 0 is
also included.
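As a small numerical illustration of problem (2.3) with the no-short-selling restriction, the following Python sketch solves it with scipy; the inputs $\mu$, $\Sigma$ and $\mu_0$ are assumed values:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed inputs for three risky assets.
mu = np.array([0.06, 0.08, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
mu0 = 0.08                                   # required expected return

n = len(mu)
constraints = [{"type": "eq", "fun": lambda w: w @ mu - mu0},   # w'mu = mu0
               {"type": "eq", "fun": lambda w: w.sum() - 1.0}]  # weights sum to one
res = minimize(lambda w: w @ Sigma @ w,      # portfolio variance w'Sigma w
               x0=np.full(n, 1.0 / n),
               bounds=[(0.0, None)] * n,     # no short-selling
               constraints=constraints)
print("weights:", res.x.round(4), " variance:", round(res.fun, 5))
```

Varying $\mu_0$ and re-solving traces out the (no-short-sale) efficient frontier numerically.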
A drawback of optimization problem (2.3) is that it requires a large number
of parameters to be estimated. If there are n risky assets, the covariance matrix
consists of n(n + 1)/2 elements. For instance, if the universe of assets consists of
the S&P500, over 125,000 variances/covariances must be estimated. A solution,
as found in [37], is to model each asset with a multifactor equation:

$$r_i = \mu_i + \beta_{i1} F_1 + \ldots + \beta_{ik} F_k + \epsilon_i, \tag{2.4}$$

where $F_j$ is the deviation of the random factor $j$ from its mean and $\mathrm{cov}(F_j, F_l) = 0$
for all $j \neq l$. Examples of typical factors include inflation, interest rates, and
GDP. The asset specific risks $\epsilon_i$ have zero expectation, are uncorrelated, and are
independent of the factors. The portfolio return $r_p = \omega' R$ can be written as

$$r_p = \mu_p + \sum_{j=1}^k \beta_{pj} F_j + \epsilon_p,$$

where

$$\mu_p = \omega' \mu, \qquad \beta_{pj} = \sum_{i=1}^n \omega_i \beta_{ij}, \qquad \epsilon_p = \sum_{i=1}^n \omega_i \epsilon_i.$$

It follows that the variance of the portfolio is

$$\sigma_p^2 = \sum_{j=1}^k \beta_{pj}^2 \sigma_{F_j}^2 + \sum_{i=1}^n \omega_i^2 \sigma_{\epsilon_i}^2.$$

The first term in the right-hand side of this equation is the systematic or market
risk, and the second term is the unsystematic risk of the portfolio. If equal weight
is given to each asset, ωi = 1/n, the unsystematic risk is bounded by c/n for some
constant c, so this risk can be diversified away as n grows large. Using the factor
model in the minimum variance optimization problem gives:

$$\min_\omega \; \sigma_p^2 = \sum_{j=1}^k \beta_{pj}^2 \sigma_{F_j}^2 + \sum_{i=1}^n \omega_i^2 \sigma_{\epsilon_i}^2 \quad \text{s.t.} \quad \omega'\mu = \mu_0, \quad \beta_{pj} = \sum_{i=1}^n \omega_i \beta_{ij}, \quad \sum_{i=1}^n \omega_i = 1.$$

The factor sensitivities βij , factor variances, and specific risk variances can be
estimated through linear regression in equation (2.4). This results in a significant
reduction in the number of parameter estimates needed as compared to optimiza-
tion problem (2.3).
Both of the above are quadratic optimization problems. As an alternative
to mean-variance analysis, one can optimize the risk measures mentioned in the
previous section. Also in [37], the author illustrates that a linear optimization
problem can be achieved when the variance of the portfolio is replaced with its
mean-absolute deviation $m_p$. Since $R$ is multivariate normal, the relation
$m_p = \sqrt{2/\pi}\,\sigma_p$ holds, so minimizing the mean-absolute deviation will produce the

same optimal portfolio as minimizing the variance. In addition, the linear equiv-
alent program is easily modified to penalize upside and downside deviations from
the mean with different weights.
The class of elliptical distributions offers special properties in portfolio theory
that are useful in minimizing VaR or CVaR. The following gives a very brief
review; a more complete introduction to elliptical distributions and their portfolio
implications is found in [5]. For any elliptically distributed random vector R with
finite variance for all univariate marginals, variance is equivalent to any positive
homogeneous risk measure ρ. If rp = ω ′ R and r̃p = ω̃ ′ R are two linear portfolios
with corresponding variances σp2 and σ̃p2 :

ρ (rp − E(rp )) ≤ ρ (r̃p − E(r̃p )) ⇐⇒ σp2 ≤ σ̃p2 .

In addition, if $\rho$ is translation invariant, the solutions to the following risk-return
optimization problems coincide:

$$\min_\omega \; \sigma_p^2 \quad \text{s.t.} \;\; r_p = \omega' R, \;\; E(r_p) = \mu_0, \;\; \sum_{i=1}^n \omega_i = 1,$$

and

$$\min_\omega \; \rho(r_p) \quad \text{s.t.} \;\; r_p = \omega' R, \;\; E(r_p) = \mu_0, \;\; \sum_{i=1}^n \omega_i = 1,$$

where µ0 is the desired return. Therefore, under this distributional assumption,


minimization of VaR, CVaR, or variance will produce the same optimal portfolios.
This follows because CVaR is always coherent, and VaR is coherent for this class
of distributions.
The stable assumption makes portfolio optimization more difficult since vari-
ance is infinite and cannot be used as a risk measure. One natural solution is
to use the scale parameter $\sigma_p$ of the portfolio return. The scale parameter is just
a generalization of the standard deviation of a normal distribution. Chapter 4
defines stable random vectors and the special case of a sub-Gaussian distribution,
which is also in the class of elliptical distributions. If Q is the dispersion matrix
of the sub-Gaussian distribution, it can be shown that the CVaR and VaR of the
portfolio return are both strictly increasing functions of the dispersion parameter
of the portfolio return ω ′ Qω. Therefore, for a sub-Gaussian random vector R,
minimization of VaR or CVaR can both be achieved by the portfolio optimization
problem:
$$\min_\omega \; \omega' Q \omega \quad \text{s.t.} \quad \omega' \mu = \mu_0, \quad \sum_{i=1}^n \omega_i = 1.$$

Details of stable portfolio theory are found in [30], and a comparison of allocations
under the normal and stable assumptions is found in [27].

2.4 CVaR Optimization

One would like to be able to perform risk-return analysis for a portfolio by mini-
mizing VaR or CVaR subject to a constraint on the return for any distributional
assumption. In general, VaR is difficult to optimize and is usually not used in this
setting. Typically, one can model the returns with any distribution and then gen-
erate a discrete distribution of scenarios, but in this case, VaR is non-smooth and
non-convex in the portfolio positions with multiple local extrema [36]. CVaR, on
the other hand, has a representation that is easy to optimize both as a constraint
and as an objective for a set of scenarios. Additionally, if CVaR is constrained to
be small, VaR must necessarily be small. Conversely, minimization of VaR may
produce very different solutions than minimization of CVaR: VaR minimization
may stretch the tail of the distribution beyond VaR resulting in a poor CVaR
value.
2.4.1 Uryasev’s Optimization Shortcut

As defined earlier, for the decision x ∈ Rn , L(x) is the random variable repre-
senting the loss, or negative return, with associated VaRβ (x) and CVaRβ (x). To
begin, define the function

$$\Gamma_\beta(x, \zeta) = \zeta + \frac{1}{1-\beta}\, E\left[L(x) - \zeta\right]^+, \tag{2.5}$$

then CVaR is expressed as a minimization through the following optimization


shortcut: Γβ (x, ·) is finite and continuous with

$$\mathrm{CVaR}_\beta(x) = \min_{\zeta \in \mathbb{R}} \Gamma_\beta(x, \zeta), \tag{2.6}$$

and, in addition,

$$\mathrm{VaR}_\beta(x) = \text{lower endpoint of } \operatorname{argmin}_{\zeta} \Gamma_\beta(x, \zeta).$$

Equation (2.6) will be referred to as Uryasev’s formula. As a corollary, it can be


shown that if L(x) is convex in x, then CVaRβ (x) is convex in x and Γβ (x, ζ) is
jointly convex in (x, ζ). In addition, if a constraint set X is convex, one obtains
a convex minimization problem in (x, ζ): Minimizing CVaRβ (x) with respect to
x ∈ X is equivalent to minimizing Γβ (x, ζ) with respect to (x, ζ) ∈ X × R, i.e.

$$\min_{x \in X} \mathrm{CVaR}_\beta(x) = \min_{(x,\zeta) \in X \times \mathbb{R}} \Gamma_\beta(x, \zeta). \tag{2.7}$$

The proofs of the above results are found in [33].


Similar to mean-variance efficient frontiers, [18] illustrates risk-reward analysis
using CVaR as a risk measure. If R(x) is a concave reward function and X is
convex, then
$$\min_{x \in X} \mathrm{CVaR}_\beta(x) \;\text{ subject to }\; R(x) \geq \lambda, \tag{2.8}$$

$$\min_{x \in X} \mathrm{CVaR}_\beta(x) - \lambda R(x), \;\text{ and} \tag{2.9}$$

$$\min_{x \in X} -R(x) \;\text{ subject to }\; \mathrm{CVaR}_\beta(x) \leq \lambda, \tag{2.10}$$

produce the same efficient frontiers as λ is varied. As is already shown, the optimal
solution to (2.8) can be found by a joint convex optimization problem. Similarly,
problems (2.9) and (2.10) produce the same optimal solution as

$$\min_{(x,\zeta) \in X \times \mathbb{R}} \Gamma_\beta(x, \zeta) - \lambda R(x),$$

and

$$\min_{x \in X} -R(x) \;\text{ subject to }\; \Gamma_\beta(x, \zeta) \leq \lambda,$$

respectively.
An extension of these optimization procedures to risk shaping with CVaR is
found in [33]. For confidence levels $\beta_i$ with corresponding loss tolerances $\lambda_i$, for
$i = 1, \ldots, I$,

$$\min_{x \in X} -R(x) \;\text{ subject to }\; \mathrm{CVaR}_{\beta_i}(x) \leq \lambda_i, \quad i = 1, \ldots, I,$$

has the same optimal solution as

$$\min_{(x, \zeta_1, \ldots, \zeta_I) \in X \times \mathbb{R}^I} -R(x) \;\text{ subject to }\; \Gamma_{\beta_i}(x, \zeta_i) \leq \lambda_i, \quad i = 1, \ldots, I.$$

When $L(x)$ has a discrete distribution arising from, for example, a scenario
tree or sampling, equation (2.5) becomes

$$\tilde\Gamma_\beta(x, \zeta) = \zeta + \frac{1}{1-\beta} \sum_{i=1}^S p_i \left[L_i(x) - \zeta\right]^+, \tag{2.11}$$

where $L(x)$ takes the value $L_i(x)$ with probability $p_i$ for $i = 1, \ldots, S$. Additionally,
if $L(x)$ is linear, then $\tilde\Gamma_\beta$ is convex and piecewise linear. By introducing auxiliary
variables, a CVaR optimization problem can be solved by linear programming as
illustrated in the next chapter.
Chapter 3

Portfolio Optimization and Stochastic Programming

In this chapter we will continue with the problem of portfolio optimization and
introduce stochastic programming as a solution technique.

3.1 1-stage Portfolio Optimization

We will first consider the problem of a 1-stage portfolio optimization. Hereby,
we will apply Uryasev’s formula to risk-return analysis with CVaR and obtain a
linear programming problem. Define

$$X = \left\{ \omega \in \mathbb{R}^n \,\middle|\, \sum_{j=1}^n \omega_j = 1, \; \omega_j \geq 0, \; j = 1, \ldots, n \right\}, \tag{3.1}$$

where x ∈ X represents the portfolio weights in n assets. The random return on


these assets at the end of a time period is represented by R = (r1 , ..., rn )′ , and the

negative return of the portfolio is given by

L(x) = −x′ R.

If the mean of R is given by the vector µ, the risk-return problem is

$$\min_{x \in X} \mathrm{CVaR}_\beta(x) \quad \text{s.t.} \quad x'\mu \geq \mu_0,$$

where µ0 is the required portfolio return, and by varying µ0 , the efficient frontier
is obtained. This optimization problem fits into the form of equation (2.8). If
the uncertainty in the return is given through the set of scenarios {R1 , ..., RS }
where each Rs ∈ Rn occurs with probability ps , Uryasev’s optimization shortcut
produces the equivalent problem

$$\min \; \zeta + \frac{1}{1-\beta} \sum_{s=1}^S p_s \left[-x'R^s - \zeta\right]^+ \quad \text{s.t.} \quad x'\mu \geq \mu_0, \;\; x \in X, \;\; \zeta \in \mathbb{R},$$

and by introducing auxiliary variables $y^s$, $s = 1, \ldots, S$, a linear program results:

$$\begin{aligned} \min \quad & \zeta + \frac{1}{1-\beta} \sum_{s=1}^S p_s y^s \\ \text{s.t.} \quad & x'\mu \geq \mu_0, \\ & x'R^s + \zeta + y^s \geq 0, \quad s = 1, \ldots, S, \\ & y^s \geq 0, \quad s = 1, \ldots, S, \\ & x \in X, \; \zeta \in \mathbb{R}. \end{aligned}$$

This program is used to compare hedging strategies for international asset allo-
cation in [36]. In addition, the CVaR portfolio is compared with a portfolio min-
imizing the mean-absolute deviation. The empirical results indicate that CVaR
and MAD produce similar risk-return frontiers in a static setting. However, in
dynamic backtesting where the models are repeatedly applied over a time horizon,
CVaR produces higher returns and lower volatility than MAD.
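A minimal sketch of the scenario-based CVaR linear program above, solved with scipy.optimize.linprog; the scenario set, probabilities and parameter values are simulated assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, n = 1000, 4                                # scenarios and assets (assumed)
R = rng.normal(0.05, 0.15, (S, n))            # sampled return scenarios R^s
p = np.full(S, 1.0 / S)                       # scenario probabilities p_s
beta, mu0 = 0.95, 0.04                        # confidence level, required return
mu = R.mean(axis=0)                           # estimated mean returns

# Decision vector z = (x_1..x_n, zeta, y^1..y^S);
# objective: zeta + (1/(1-beta)) * sum_s p_s y^s.
c = np.concatenate([np.zeros(n), [1.0], p / (1.0 - beta)])

# Inequalities A_ub z <= b_ub:
#   -x'R^s - zeta - y^s <= 0  (one row per scenario),  -x'mu <= -mu0.
A_ub = np.zeros((S + 1, n + 1 + S))
A_ub[:S, :n] = -R
A_ub[:S, n] = -1.0
A_ub[:S, n + 1:] = -np.eye(S)
A_ub[S, :n] = -mu
b_ub = np.concatenate([np.zeros(S), [-mu0]])

# Budget constraint x'1 = 1; bounds x >= 0, zeta free, y >= 0.
A_eq = np.zeros((1, n + 1 + S))
A_eq[0, :n] = 1.0
bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * S

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
print("weights:", res.x[:n].round(3), " CVaR estimate:", round(res.fun, 4))
```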

3.2 Single-stage versus Multistage Optimization

Extending the single period risk-return problem to a multi-period setting is diffi-
cult and some modifications are necessary. In a multi-period setting, one usually
deals with a wealth process instead of returns so that problems will be convex and
sometimes linear. The general form of a stochastic program with recourse allows
any portfolio allocation to be made in each stage, and one typically optimizes a
function of the wealth process, not the return process, over the quantities of assets
held, not the portfolio weights. Instead of risk-return analysis, one can perform
risk-reward analysis where the risk, for instance, is a function of the wealth process
and the reward is the expected terminal wealth. This is the type of problem that
is constructed in the next chapter.
Decision rules such as fixed-mix are useful because they reduce the decision
space, but they also limit the dynamic nature of the optimization problem. For
instance, one multi-period extension of mean-variance analysis is found in [20]:

$$\max \; \lambda E(w_T) - (1-\lambda)\,\mathrm{var}(w_T).$$

Here, $w_T$ is the terminal wealth, and the max is taken over all fixed-mix decision
rules. In a fixed-mix rule, the portfolio is reallocated in each time period to keep
a certain percentage of wealth in each asset. As $\lambda$ is varied between zero and one,
a type of efficient frontier is obtained. While the number of decision variables is
greatly reduced, the problem becomes non-convex, and a global search algorithm
is necessary.
The coherence of a risk measure in a multi-period setting is also defined in
terms of a wealth process w = (w1 , ..., wT ) where w1 is a known deterministic
wealth. It is shown in [14] that a weighted average of CVaR over the time horizon
is coherent: If CVaRβ (−wt ) is the CVaR associated with the negative wealth −wt ,
then a coherent risk measure is given by

$$\rho(w) = \rho(w_1, \ldots, w_T) = \sum_{t=2}^T \mu_t\, \mathrm{CVaR}_\beta(-w_t), \tag{3.2}$$

where the weights are nonnegative and sum to one. Here, coherence means that
ρ is

i. convex: ρ(λw + (1 − λ)w̃) ≤ λρ(w) + (1 − λ)ρ(w̃), ∀λ ∈ [0, 1],

ii. positive homogeneous: ρ(λw) = λρ(w), ∀λ ≥ 0,

iii. translation invariant: ρ(w1 + c, ..., wT + c) = ρ(w) − c, ∀c ∈ R, and

iv. monotonous: if wt ≤ w̃t a.s. for t = 1, ..., T, then ρ(w) ≥ ρ(w̃).

When implementing the risk measure in (3.2), one can apply Uryasev’s optimiza-
tion shortcut in a similar manner as the previous sections: Uryasev’s formula
can be applied to each CVaRβ (−wt ) where the loss L is taken to be the negative
wealth −wt , and the wealth in each stage is a function of some decision variables.
Of course, there will also be constraints such as the balance of wealth between
stages. This is illustrated in detail in the next chapter for the surplus wealth in
an ALM problem.
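Since the risk measure (3.2) is just a weighted sum of stage-wise CVaRs, it is straightforward to estimate from simulated wealth paths. A minimal sketch (the wealth simulation and the equal weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, T, beta = 50_000, 4, 0.95
returns = rng.normal(0.01, 0.05, (n_paths, T))       # illustrative period returns
wealth = 100.0 * np.cumprod(1.0 + returns, axis=1)   # simulated w_2, ..., w_{T+1}

def cvar(sample, beta):
    """Empirical CVaR: mean of the sample beyond its beta-quantile."""
    var = np.quantile(sample, beta)
    return sample[sample >= var].mean()

mu_weights = np.full(T, 1.0 / T)                     # nonnegative, summing to one
rho = sum(m * cvar(-wealth[:, t], beta) for t, m in enumerate(mu_weights))
print("weighted-CVaR risk of the wealth process:", round(rho, 3))
```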
3.3 Formulation of the Stochastic Program

Stochastic programming offers a framework that can incorporate many of the


characteristics of an ALM problem. We will first discuss a general setup for
stochastic programs with recourse. In the next chapter we will then apply this
framework to an ALM problem for a pension fund.
In a 2-stage recourse problem, a recourse decision is made after a realization
of uncertainty. The first stage has a vector of initial decisions x1 ∈ Rn1 made
at t = 1 when there is a known distribution of future uncertainty. The second
stage decisions x2 ∈ Rn2 adapt at t = 2 after the first stage uncertainty ξ1 is
realized. The second stage decisions usually also consider the distribution of
future uncertainty ξ2 realized after t = 2. For instance, consider an asset allocation
problem: The first stage decision is the initial portfolio allocation, the uncertainty
is the asset returns, and the recourse decision is the portfolio adjustments. This
2-stage recourse problem finds the optimal initial and rebalanced allocations for
the given distribution of future stock movements.
This setup is described mathematically by first considering how the optimal
recourse decision is determined. For a given first stage decision vector x1 and
a given realization of the first stage uncertainty ξ1 , the best recourse decision is
found through the following second stage problem:

$$\begin{aligned} \min_{x_2} \quad & q_2(x_1, x_2, \xi_1) + E_{\xi_2}\!\left( Q_2(x_1, x_2, \xi_1, \xi_2) \,\middle|\, \xi_1 \right) \\ \text{s.t.} \quad & B_2(\xi_1) x_1 + A_2(\xi_1) x_2 = b_2(\xi_1), \\ & l_2(\xi_1) \leq x_2 \leq u_2(\xi_1), \end{aligned} \tag{3.3}$$

where

• q2 (x1 , x2 , ξ1 ) is a cost of decision x2 for the given realization of the first stage
uncertainty ξ1 and the given first stage decision x1 ,
• Q2 (x1 , x2 , ξ1 , ξ2 ) is the cost of decision x2 for given realizations of uncertain-
ties ξ1 and ξ2 and the given first stage decision x1 ,

• B2 (ξ1 ) is the technology matrix that converts a first stage decision into
resources in the second stage, and

• A2 (ξ1 ) is the recourse matrix.

It is possible to remove the cost function Q2 by including the second term of the
objective in the cost function q2 . The problem is said to have fixed recourse when
A2 is independent of ξ1 . The subscripts indicate at which t a value is known
except in the case of ξt . For instance, the realizations of B2 , A2 , and b2 are all
known at t = 2, which is the beginning of the second stage, but ξ2 is not realized
until after t = 2.
The full 2-stage recourse problem incorporates the second stage problem as
follows: With the optimal value of the second stage problem (3.3) denoted by
Q1 (x1 , ξ1 ), the 2-stage problem minimizes the sum of a first stage cost q1 (x1 ) and
the expected value of the second stage cost EQ1 (x1 , ξ1 ):

$$\begin{aligned} \min_{x_1} \quad & q_1(x_1) + E\, Q_1(x_1, \xi_1) \\ \text{s.t.} \quad & A_1 x_1 = b_1, \\ & l_1 \leq x_1 \leq u_1. \end{aligned} \tag{3.4}$$

The first set of constraints in the above problem are referred to as the first stage
constraints. A good introduction to the various properties of 2-stage recourse
problems, such as feasibility, is found in [4].
An obvious criticism of the 2-stage model is that it only allows one recourse
decision to be made, not a sequence of decisions over the time horizon. A multi-
stage recourse program can provide a more realistic model, but it is more complex
and can often be very difficult to solve numerically. As in the 2-stage problem,
the initial vector of decisions x1 is made before the first realization of uncertainty
ξ1 , and a second stage decision x2 is then made based on x1 and ξ1 . In the T -stage
problem, this process continues for the uncertainties ξt , t = 1, ..., T − 1, and the
decisions vectors xt , t = 1, ..., T . There is usually one additional realization of
uncertainty ξT following the final decision xT .
The T -stage recourse program can be defined recursively as an extension of the
2-stage program. Let the uncertainty up to and including stage t, for t = 1, ..., T ,
be denoted by ξ t = {ξj , j = 1, ..., t}, where each ξj is the uncertainty realized in
stage j. Similarly, let the decisions up to and including stage t be denoted by
xt = {xj , j = 1, ..., t}, where each xj is the decision made for stage j. The first
stage problem is essentially the same as problem (3.4):

$$\begin{aligned} \min_{x_1} \quad & q_1(x_1) + E_{\xi_1} Q_1(x_1, \xi^1) \\ \text{s.t.} \quad & A_1 x_1 = b_1, \\ & l_1 \leq x_1 \leq u_1, \end{aligned} \tag{3.5}$$

with $Q_t$, for $t = 1, \ldots, T-1$, given by the minimization problems

$$\begin{aligned} Q_t(x^t, \xi^t) = \min_{x_{t+1}} \quad & q_{t+1}(x^{t+1}, \xi^t) + E_{\xi_{t+1}}\!\left( Q_{t+1}(x^{t+1}, \xi^{t+1}) \,\middle|\, \xi^t \right) \\ \text{s.t.} \quad & B_{t+1}(\xi^t) x_t + A_{t+1}(\xi^t) x_{t+1} = b_{t+1}(\xi^t), \\ & l_{t+1}(\xi^t) \leq x_{t+1} \leq u_{t+1}(\xi^t), \end{aligned} \tag{3.6}$$

and $Q_T(x^T, \xi^T)$ is a known function, not the solution to another minimization
problem. It is possible to set $Q_T = 0$ by including the second term of the objective
in $q_T$. The above problem (3.5-3.6) is a form of the multistage recourse problem
that is relevant to the ALM problem that will be presented soon. Other forms,
such as that found in [13], allow the first constraint of (3.6) to depend on all
decisions up to time t:

$$\sum_{\tau=1}^{t} B_{t+1,\tau}(\xi^t) x_\tau + A_{t+1}(\xi^t) x_{t+1} = b_{t+1}(\xi^t), \tag{3.7}$$

but this type of constraint is not necessary.

3.4 Scenario Generation

To numerically solve the recourse problem (3.5-3.6), the distribution of (ξ1 , ..., ξT )
is approximated by a set of scenarios usually organized in the form of a scenario
tree. Figure (3.1) contains an example of a small scenario tree similar to the
one that will be used in the 2-stage ALM problem discussed later. A first stage
optimal allocation is found in the node at t = 1, and optimal recourse allocations
are found in every node at t = 2. In the 2-stage problem, there is no additional
allocation decision made at the nodes at t = 3. The tree shown in the figure is
called balanced because each node at t = 2 is connected to two nodes at t = 3.
To describe the scenario tree, assume the nodes of the scenario tree are num-
bered starting with the value of one at $t = 1$, and let $I_t$ be the number of nodes
up to and including those at $t$. Define the sets of indices $\mathcal{I}_t = \{I_{t-1}+1, \ldots, I_t\}$, for
$t = 2, \ldots, T+1$, with $\mathcal{I}_1 = \{1\}$. A scenario $s$, which is a path through the scenario
tree, is then represented by the set of indices $(i_2, \ldots, i_{T+1})$ where $i_t \in \mathcal{I}_t$. Two
useful functions defined on the node indices are the predecessor, $\mathrm{pred}(\cdot)$, and the
descendant, $\mathrm{dec}(\cdot)$: $\mathrm{pred}(i_t)$ returns the node in $\mathcal{I}_{t-1}$ connected to $i_t$, and $\mathrm{dec}(i_t)$
returns a subset of nodes in $\mathcal{I}_{t+1}$ connected to $i_t$. At $t$, the probability of being
at node $i_t \in \mathcal{I}_t$ is denoted by $p(i_t)$, so that $\sum_{i_t \in \mathcal{I}_t} p(i_t) = 1$. Sometimes it is
more useful to use the transition probabilities $p(i_t, i_{t+1})$, for $i_{t+1} \in \mathrm{dec}(i_t)$, where
$\sum_{i_{t+1} \in \mathrm{dec}(i_t)} p(i_t, i_{t+1}) = 1$.
Figure 3.1: A Scenario Tree.
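The node bookkeeping just described is easy to mirror in code. A minimal Python sketch for the small balanced tree of Figure 3.1 (the 1-2-4 node layout and the uniform transition probabilities are assumptions for the demo):

```python
# Nodes are numbered from 1 (root at t = 1); pred/dec navigate the tree.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}      # pred(i_t)
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}       # dec(i_t)
trans_prob = {(1, 2): 0.5, (1, 3): 0.5,            # p(i_t, i_{t+1}), summing to
              (2, 4): 0.5, (2, 5): 0.5,            # one over each node's children
              (3, 6): 0.5, (3, 7): 0.5}

def node_prob(node):
    """Unconditional probability p(i_t): product of the transition
    probabilities along the path from the root to the node."""
    p = 1.0
    while node != 1:
        p *= trans_prob[(parent[node], node)]
        node = parent[node]
    return p

# A scenario is a root-to-leaf path; the leaf probabilities sum to one.
print({leaf: node_prob(leaf) for leaf in (4, 5, 6, 7)})
```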

A topic of active research examines how to generate a good set of scenarios to


represent the underlying distribution and produce good optimal decisions. The
simplest approach is to just generate a very large number of scenarios by sampling
from a time-series model. This is reasonable for a 1-stage problem, but recourse
problems quickly become too difficult or time-consuming to solve as the number of
scenarios is increased. Even with parallel implementations of solution algorithms,
multistage problems must typically limit the number of scenarios. In this case, it
becomes necessary to somehow generate a smaller set of “good” scenarios.
One technique in scenario selection is sequential importance sampling. The
general idea behind importance sampling is to obtain scenarios that are impor-
tant (in some sense) in the stochastic program. Sequential importance sampling
obtains these scenarios in an iterative fashion. First, scenarios are generated for
some given tree structure. The stochastic program is then solved and values for
the importance sampling criterion are obtained at each node. These nodal values de-
termine where the structure of the scenario tree should be changed and/or where
to resample a subtree. A more complete description of this method is in [12]. As
an example, the importance sampling criterion used in [10] is the expected value
of perfect information (EVPI). If the EVPI of a node is below some threshold, a
new subtree emanating from that node is generated by resampling. If the EVPI
is consistently below the threshold, the tree is collapsed beyond that node. If the
EVPI is above the threshold for a node with no descendants, the tree is expanded
beyond that node.
Discretization is an alternative to sampling from a distribution. One rela-
tively simple technique for discretization is moment matching. For instance, to
discretize the normal distribution it is possible to match the first two moments
with three symmetric points. A moment matching model for a two-dimensional
random vector is presented in [13] where the first and second random variables
may represent the first and second stage uncertainty, respectively. To obtain the
scenario values and probabilities, the first three marginal moments of both ran-
dom variables are matched with the corresponding moments of the approximate
distributions. In addition, the covariance between the true random variables is
matched with that of the approximations. If the number of desired scenarios is
large enough, and the moments are consistent, this procedure will provide a so-
lution. However, if the moments are inconsistent, the author suggests a weighted
least squares minimization problem.
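As a concrete instance of moment matching, a three-point symmetric discretization of $N(\mu, \sigma^2)$ at $\mu$ and $\mu \pm \sqrt{3}\,\sigma$ with probabilities $1/6$, $2/3$, $1/6$ matches the first two moments (and in fact the third and fourth as well). A short sketch verifying this, with assumed values of $\mu$ and $\sigma$:

```python
import numpy as np

mu, sigma = 0.05, 0.2                    # assumed mean and standard deviation
points = mu + np.array([-1.0, 0.0, 1.0]) * np.sqrt(3) * sigma
probs = np.array([1 / 6, 2 / 3, 1 / 6])

mean = probs @ points
var = probs @ (points - mean) ** 2
kurtosis = probs @ (points - mean) ** 4 / var ** 2
print(mean, var, kurtosis)               # 0.05, 0.04 (= sigma^2), 3.0
```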
As an alternative to moment matching, the discretization technique of [28]
relies on the minimization of transportation metrics to approximate a continuous
distribution with a discrete distribution. In this method, a desired scenario tree
structure has already been determined. The goal is to minimize the difference be-
tween the optimal value of the stochastic program with the true distribution and
the optimal value of the stochastic program with the approximate distribution.
This difference is termed the approximation error, and the author shows this error
can be bounded through the Fortet-Mourier distance between the true and ap-
proximate probability distributions. The algorithm for the optimal discretization
minimizes this bound. Through a simple 1-stage example, it is illustrated that
this method performs better (in the sense of minimizing the approximation error)
than moment matching.
Scenario reduction procedures can be used when a large number of scenarios
are already given. An approach involving moment matching is found in [7]. A
second approach involving probability metrics is found in [11] and [17]: Scenarios
are recursively deleted with redistribution of the probability among the remaining
scenarios by considering the Monge-Kantorovich functional.
There are many different methods to generate sample paths of the uncertain
data, and not all of them initially consider a tree structure. Sample paths may
come from an expert’s expectation, historical observations, or any time-series
model. The problem is then to convert a set of sample paths into a scenario tree.
The method of clustering is described in [13]: One can group similar first stage
values of the sample paths into clusters and then continue sequentially through
each stage, or one can use a multi-level scheme in which the clusters consider
the similarity of the entire sample path. A second method based on probability
metrics which converts sample paths into a tree structure by combining scenario
reduction with scenario bundling is found in [16].

3.5 Deterministic Equivalent Forms

The discrete and finite distribution of a scenario tree allows the stochastic re-
course problem to be written as a deterministic program. Once a scenario tree
is constructed, each node it of the scenario tree determines values for At (ξ t−1 ),
Bt (ξ t−1 ), bt (ξ t−1 ), lt (ξ t−1 ), ut (ξ t−1 ), and qt (·, ξ t−1 ) which are denoted by Ait , Bit ,
bit , lit , uit , and qit (·). The recourse problem (3.5-3.6) can then be written as

$$\begin{aligned} \min_{x_1} \quad & q_1(x_1) + \sum_{i_2 \in \mathcal{I}_2} p(i_2)\, Q_{i_2}(x_1) \\ \text{s.t.} \quad & A_1 x_1 = b_1, \\ & l_1 \leq x_1 \leq u_1, \end{aligned} \tag{3.8}$$

with $Q_{i_t}$, for $i_t \in \mathcal{I}_t$, $t = 2, \ldots, T$, given by the minimization problems

$$\begin{aligned} Q_{i_t}(x_{t-1}) = \min_{x_t} \quad & q_{i_t}(x_t) + \sum_{i_{t+1} \in \mathrm{dec}(i_t)} p(i_t, i_{t+1})\, Q_{i_{t+1}}(x_t) \\ \text{s.t.} \quad & B_{i_t} x_{t-1} + A_{i_t} x_t = b_{i_t}, \\ & l_{i_t} \leq x_t \leq u_{i_t}, \end{aligned} \tag{3.9}$$

and $Q_{i_{T+1}}$ can be taken to be equal to zero.


The above (3.8-3.9) is the form of the recourse program that will be relevant
when the solution method for the ALM problem is discussed later; however, there
are other ways to proceed. Two other deterministic forms are now mentioned so
that one can solve the ALM problem by possibly other solution algorithms. As
will be shown shortly, the ALM problem will have a piecewise linear objective
with linear constraints. By introducing auxiliary variables, the piecewise linear
problem can be converted into a fully linear problem (with potentially a huge
number of decision variables). In this case, the qit (·) will take the linear form:

$$q_{i_t}(\cdot) = \langle q_{i_t}, \cdot \rangle, \tag{3.10}$$

where qit is now a vector of appropriate dimension.


The deterministic equivalent for the linear program in arborescent form care-
fully considers the structure of the scenario tree:

$$\begin{aligned} \min \quad & \langle q_1, x_1 \rangle + \sum_{i_2 \in \mathcal{I}_2} p(i_2) \langle q_{i_2}, x_{i_2} \rangle + \cdots + \sum_{i_T \in \mathcal{I}_T} p(i_T) \langle q_{i_T}, x_{i_T} \rangle \\ \text{subject to} \quad & A_1 x_1 = b_1, \\ & B_{i_2} x_1 + A_{i_2} x_{i_2} = b_{i_2}, \quad \forall i_2 \in \mathcal{I}_2, \\ & B_{i_3} x_{\mathrm{pred}(i_3)} + A_{i_3} x_{i_3} = b_{i_3}, \quad \forall i_3 \in \mathcal{I}_3, \\ & \quad \vdots \\ & B_{i_T} x_{\mathrm{pred}(i_T)} + A_{i_T} x_{i_T} = b_{i_T}, \quad \forall i_T \in \mathcal{I}_T, \\ & l_{i_t} \leq x_{i_t} \leq u_{i_t}, \quad \forall i_t \in \mathcal{I}_t, \; t = 1, \ldots, T. \end{aligned} \tag{3.11}$$
This arborescent form implicitly includes the non-anticipatory constraints that the
decision taken at $t$ does not depend on the uncertainty realized in the
future. Note that the decision vectors are $x_{i_t}$, $i_t \in \mathcal{I}_t$, $t = 1, \ldots, T$, so there is one
decision for each node of the scenario tree except for those at $T + 1$.
The split-variable formulation is an equivalent form that lends itself to decom-
position and parallel implementation. If there are a total of S sample paths in
the scenario tree, S independent subproblems are created by allowing all decisions
to be scenario dependent. For the multistage case, the individual subproblem for
scenario s with nodes (i2 , ..., iT +1 ) is

$$\begin{aligned} \min \quad & \langle q_1, x_1^s \rangle + \langle q_{i_2}, x_2^s \rangle + \ldots + \langle q_{i_T}, x_T^s \rangle \\ \text{s.t.} \quad & A_1 x_1^s = b_1, \\ & B_{i_2} x_1^s + A_{i_2} x_2^s = b_{i_2}, \\ & B_{i_3} x_2^s + A_{i_3} x_3^s = b_{i_3}, \\ & \quad \vdots \\ & B_{i_T} x_{T-1}^s + A_{i_T} x_T^s = b_{i_T}, \end{aligned} \tag{3.12}$$

plus any upper and lower bounds on xst . When combining all subproblems into
one problem, non-anticipatory constraints must be explicitly considered in this
formulation: For any two scenarios $s$ and $s'$ with a common path up to and
including $t$, $x_j^s = x_j^{s'}$, for $j = 1, \ldots, t$, must be enforced. Essentially this amounts
to a 0-1 matrix of coefficients. If $p_s$ is the probability of scenario $s$, the overall
split-variable representation for the multistage program is

$$\min \; \sum_{s=1}^S p_s \left( \langle q_1, x_1^s \rangle + \langle q_{i_2}, x_2^s \rangle + \ldots + \langle q_{i_T}, x_T^s \rangle \right),$$

subject to a set of constraints (3.12) for each s, the non-anticipatory constraints,


and any upper and lower bound constraints on xst . As [26] states, this representa-
tion is advantageous for algorithms that temporarily ignore the non-anticipatory
constraints.
Many multistage applications in finance can be posed as stochastic generalized
networks. This means that each scenario subproblem of the split-variable formu-
lation has a generalized network structure. Parallel implementation of highly
efficient network algorithms can provide substantial computational advantages;
however, some characteristics of a desired application, such as policy constraints,
may destroy the network structure. Additionally, the arborescent form does not
preserve any network structure present. Algorithms and computational studies of
stochastic generalized networks are found in the work of Mulvey and Vladimirou
[22–25]. See also [8], especially for parallel implementation.
Additional resources including solutions techniques for 2-stage and multistage
linear stochastic programs with recourse are in [13], [4], and [8].

3.6 The T-stage ALM problem

A specific ALM problem is now put into a form of a stochastic program with the
goal of finding the allocations over a time horizon in a set of assets that optimizes
a tradeoff between the risk and reward. The risk measure is a weighted average
of the CVaR of the negative surplus wealth at each stage, and the reward is the
expected final surplus wealth. Let the asset prices and liability price be denoted by
st and lt , respectively. There are n assets available at each time giving st ∈ Rn ,
and there is just one liability stream giving lt ∈ R. For the T -stage problem,
(st , lt ) are defined for t = 1, ..., T + 1. The current prices known today are (s1 , l1 ),
so these are not random variables; however, (st , lt ) is a bivariate random variable
with realizations in Rn+1 known at time t for t = 2, ..., T +1. The CVaR of interest
in stage t is just the CVaR of the distribution of the surplus wealth at t + 1. For
instance, the stage 1 CVaR is determined by the distribution of surplus wealth
at t = 2, which depends on the allocation decision at t = 1. For this reason, the
CVaR of interest in stage t is denoted as CVaRβ (−swt+1 ) where swt is the surplus
wealth at time t. The problem that will be solved can now be written as:

$$\min \; \lambda \left( \sum_{t=1}^{T} \mu_t\, \mathrm{CVaR}_\beta(-sw_{t+1}) \right) - (1-\lambda)\, E(sw_{T+1}) \tag{3.13}$$

s.t. an initial wealth constraint, (3.14)

balance of wealth constraints between time periods, and (3.15)

linear transaction costs. (3.16)


Other constraints may include bounds on positions invested in each asset, bounds
on the total transaction costs in each time period, and bounds permitting short-
selling; however, these are not included in the lecture.
The above problem does not directly fit into the form (3.5-3.6), but the de-
terministic equivalent can be put into form (3.8-3.9) with the help of Uryasev’s
formula for CVaR. To begin, assume a scenario tree has already been constructed
for (st , lt ):

$$(s_t, l_t) = (s_{i_t}, l_{i_t}) \ \text{ with probability } p(i_t), \quad \forall i_t \in \mathcal{I}_t, \; t = 1, \ldots, T+1. \tag{3.17}$$

The deterministic equivalent of the optimization problem will determine optimal


asset allocations at each node of the scenario tree from t = 1 to t = T . These
allocations are decision variables in the stochastic program and are denoted by $a_{i_t}$
for $i_t \in \mathcal{I}_t$, $t = 1, \ldots, T$. The distribution of $sw_{t+1}$ depends not only on $(s_{i_{t+1}}, l_{i_{t+1}})$,
$\forall i_{t+1} \in \mathcal{I}_{t+1}$, but also on the allocation decisions made at the nodes at time $t$.
Note that this corresponds to the surplus wealth at time t+1 before the portfolio
reallocation occurs. The realization of the surplus wealth in node it+1 is therefore
a function of the allocation made in the node that immediately precedes it+1 .
With this allocation denoted by apred(it+1 ) , the distribution of the surplus wealth
for t + 1 = 2, ..., T + 1, is

$$sw_{t+1} = \langle s_{i_{t+1}}, a_{\mathrm{pred}(i_{t+1})} \rangle - l_{i_{t+1}} \ \text{ with probability } p(i_{t+1}), \quad \forall i_{t+1} \in \mathcal{I}_{t+1}.$$

For the given scenario tree, Uryasev’s formula can now be applied to each CVaR:

$$\mathrm{CVaR}_\beta(-sw_{t+1}) = \zeta_t + \frac{1}{1-\beta} \sum_{i_{t+1} \in \mathcal{I}_{t+1}} p(i_{t+1}) \left[ l_{i_{t+1}} - \langle s_{i_{t+1}}, a_{\mathrm{pred}(i_{t+1})} \rangle - \zeta_t \right]^+, \tag{3.18}$$

where there is one auxiliary variable $\zeta_t$ introduced for each stage. To simplify
things, let

$$h_{i_{t+1}}(\zeta_t, a_{\mathrm{pred}(i_{t+1})}) = \left[ l_{i_{t+1}} - \langle s_{i_{t+1}}, a_{\mathrm{pred}(i_{t+1})} \rangle - \zeta_t \right]^+, \ \text{and} \tag{3.19}$$

$$g_{i_{T+1}}(a_{\mathrm{pred}(i_{T+1})}) = \langle s_{i_{T+1}}, a_{\mathrm{pred}(i_{T+1})} \rangle - l_{i_{T+1}}. \tag{3.20}$$

The entire objective function is then

$$\mathrm{OBJ} = \lambda \sum_{t=1}^{T} \mu_t \zeta_t + \sum_{t=1}^{T} \frac{\lambda \mu_t}{1-\beta} \sum_{i_{t+1} \in \mathcal{I}_{t+1}} p(i_{t+1})\, h_{i_{t+1}}(\zeta_t, a_{\mathrm{pred}(i_{t+1})}) - (1-\lambda) \sum_{i_{T+1} \in \mathcal{I}_{T+1}} p(i_{T+1})\, g_{i_{T+1}}(a_{\mathrm{pred}(i_{T+1})}). \tag{3.21}$$
Chapter 4

Modeling of the Risk Factors

An important task in ALM is the identification and adequate modeling of the un-
derlying risk factors. The dynamics of financial risk factors are well known to often
exhibit some of the following phenomena: heavy tails, skewness and high-kurtosis
residuals. The recognition and description of these phenomena go back to
the seminal papers of Mandelbrot (1963) and Fama (1965). In this chapter we
will introduce the α-stable distribution as an extension of the normal distribu-
tion. Due to its summation stability and the fact that it generalizes the Gaussian
distribution, the class of α-stable distributions seems to be an ideal candidate to
describe the return distribution of the considered risk factors. For further de-
scription of the stable distribution and applications of the stable distribution in
financial theory see Samorodnitsky and Taqqu (1994) or Rachev and Mittnik (1999).

4.1 Stable Distributions

4.1.1 Definition of Stable Random Variables

This section reviews some of the main features of the stable distribution as the
natural extension of the Gaussian distribution. The notion of stable distribution
was introduced in the 1920s by P. Lévy. A stable distribution can be defined in
four equivalent ways, given in the following definitions: A random variable X
follows a stable distribution, if for any positive numbers A and B there exists a
positive number C and a real number D such that

$$AX_1 + BX_2 \overset{d}{=} CX + D, \tag{4.1}$$

where $X_1$ and $X_2$ are independent copies of $X$ and $\overset{d}{=}$ denotes equality in
distribution.
Therefore, a distribution f is stable if it is invariant under convolution, i.e., if
there exist real constants C > 0 and D such that

$$f_{(AX_1+d_1)+(BX_2+d_2)}(s) := \int_{-\infty}^{+\infty} f(A(s-l)+d_1)\, f(Bl+d_2)\, dl = f(Cs+D) \tag{4.2}$$

for all real constants $A, B > 0$, $d_1$, $d_2$.


$\alpha$ is called the index of stability or characteristic exponent, and for any stable
random variable $X$ there is a number $\alpha \in (0, 2]$ such that the number $C$ in (4.1)
satisfies the following equation:

$$C^\alpha = A^\alpha + B^\alpha. \tag{4.3}$$

A random variable $X$ with index $\alpha$ is called an $\alpha$-stable random variable. Obviously,
the Gaussian distribution has an index of stability of 2.
The next definition is equivalent to the first one and considers the sum of $n$ inde-
pendent copies of a stable random variable. A random variable $X$ has a stable
distribution if for any $n \geq 2$ there is a positive real number $C_n$ and a real number
$D_n$ such that

$$X_1 + X_2 + \ldots + X_n \overset{d}{=} C_n X + D_n, \tag{4.4}$$

where $X_1, X_2, \ldots, X_n$ are independent copies of $X$. Again, the number $C_n$ and the
stability index of the distribution are closely linked, and we get $C_n = n^{1/\alpha}$, where
the $\alpha \in (0, 2]$ is the same as in equation (4.3).
The third definition of a stable distribution is a generalisation of the central
limit theorem. Stable distributions are in fact the only distributions that can be
obtained as limits of normalized sums of iid random variables. A random variable
$X$ is said to be stable if it has a domain of attraction, i.e., if there is a sequence
of random variables $Y_1, Y_2, \ldots$ and sequences of positive numbers $\{d_n\}$ and real
numbers $\{c_n\}$, such that

$$\frac{Y_1 + Y_2 + \cdots + Y_n}{d_n} + c_n \overset{d}{\Longrightarrow} X. \tag{4.5}$$

The notation $\overset{d}{\Longrightarrow}$ denotes convergence in distribution. This definition is obvi-
ously equivalent to the summation definition (4.4), as one can take the $Y_i$s to be
independent and distributed like $X$. As mentioned before, in the case of $\alpha = 2$,
the statement is the ordinary central limit theorem. When $d_n = n^{1/\alpha}$, the $Y_i$s
are said to belong to the normal domain of attraction of $X$.

Finally, the last equivalent way to define a stable random variable provides
information about its characteristic function. A random variable X has a stable
distribution if there are parameters 0 < α ≤ 2, σ ≥ 0, −1 ≤ β ≤ 1, and µ real
such that its characteristic function has the following form:

$$E(e^{itX}) = \begin{cases} \exp\left(-\sigma^\alpha |t|^\alpha \left[1 - i\beta\,\mathrm{sign}(t)\tan\frac{\pi\alpha}{2}\right] + i\mu t\right), & \text{if } \alpha \neq 1, \\[6pt] \exp\left(-\sigma |t| \left[1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(t)\ln|t|\right] + i\mu t\right), & \text{if } \alpha = 1. \end{cases} \tag{4.6}$$

The characteristic function definition implies the summation definition (4.4), which
can be shown in the following way: Let $\alpha \neq 1$ and let $X_1, X_2, \ldots, X_n$ be independent
copies of the stable random variable $X$. Then we can write

$$E e^{it(X_1 + X_2 + \cdots + X_n)} = \exp\left(-n\sigma^\alpha |t|^\alpha \left[1 - i\beta\,\mathrm{sign}(t) \tan\frac{\pi\alpha}{2}\right] + in\mu t\right).$$

On the other hand, obviously

$$E e^{it(c_n X + d_n)} = e^{itd_n}\, E e^{i(tc_n)X} = e^{itd_n} \exp\left(-\sigma^\alpha |c_n t|^\alpha \left[1 - i\beta\,\mathrm{sign}(c_n t) \tan\frac{\pi\alpha}{2}\right] + i\mu c_n t\right).$$

By choosing $c_n = n^{1/\alpha}$ and $d_n = \mu(n - n^{1/\alpha})$ we get the equation

$$E e^{it(c_n X + d_n)} = E e^{it(X_1 + X_2 + \cdots + X_n)}.$$

Since the characteristic function uniquely determines the distribution of a random
variable $X$, we end up with the result:

$$X_1 + X_2 + \cdots + X_n \overset{d}{=} c_n X + d_n.$$
4.1.2 Parameters and Special Cases of the Stable Distribution

A stable distribution is defined by four parameters. The dependence of a stable
random variable $X$ on its parameters is indicated by writing

$$X \sim S_\alpha(\sigma, \beta, \mu),$$

where $\alpha$ is the so-called index of stability ($0 < \alpha \leq 2$). The lower the value of $\alpha$,
the more leptokurtic is the distribution. This can be considered a very attractive
property for modeling financial asset returns. In empirical studies, the value of
$\alpha$ for asset returns is often chosen between 1 and 2. For $\alpha > 1$, the location
parameter $\mu$ is the mean of the distribution. Figure 4.1 shows the probability
density function for symmetric alpha-stable random variables for different values
of $\alpha$.
[Figure 4.1: Probability density functions for standard symmetric α-stable random
variables, α = 2, α = 1 (dotted) and α = 0.5 (dashed).]
The second parameter β is the skewness parameter (−1 ≤ β ≤ 1). A stable
distribution with β = µ = 0 is called a symmetric α-stable distribution (SαS). If
β < 0, the distribution is skewed to the left, if β > 0, the distribution is skewed to
the right. We conclude that the stable distribution can also capture asymmetric
asset returns.
σ is the scale parameter (σ ≥ 0) and µ is the drift (µ ∈ R).

[Figure 4.2: Probability density functions for stable random variables with α = 1.2
and β varying: β = 0, β = −0.5 (dashed) and β = −1 (dotted).]

Figure 4.2 shows the probability density function for some skewed alpha-stable
random variables with α = 1.2.
Generally the probability density function of a stable distribution cannot be
specified in explicit form. However, there are three special cases where this is
possible:

i. The Gaussian distribution


If the index of stability $\alpha = 2$, then the stable distribution reduces to the
Normal distribution, and it is $S_2(\sigma, 0, \mu) = N(\mu, 2\sigma^2)$. We shall point out
that the $\sigma$ in this parameterization is not equal to the standard deviation. When
$\alpha = 2$, the characteristic function becomes $E e^{itX} = e^{-\sigma^2 t^2 + i\mu t}$. This is
the characteristic function of a Gaussian random variable with mean $\mu$ and
variance $2\sigma^2$.

ii. The Cauchy distribution


$S_1(\sigma, 0, \mu)$, whose density $f_1(x)$ is

$$f_1(x) = \frac{\sigma}{\pi\left((x-\mu)^2 + \sigma^2\right)}. \tag{4.7}$$

If $X \sim S_1(\sigma, 0, 0)$, then for $x > 0$,

$$P(X \leq x) = 0.5 + \frac{1}{\pi} \arctan\frac{x}{\sigma}. \tag{4.8}$$

iii. The Lévy distribution


$S_{1/2}(\sigma, 1, \mu)$, whose density

$$f(x) = \left(\frac{\sigma}{2\pi}\right)^{1/2} \frac{1}{(x-\mu)^{3/2}} \exp\left(-\frac{\sigma}{2(x-\mu)}\right) \tag{4.9}$$

is concentrated on $(\mu, \infty)$.
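The closed-form special cases provide a convenient sanity check for numerical stable-distribution routines. A hedged sketch using scipy.stats.levy_stable (scipy's implementation of $S_\alpha$, with shape parameters alpha and beta, and loc/scale in the roles of $\mu$ and $\sigma$): at $\alpha = 1$, $\beta = 0$ its density should agree with the Cauchy density (4.7):

```python
import numpy as np
from scipy.stats import levy_stable, cauchy

x = np.linspace(-5.0, 5.0, 11)
pdf_stable = levy_stable.pdf(x, 1.0, 0.0, loc=0.0, scale=1.0)  # alpha = 1, beta = 0
pdf_cauchy = cauchy.pdf(x, loc=0.0, scale=1.0)                 # sigma/(pi((x-mu)^2+sigma^2))
print(np.max(np.abs(pdf_stable - pdf_cauchy)))                 # should be numerically small
```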

4.1.3 Properties of Stable Random Variables

In this section we will summarize some properties of stable distributions that are
useful in modeling financial data or simulation.
The first property mentioned is the so-called summation stability. Let $X_1, X_2$
be independent random variables with $X_i \sim S_\alpha(\sigma_i, \beta_i, \mu_i)$, $i = 1, 2$. Then
$X_1 + X_2 \sim S_\alpha(\sigma, \beta, \mu)$, with

$$\sigma = \left(\sigma_1^\alpha + \sigma_2^\alpha\right)^{1/\alpha}, \qquad \beta = \frac{\beta_1 \sigma_1^\alpha + \beta_2 \sigma_2^\alpha}{\sigma_1^\alpha + \sigma_2^\alpha}, \qquad \mu = \mu_1 + \mu_2. \tag{4.10}$$

For the proof we refer to Samorodnitsky and Taqqu (1994). Thus, the sum of two
alpha-stable distributed random variables with the same index $\alpha$ is also alpha-
stable with the same index of stability $\alpha$.
The second proposition concerns the parameter σ. The Gaussian distribution can
be scaled by multiplication with a constant; this property extends to all
0 < α ≤ 2. Let X ∼ Sα(σ, β, µ) and let a ∈ R\{0}. Then

    aX ∼ Sα(|a|σ, sign(a)β, aµ)                              if α ≠ 1,
    aX ∼ S1(|a|σ, sign(a)β, aµ − (2/π) a σ β ln |a|)          if α = 1.

The parameter σ is therefore often called the scale parameter. The proof for
α ≠ 1 follows easily from the characteristic function of stable distributions:

    ln E e^{it(aX)} = −σ^α |ta|^α [1 − iβ sign(ta) tan(πα/2)] + iµ(ta)
                    = −(σ|a|)^α |t|^α [1 − i sign(a)β sign(t) tan(πα/2)] + i(µa)t.
The third proposition concerns the shift parameter µ. It was already discussed
that in the case α = 2 the parameter µ is a shift parameter for the Gaussian
distribution. The same holds for any admissible α: let X ∼ Sα(σ, β, µ) and let
a be a real constant. Then X + a ∼ Sα(σ, β, µ + a). This follows directly by
interpreting a as an Sα(0, 0, a) stable random variable and applying the
summation stability proposition. For 1 < α ≤ 2, the shift parameter µ equals
the mean.

Finally, we can also interpret the last parameter β: it can be identified as a
skewness parameter. X ∼ Sα(σ, β, µ) is symmetric if and only if β = 0 and
µ = 0, and it is symmetric about µ if and only if β = 0. We can prove this
using the fact that a random variable is symmetric if and only if its
characteristic function is real. By Definition 4.1.1 this is the case if and
only if β = 0 and µ = 0. The second statement follows from the shift property
above. In order to indicate that X is symmetric, i.e. β = 0 and µ = 0, we
write

    X ∼ SαS.
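Summation stability can also be illustrated by simulation. A minimal sketch,
again assuming SciPy's levy_stable uses the parameterization of Definition
4.1.1; the two-sample Kolmogorov-Smirnov test serves only as an informal check
that the sum X1 + X2 and a directly parameterized stable sample per (4.10) are
indistinguishable.

    # Monte Carlo illustration of the summation stability property (4.10).
    import numpy as np
    from scipy.stats import levy_stable, ks_2samp

    rng = np.random.default_rng(0)
    alpha = 1.7
    s1, b1, m1 = 1.0, 0.5, 0.1
    s2, b2, m2 = 2.0, -0.3, -0.2

    x1 = levy_stable.rvs(alpha, b1, loc=m1, scale=s1, size=100_000, random_state=rng)
    x2 = levy_stable.rvs(alpha, b2, loc=m2, scale=s2, size=100_000, random_state=rng)

    # Parameters of the sum according to (4.10)
    sig = (s1**alpha + s2**alpha) ** (1.0 / alpha)
    beta = (b1 * s1**alpha + b2 * s2**alpha) / (s1**alpha + s2**alpha)
    mu = m1 + m2
    y = levy_stable.rvs(alpha, beta, loc=mu, scale=sig, size=100_000, random_state=rng)

    # A large p-value indicates the samples are statistically indistinguishable
    print(ks_2samp(x1 + x2, y))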

4.1.4 Truncated α-Stable Distributions

Despite these advantages, the stable distribution is so far only rarely used
in practical implementations. A major reason for the limited use of stable
distributions in applied work is that there are in general no closed-form
expressions for the probability density function. Numerical approximations are
nontrivial and computationally demanding. Another shortcoming in applications
is that all moments of order ≥ α are infinite (for α < 2). Therefore, the
stable distribution is at first not applicable in applications such as GARCH
models with conditions on the innovations like E(εt) = 0 and V(εt) = 1, t ∈ N.
In the sequel, following Menn and Rachev (2004), we give a brief introduction
to a new class of probability distributions that combines the modeling
flexibility of stable distributions with the existence of arbitrary moments.

One possibility to guarantee the existence of moments of order ≥ α is to
truncate the stable distribution at certain limits and add two normally
distributed tails to the distribution. Depending on where the truncation is
conducted, the distribution can still be clearly more heavy-tailed than a
normal distribution but may provide finite variance. This idea leads to the
definition of a so-called smoothly truncated stable distribution.
Let gθ denote the density of some α-stable distribution with parameter vector
θ = (α, β, σ, µ) and let hi, i = 1, 2, denote the densities of two normal
distributions with parameters (νi, τi), i = 1, 2. Furthermore, let a, b ∈ R be
two real numbers with a ≤ µ ≤ b. The density of a smoothly truncated stable
distribution (STS-distribution) is defined by:

           ⎧ h1(x)   for x < a,
    f(x) = ⎨ gθ(x)   for a ≤ x ≤ b,
           ⎩ h2(x)   for x > b.

In order to guarantee a well-defined continuous probability density, the
following relations are imposed:

(i) Continuity:

        h1(a) = gθ(a)   and   h2(b) = gθ(b).

(ii) Total probability P(R) = 1, and therefore

        p1 := ∫_{−∞}^{a} h1(x) dx = ∫_{−∞}^{a} gθ(x) dx,

        p2 := ∫_{b}^{∞} h2(x) dx = ∫_{b}^{∞} gθ(x) dx.
In the following, the class of smoothly truncated stable (STS) distributions
will be denoted by S^trunc, and its elements by X ∼ Sα^{[a,b]}(σ, β, µ).
Probability distributions used for modeling white noise processes, like the
innovations of a time series model, are usually assumed to be standardized
with zero mean and unit variance. It remains to calculate the parameters
(νi, τi), i = 1, 2, of the two normal distributions. The conditions above lead
to the following equations for the parameters (νi, τi), i = 1, 2:

    τ1 = ϕ(Φ^{−1}(p1)) / gθ(a)   and   ν1 = a − τ1 Φ^{−1}(p1),           (4.11)

    τ2 = ϕ(Φ^{−1}(p2)) / gθ(b)   and   ν2 = b + τ2 Φ^{−1}(p2),           (4.12)

where ϕ and Φ denote the density and distribution function of the standard
normal distribution.
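A minimal sketch of computing the tail parameters (νi, τi) from equations
(4.11)-(4.12), assuming SciPy's levy_stable provides the stable density gθ and
its distribution function; the parameter vector θ and the truncation points
a, b below are placeholder values.

    # Normal tail parameters of an STS-distribution via (4.11)-(4.12).
    from scipy.stats import levy_stable, norm

    alpha, beta, sigma, mu = 1.7, 0.0, 1.0, 0.0   # placeholder theta
    a, b = -3.0, 3.0                              # placeholder truncation, a <= mu <= b

    g = levy_stable(alpha, beta, loc=mu, scale=sigma)
    p1 = g.cdf(a)                 # mass of the left stable tail
    p2 = 1.0 - g.cdf(b)           # mass of the right stable tail

    tau1 = norm.pdf(norm.ppf(p1)) / g.pdf(a)      # (4.11)
    nu1 = a - tau1 * norm.ppf(p1)
    tau2 = norm.pdf(norm.ppf(p2)) / g.pdf(b)      # (4.12)
    nu2 = b + tau2 * norm.ppf(p2)

    print((nu1, tau1), (nu2, tau2))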
Following Menn and Rachev (2004), a useful property of α-stable distributions
is their scale and translation invariance, which carries over to the class of
STS-distributions: if X ∼ Sα^{[a,b]}(σ, β, µ), then

    Y := cX + d ∼ Sα̃^{[ã,b̃]}(σ̃, β̃, µ̃) ∈ S^trunc                          (4.13)

with

    ã = ca + d,  b̃ = cb + d,  α̃ = α,  σ̃ = |c|σ,  β̃ = sign(c)β,  µ̃ = cµ + d.

The main advantage, however, is that the mean EX and the second moment EX² of
an STS-distributed random variable X exist:

    EX  = ν1 p1 − τ1 ϕ(Φ^{−1}(p1)) + ∫_a^b x gθ(x) dx + ν2 p2 + τ2 ϕ(Φ^{−1}(p2)),   (4.14)

    EX² = (τ1² + ν1²) p1 − τ1 (a + ν1) ϕ(Φ^{−1}(p1)) + ∫_a^b x² gθ(x) dx
          + (ν2² + τ2²) p2 + τ2 (ν2 + b) ϕ(Φ^{−1}(p2)),                             (4.15)

where, as above, ϕ denotes the density and Φ the distribution function of the
standard normal distribution, and p1 and p2 denote the cut-off probabilities
defined in condition (ii) above; in terms of the distribution function Gθ of
the α-stable distribution with parameter vector θ = (α, β, σ, µ), they are
p1 = Gθ(a) and p2 = 1 − Gθ(b).

It should be pointed out that, since there exists no closed-form expression
for the density gθ of a stable distribution, the mean and the variance of an
STS-distribution can only be calculated with the help of numerical
integration.

4.2 Stable Modeling of Risk Factors

4.2.1 Modeling Financial Returns with Stable Distributions

In this section we will give some examples on the superior fit of stable distribu-
tions to financial returns compared to the Gaussian distribution that is used in
    Distribution                        Stable                          Gaussian
    Parameters                α         β         σ         µ          µ         σ
    Unemployment Rate         1.6691   −1.0000    0.0124    0.0316     0.0337    0.0207
    Working Output            1.4474    1.0000    0.0723    0.0234    −0.0074    0.1454
    Gross Domestic Product    1.6325   −1.0000    0.0830   −0.0493    −0.0495    0.1986
    Consumer Price Index      1.2061    0.0880    0.3130    0.0385     0.0194    0.8058
    Annual Saving             1.2849    1.0000    0.0123    0.0563     0.0433    0.0283
    Personal Income           2.0000    0.1427    0.0147    0.0744     0.0744    0.0210

Table 4.1: Parameters of α-stable and Gaussian fit to log-returns of several
US macroeconomic time series, 1960-2000.

most standard models of financial theory. Various applications of stable
models in finance can be found in Rachev and Mittnik (1999). The advantages of
the α-stable distribution for modeling financial data are manifold. Due to
summation stability, the sum of stable distributed random variables with
identical parameter α is again α-stable distributed with the same α.

Another advantage is the number of parameters: with four parameters the
distribution provides more flexibility and is capable of capturing features of
financial data like skewness, excess kurtosis and heavy tails.

Figures 4.3 and 4.4 illustrate that in most cases stable distributions provide
a clearly better fit for financial variables, since they can capture the
kurtosis and the heavy-tailed nature of financial data. We considered the fit
of the stable distribution to the yearly log-return time series of the
macroeconomic variables working output per hour and GDP. Table 4.2 provides
the goodness-of-fit measure Kolmogorov distance (KS), which measures the
distance between the empirical cumulative distribution function Fn(x) and the
fitted CDF F(x):

    KS = max_{x∈R} |Fn(x) − F(x)|.                                       (4.16)
Figure 4.3: Normal and stable fit to the log-returns of Working Output per
hour.
Figure 4.4: Normal and stable fit to the log-returns of GDP.

A shortcoming of the Kolmogorov distance is that it is most sensitive around
the median and less sensitive in the tails of the distribution, where F(x) is
near 0 or 1.

Figure 4.5: Fit of Gaussian and stable distribution to the 1-year Euribor
rate.

Therefore, we also considered the Anderson-Darling statistic (AD),

    AD = max_{x∈R} |Fn(x) − F(x)| / √( F(x)(1 − F(x)) ),                 (4.17)

which puts more weight on the tails of the distribution. Table 4.2 shows the
results for the considered goodness-of-fit criteria. For most variables we
find a clearly better fit of the stable distribution compared to the normal.
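Both criteria are straightforward to compute once a candidate distribution has
been fitted. A minimal sketch; the data sample and the stable parameters below
are placeholders, since a full maximum likelihood fit of the stable law is
beyond this illustration.

    # Kolmogorov distance (4.16) and Anderson-Darling statistic (4.17).
    import numpy as np
    from scipy.stats import norm, levy_stable

    def ks_ad(data, cdf):
        x = np.sort(data)
        n = len(x)
        F = np.clip(cdf(x), 1e-12, 1.0 - 1e-12)   # guard the tails in (4.17)
        # the empirical CDF jumps at each order statistic: check both sides
        dev = np.maximum(np.abs(np.arange(1, n + 1) / n - F),
                         np.abs(np.arange(n) / n - F))
        return dev.max(), (dev / np.sqrt(F * (1.0 - F))).max()

    rng = np.random.default_rng(1)
    data = rng.standard_t(df=3, size=500)         # placeholder heavy-tailed sample

    m, s = norm.fit(data)
    print(ks_ad(data, lambda x: norm.cdf(x, m, s)))
    print(ks_ad(data, lambda x: levy_stable.cdf(x, 1.8, 0.0, loc=0.0, scale=0.8)))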

4.3 Univariate and Multivariate Distributions

Figure 4.6: Fit of Gaussian and stable distribution to residuals of monthly
inflation.

    Distribution                  Stable                Gaussian
                                  KS        AD          KS        AD
    Unemployment Rate             0.0843    0.5065      0.1333    0.4077
    Working Output                0.0965    0.2869      0.1791    0.3389
    Gross Domestic Product        0.0804    0.5448      0.1804    6.1008
    Consumer Price Index          0.0723    0.2646      0.1833    0.5044
    Annual Saving                 0.0777    0.1738      0.1635    0.3521
    Personal Income               0.1073    0.1864      0.0783    0.1920

Table 4.2: Goodness-of-fit criteria Kolmogorov distance (KS) and
Anderson-Darling statistic (AD) for the stable and normal distributions.

For ALM problems, scenarios are often generated by calibrating and simulating
a time-series model on multivariate data. There are two major approaches to
modeling multivariate data:
• Fit a multivariate distribution.

• Fit each individual time-series with a univariate distribution and use a cop-
ula to describe the dependence structure.

The second approach is more flexible in the sense that it allows any type of
distribution to be fitted to the individual series. For instance, one can
first calibrate complex univariate models such as GARCH and then capture the
dependence with a time-varying copula.

4.4 Fitting a Multivariate Distribution

In terms of the multivariate approach one might calibrate a vector autoregressive


(VAR) model to the data. The VAR model has had much success in modeling
financial and economic data. The general VAR(p) model for financial return data
R̃τ is
R̃τ = Π1 R̃τ −1 + ... + Πp R̃τ −p + Eτ , (4.18)

where the innovations process Eτ = (e1τ , ..., e6τ )′ is assumed to be white noise with
covariance matrix Σ. It is both easy to calibrate and easy to simulate scenarios
from VAR models. An introduction to modeling and estimation of VAR models
can be found in [38].
To simulate the VAR model, one needs to make a distributional assumption for
the innovations. After estimation of a VAR(1) model, for instance, the
residuals are computed by

    Êτ = R̃τ − Π̂1 R̃τ−1,                                                  (4.19)

and the standardized residuals Σ̂^{−1/2} Êτ can be examined. The usual
assumption is that the innovations are Gaussian, in which case the
standardized residuals should be i.i.d. multivariate Normal(0, In). However,
based on the results on financial return data of the previous sections, it
might also be promising to use a more flexible or heavy-tailed distribution
like the α-stable or the truncated stable distribution.
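A minimal sketch of calibrating a VAR(1) model and computing the residuals
(4.19), assuming the statsmodels package (a version supporting trend='n' for
a zero-intercept fit) and a placeholder matrix of demeaned returns.

    # VAR(1) calibration and residual extraction with statsmodels.
    import numpy as np
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(2)
    R_tilde = rng.normal(scale=0.02, size=(237, 6))   # placeholder demeaned returns

    res = VAR(R_tilde).fit(1, trend='n')              # no intercept: data are demeaned
    Pi1 = res.coefs[0]                                # estimated Pi_1 (6 x 6)
    E_hat = res.resid                                 # residuals, equation (4.19)

    # standardized residuals via a Cholesky factor of the residual covariance
    L = np.linalg.cholesky(res.sigma_u)
    Z = np.linalg.solve(L, E_hat.T).T
    print(Pi1.shape, Z.std(axis=0))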
An n-dimensional random vector Z has a multivariate stable distribution if for
any a > 0 and b > 0 there exist c > 0 and d ∈ Rⁿ such that

    aZ1 + bZ2 =_d cZ + d,

where Z1 and Z2 are independent copies of Z and a^α + b^α = c^α. The
characteristic function of Z is given by

    ΦZ(θ) = exp{ −∫_{Sn} |θ′s|^α [1 − i sign(θ′s) tan(πα/2)] ΓZ(ds) + iθ′µ },      if α ≠ 1,

    ΦZ(θ) = exp{ −∫_{Sn} |θ′s| [1 + i (2/π) sign(θ′s) ln |θ′s|] ΓZ(ds) + iθ′µ },    if α = 1,

where θ and µ are n-dimensional vectors and Sn is the unit sphere in Rⁿ. The
spectral measure ΓZ is a finite measure on the sphere that takes over the
roles of β and σ from the univariate case. Again, α and µ are the index of
stability and the location parameter, respectively. A symmetric stable random
vector (for which µ = 0) is called symmetric alpha-stable (SαS), and in this
case the stable counterpart of the covariance is the covariation

    [z̃¹, z̃²]α = ∫_{S2} s1 s2^{⟨α−1⟩} Γ(z̃¹,z̃²)(ds),

where (z̃¹, z̃²) is an SαS vector with spectral measure Γ(z̃¹,z̃²) and
y^{⟨k⟩} = |y|^k sign(y). Additionally, the covariation norm is given by

    ‖z̃^i‖α = ( [z̃^i, z̃^i]α )^{1/α}.

See [30] for details on estimating the index of stability, spectral measure,
and scale parameter for a general stable random vector.

4.5 Dependence Modeling and Copulas

In the elliptical world, the variance-covariance approach to optimizing
portfolios makes sense, and VaR is a coherent measure of risk. For this
reason, the class of elliptical distributions represents an ideal environment
for standard (market) risk management approaches. However, for general
multivariate distributions, correlation often gives no indication of the
degree or structure of dependence. A list of deficiencies and problems in the
general case shall illustrate this point (see Embrechts et al., 1999):

i. Correlation is simply a scalar measure of dependency; it cannot tell us every-


thing we would like to know about the dependence structure of risks.

ii. Possible values of correlation depend on the marginal distributions of the
risks; not all values between −1 and 1 are necessarily attainable.

iii. Perfectly positively dependent risks do not necessarily have a correlation of


1; perfectly negatively dependent risks do not necessarily have a correlation
of -1.

iv. A correlation of zero does not indicate independence of risks.

v. Correlation is not invariant under transformations of the risks. For example,


log(X) and log(Y ) generally do not have the same correlation as X and Y .

vi. Correlation is only defined when the variances of the risks are finite. It
is not an appropriate dependence measure for very heavy-tailed risks with
infinite variances.
For an illustration of points ii and iv, consider the following example (see
Embrechts et al., 1999). Consider two rv's X and Y that are lognormally
distributed with µX = µY = 0, σX = 1 and σY = 2. One can show that, no matter
how the joint distribution with the given marginals is specified, not every
correlation in [−1, 1] is attainable. In fact, there exist bounds for the
maximal and minimal attainable correlation [ρmin, ρmax], which in the given
case is [−0.090, 0.666].

Allowing σY to increase, this interval becomes arbitrarily small, as one can
see in figure 4.7. Here, it is interesting to note that the two boundaries
represent the cases where the two rv's are perfectly positively dependent (the
maximal correlation line) and perfectly negatively dependent (the minimal
correlation line), respectively. Thus, although the attainable interval for ρ
shrinks toward zero from both sides as σY grows, the dependence between X and
Y is by no means weak. This indicates that it is wrong to interpret small
correlation as weak dependence.
A single statistical parameter like the linear correlation coefficient will not
be able to capture the entire dependence structure between two rv’s in the gen-
eral case. At this point a general concept of describing the dependence structure
within multivariate distributions is needed. Since marginal distributions are very
illustrative, easy to handle and often used as basic building blocks for the de-
sign of a multivariate distribution, the idea of separating the description of the
joint multivariate distribution into the marginal behaviour and the dependence
structure is very attractive. One representation of the dependence structure that
satisfies this concept is a copula. A copula is a function that combines the mar-
ginal distributions to form the joint multivariate distribution. A copula is the
distribution function of a random vector in Rn with standard uniform marginals.
One can alternatively define a copula as a function and impose certain restrictions.
Figure 4.7: Maximum and minimum attainable correlation for
X ∼ Lognormal(0, 1) and Y ∼ Lognormal(0, σY).
A copula is any real valued function C : [0, 1]n → [0, 1], i.e. a mapping of the unit
hypercube into the unit interval, which has the following three properties:

i. C(u1, . . . , un) is increasing in each component ui.

ii. C(1, . . . , 1, ui, 1, . . . , 1) = ui for all i ∈ {1, . . . , n},
ui ∈ [0, 1].

iii. For all (a1, . . . , an), (b1, . . . , bn) ∈ [0, 1]^n with ai ≤ bi:

        Σ_{i1=1}^{2} · · · Σ_{in=1}^{2} (−1)^{i1+···+in} C(u_{1i1}, . . . , u_{nin}) ≥ 0,

     where u_{j1} = aj and u_{j2} = bj for all j ∈ {1, . . . , n}.

Let X = (X1 , . . . , Xn )′ be a random vector of real-valued rv’s whose dependence


structure is completely described by the joint distribution function

F (x1 , . . . , xn ) = P (X1 < x1 , . . . , Xn < xn ). (4.20)

Each rv Xi has a marginal distribution Fi, which is assumed to be continuous
for simplicity. Recall that transforming a continuous rv X by its own
distribution function F results in a rv F(X) that is standard uniformly
distributed.
Thus transforming equation (4.20) component-wise yields

F (x1 , . . . , xn ) = P (X1 < x1 , . . . , Xn < xn )

= P [F1 (X1 ) < F1 (x1 ), . . . , Fn (Xn ) < Fn (xn )]

= C(F1 (x1 ), . . . , Fn (xn )), (4.21)

where the function C can be identified as a joint distribution function with
standard uniform marginals: the copula of the random vector X. Equation (4.21)
clearly shows how the copula combines the marginals into the joint
distribution.
Sklar's theorem provides a theoretical foundation for the copula concept.¹

Sklar's theorem: Let F be a joint distribution function with continuous
margins F1, . . . , Fn. Then there exists a unique copula
C : [0, 1]^n → [0, 1] such that (4.21) holds for all x1, . . . , xn in
R̄ = [−∞, ∞]. Conversely, if C is a copula and F1, . . . , Fn are distribution
functions, then the function F given by (4.21) is a joint distribution
function with margins F1, . . . , Fn.

For the case that the marginals Fi are not all continuous, it can be shown²
that the joint distribution function can still be expressed as in equation
(4.21), although C is no longer unique in this case.

¹ For further discussion see [35].
² See [35].

Examples of copulas

i. If the rv's Xi are independent, then the copula is just the product copula

        C^ind(x1, . . . , xn) = x1 · · · · · xn.

ii. The Gaussian copula is

        CρGa(x, y) = ∫_{−∞}^{Φ⁻¹(x)} ∫_{−∞}^{Φ⁻¹(y)} 1/(2π√(1 − ρ²)) exp( −(s² − 2ρst + t²)/(2(1 − ρ²)) ) ds dt,

    where ρ ∈ (−1, 1) and Φ⁻¹(α) = inf{ x | Φ(x) ≥ α } is the inverse of the
    univariate standard normal distribution function. Applying CρGa to two
    univariate standard normally distributed rv's results in a standard
    bivariate normal distribution with correlation coefficient ρ.
Note that, since the copula and the marginals can be arbitrarily combined,
this (and any other) copula can be applied to any set of univariate rv's. The
outcome will then in general not be multivariate normal, but the resulting
multivariate distribution inherits the dependence structure of the
multivariate normal distribution.

iii. As a last example, the Gumbel or logistic copula is

        CβGu(x, y) = exp( −[ (−log x)^{1/β} + (−log y)^{1/β} ]^β ),

     where β ∈ (0, 1] indicates the dependence between X and Y: β = 1 gives
     independence and β → 0+ leads to perfect dependence.

According to Sklar's theorem, a multivariate distribution is fully determined
by its marginal distributions and a copula. Therefore, the copula contains all
information about the dependence structure between the associated random
variables. In the case that all marginal distributions are continuous, the
copula is unique and therefore often referred to as the dependence structure
for the given combination of multivariate and marginal distributions. If the
copula is not unique because at least one of the marginal distributions is not
continuous, it can still be called a possible representation of the dependence
structure.

A very useful feature of a copula is the fact that it is invariant under
increasing and continuous transformations of the marginals: if
(X1, . . . , Xn)′ has copula C and T1, . . . , Tn are increasing continuous
functions, then (T1(X1), . . . , Tn(Xn))′ also has copula C. The proof can be
found, for example, in [15], page 6. One application of this invariance
property is that the transition from the representation of a random variable
to its logarithmic representation does not change the copula. Note that the
linear correlation coefficient does not have this property, since it is only
invariant under linear transformations.

Figure 4.8: 1000 draws from two distributions constructed using Gamma(3,1)
marginals and two different copulas, both having a linear correlation of
ρ = 0.7.


From the concept of a copula, it is immediately clear that the easiest way to
construct a multivariate distribution using a copula is to assume some
marginal distributions and apply the copula; some examples are given below for
illustration. A practical problem, however, will usually be set up the other
way round: the multivariate distribution has to be estimated by fitting the
copula to data. A discussion of this topic is beyond the scope of this
lecture.
Example:³ Let X and Y be two rv's that are both Gamma(3,1) distributed. We now
apply two different copulas and compare the characteristics by simulating 1000
bivariate draws from both models. First, we use a Gaussian copula with
parameter ρGa = 0.7. The second distribution is then derived by applying a
Gumbel copula whose parameter β is adjusted in such a way that the linear
correlation coefficient of the resulting bivariate distribution is also
ρGu = 0.7.

³ The example and the graph were taken from [15], pages 2 and 22f.
Figure 4.8 shows the scatter plot of the 1000 draws for both distributions.
The 99% quantile q0.99 for the marginal Gamma distribution has been added as
an indicator line for extreme values.
Note that despite the fact that both distributions have the same linear corre-
lation coefficient, the dependence between X and Y is obviously quite different in
both models. Using the Gumbel copula, extreme events have a tendency to occur
together, as one can observe by comparing the number of draws where x and y
exceed q0.99 simultaneously. Those are 12 for the Gumbel and 3 for the Gaussian
case.
Additionally, the probability of Y exceeding q0.99 given that X has exceeded
q0.99 can be roughly estimated from the figure:

    P̂Ga(Y > q0.99 | X > q0.99) = 3/9 ≈ 0.33,

    P̂Gu(Y > q0.99 | X > q0.99) = 12/16 = 0.75.

This is another indicator of the increased probability of the joint occurrence
of extreme events.
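The Gaussian-copula part of this example can be reproduced in a few lines. A
minimal sketch; the Gumbel counterpart needs an Archimedean sampler and is
omitted, and the Gaussian copula parameter ρ = 0.7 yields a linear correlation
only approximately equal to 0.7 for the gamma marginals.

    # 1000 draws with Gamma(3,1) marginals coupled by a Gaussian copula.
    import numpy as np
    from scipy.stats import norm, gamma

    rng = np.random.default_rng(3)
    rho = 0.7
    cov = np.array([[1.0, rho], [rho, 1.0]])

    z = rng.multivariate_normal([0.0, 0.0], cov, size=1000)
    u = norm.cdf(z)                        # probability transform: copula scale
    xy = gamma.ppf(u, a=3.0, scale=1.0)    # apply the Gamma(3,1) quantiles

    q99 = gamma.ppf(0.99, a=3.0, scale=1.0)
    print(np.sum((xy[:, 0] > q99) & (xy[:, 1] > q99)))   # joint 99% exceedances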
In the previous section we considered a bivariate distribution to show that
marginal distributions and correlation are insufficient information to fully specify
the joint distribution. This example was constructed in the following way, using
a copula:
Let X and Y be two rv's with standard normal distributions. Obviously, the
bivariate distribution obtained by applying an arbitrary copula is in general
not bivariate normal; this happens only when choosing the Gaussian copula
C = CρGa. Thus, the following copula has been constructed. Let

    f(x) = 1_{(γ,1−γ)}(x) + ((2γ − 1)/(2γ)) · 1_{(γ,1−γ)^c}(x),
    g(y) = −f(y),

with γ ∈ [1/4, 1/2], and let the copula be defined through its density
c(u, v) = 1 + f(u)g(v). For γ < 1/2, the joint density vanishes on the square
[γ, 1 − γ]², so that the joint distribution is surely not bivariate normal.
However, the linear correlation coefficient between X and Y exists. From
symmetry considerations (c(u, v) = c(1 − u, v), 0 ≤ u, v ≤ 1) it can be
deduced that ρX,Y = 0, irrespective of γ. Therefore, uncountably many
bivariate distributions with standard normal marginals and zero correlation
exist that are not bivariate normal.
Chapter 5

ALM Implementation

In the following chapter, an empirical example of ALM is provided. Here, the
T-stage ALM problem of section 3.6 is applied to data that is representative
of a defined-benefit pension fund. A liability index lτ provided by Ryan Labs
is used as a proxy for the liabilities. This is a generic index that does not
correspond to the liabilities of a specific corporate defined-benefit plan,
but it helps to illustrate the current predicament of pension funds, as
discussed in [34]. The same reference also provides the typical asset classes
invested in by pension funds: cash, bonds, equity, real estate, international
stocks, international bonds, mortgages, GICs and annuities, and private
equity. Table (5.1) contains the benchmarks used for the asset classes
considered here.
Given the historical data for the liability index lτ and asset indexes siτ , i =
1, ..., 5 a multivariate scenario tree can be constructed. Recall that this is achieved

Asset Class Benchmark
s1 Cash Ryan Labs Cash Index
s2 Bonds Lehman U.S. Aggregate Bond Index
s3 Equities S&P500
s4 International Equities Morgan Stanley EAFE Index
s5 Mortgages Lehman Mortgage Index

Table 5.1: Benchmarks for the pension fund asset classes.

by fitting a multivariate time-series model to the return vector:

    Rτ = (rτ¹, rτ², . . . , rτ⁶)′ = ( lτ/lτ−1 − 1,  sτ¹/sτ−1¹ − 1,  . . . ,  sτ⁵/sτ−1⁵ − 1 )′.   (5.1)

Once a time-series model is found, it is simple to generate sample paths for the
returns and then convert the returns back to index values. Note that in our
example τ is interpreted as time, and in the previous chapter, t is interpreted as
the stage in a stochastic program. It is possible that they will coincide; however,
there will usually be many smaller time periods between stages. In our application,
a time-series model is fit to monthly data, but a stage covers a 6-month period.
Figure (5.1) contains the plots of the monthly returns for the components
of Rτ . There are 237 data points corresponding to the returns for the months
of April 1985 to December 2004. An obvious characteristic of the data is the
volatility clustering, especially noticeable in the equity index. This indicates that
a time-series model with time-varying volatilities is appropriate.
Figure 5.1: Monthly returns Rτ (liability, cash, bond, equity, international
equity and mortgage returns), April 1985 to December 2004.


5.1 Finding an adequate model

As a first step in fitting a model to the data, the major trends of the individual
series are removed by an exponentially weighted moving average (EWMA) process
for the mean. The means of the univariate return series are assumed to follow:

miτ = λm miτ −1 + (1 − λm )rτi −1 , for i = 1, ..., 6, (5.2)

where λm is a fixed parameter. By writing mτ = (m1τ , ..., m6τ )′ , the new return
series of interest is
R̃τ = Rτ − mτ , (5.3)

and as the next step, a vector autoregressive (VAR) model is calibrated to R̃τ .
For the data at hand, the AIC indicates that the VAR of order 1 is optimal.
More generally, one may fit a multivariate autoregressive moving average
(ARMA) model; however, multivariate financial data typically exhibits only an
autoregressive component, so it is reasonable to restrict the model to a VAR.
Extensions of the VAR model that additionally include economic regime changes
and long-term equilibria in an ALM context may be used as well.

To find the optimal value of λm, a coarse grid was created, and for each
element in the grid the AICs of low-order VAR models were compared. VAR(1)
always resulted in the lowest AIC for any value of λm in the grid. A fine grid
for λm was then constructed, and the AICs of the corresponding VAR(1) models
were compared. This procedure gave an optimal value of λm = 0.952.
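A minimal sketch of the EWMA mean filter (5.2)-(5.3) with a placeholder return
matrix; λm = 0.952 is the value reported above.

    # EWMA filter (5.2) for the mean and the demeaned series (5.3).
    import numpy as np

    def ewma_mean(R, lam):
        T, n = R.shape
        m = np.zeros((T, n))
        for t in range(1, T):
            m[t] = lam * m[t - 1] + (1.0 - lam) * R[t - 1]
        return m, R - m

    rng = np.random.default_rng(4)
    R = 0.005 + rng.normal(scale=0.02, size=(237, 6))   # placeholder returns
    m, R_tilde = ewma_mean(R, lam=0.952)
    print(R_tilde.mean(axis=0))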
After estimation of the VAR(1) model, the residuals are computed by

Êτ = R̃τ − Π̂1 R̃τ −1 , (5.4)


and the standardized residuals Σ̂−1/2 Êτ are plotted in figure (5.2). The usual
assumption is that the innovations are Gaussian, in which case the standardized
residuals should be i.i.d. Normal(0,I6 ). This is clearly not the case because there
is still a significant amount of volatility clustering and extreme events. The stan-
dardized residuals are aggregated into one series, and the corresponding qq-plot
versus the standard normal is found in figure (5.3).
To get an idea of the variability and dependence structure of the innovations
for the VAR(1) model, the estimated volatilities σ̂i from the univariate series
êi = {êiτ, τ = 1, ..., 237}, where each êiτ is a component of Êτ, are

    σ̂1        σ̂2        σ̂3        σ̂4        σ̂5        σ̂6
    0.0404    0.0015    0.0124    0.0450    0.0494    0.0105

and the estimated correlation matrix of Eτ is

              1.0000    0.5176    0.9343    0.1734    0.0652    0.8134
              0.5176    1.0000    0.6261    0.0303   −0.0332    0.6168
    CorrE =   0.9343    0.6261    1.0000    0.1792    0.0780    0.9350
              0.1734    0.0303    0.1792    1.0000    0.5933    0.2026
              0.0652   −0.0332    0.0780    0.5933    1.0000    0.0874
              0.8134    0.6168    0.9350    0.2026    0.0874    1.0000

The first noticeable point is that the volatilities corresponding to the equity returns
are the largest, the volatility corresponding to the bond returns is smaller, and
the volatility corresponding to the cash returns is very small. Also, the volatility
corresponding to the liability returns is almost as large as that of the equities,
meaning that the liabilities of pension funds are actually quite risky. The second
noticeable point is that the liability returns and bond returns are highly correlated
as one would expect. This means that when the optimization program is solved
Figure 5.2: Standardized residuals Σ̂^{−1/2} Êτ of the VAR(1) model for Rτ.

Figure 5.3: QQ-plot of the standard normal versus the standardized residuals
Σ̂^{−1/2} Êτ.

              ê1        ê2        ê3        ê4        ê5        ê6
    α̂i       1.8569    1.7411    1.9900    1.8727    1.9702    1.8096
    σ̂αi      0.0263    0.0008    0.0087    0.0285    0.0343    0.0067

Table 5.2: Univariate ML estimates of the tail index and scale parameters for
each residual series êi.

for the minimum risk portfolio, one could expect a large allocation in the bonds
to offset the risk in the liabilities.
A symmetric stable distribution is fit to each of the univariate residual
series of the VAR(1) model by maximum likelihood estimation. The estimates of
the tail index α̂i and scale parameter σ̂αi for each univariate series êi are
given in table (5.2). The estimation was restricted to symmetric distributions
because of the short length of the data series. Alternatively, it is
reasonable to assume α = 1.8 for financial data and carry out the estimation
for the scale parameter alone.

Figure 5.4: Density functions for the residuals of the liability return
series.

              ê1        ê2        ê3        ê4        ê5        ê6
    Normal    0.0505    0.0685    0.0578    0.0673    0.0422    0.0509
    Stable    0.0332    0.0291    0.0572    0.0677    0.0418    0.0569

Table 5.3: Comparison of KD under the normal and stable assumptions.

The empirical density of the liability return innovations is compared to

both the estimated normal density and the estimated stable density in figures
(5.4) and (5.5). As is seen, the stable density better matches the peak of the
empirical density and has a slower decay at the tails than that of the normal
density.
Two goodness-of-fit measures are employed to compare the normal fit and
the stable fit of the univariate series: the Kolmogorov distance (KD) and the
Anderson-Darling (AD) statistic. The KD and AD for the normal and stable
estimated distributions for each of the series can be found in tables (5.3) and
(5.4). The normal fit slightly outperforms the stable fit twice under the KD
measure, but the stable fit is clearly superior under the AD measure.
Figure 5.5: Right tail of the density functions for the residuals of the
liability return series.

              ê1        ê2        ê3         ê4         ê5        ê6
    Normal    0.6546    45.9484   0.1674     12.4620    0.1965    0.3848
    Stable    0.1116    0.0947    0.1523     0.1366     0.0856    0.1151

Table 5.4: Comparison of AD under the normal and stable assumptions.

A sub-Gaussian distribution can be fitted to the residuals Eτ of the ALM
data. First, α is estimated by combining the univariate estimates from table
(5.2), yielding α̂ = 1.8705. Assuming the residuals have zero mean, a moment
estimator for Q is applied to Z̃τ = Êτ with p = α̂/3 and q = 1. The resulting
moment estimates of the scale parameters q̂jj are:

    q̂11       q̂22       q̂33       q̂44       q̂55       q̂66
    0.0257    0.0008    0.0080    0.0282    0.0319    0.0066

They can be compared with the ML estimates in table (5.2). The moment estimate
for Q is not symmetric, but a symmetric estimate is given by
Q̂ij = (q̂ij² + q̂ji²)/2. The standardized residuals Q̂^{−1/2} Êτ are also
computed and are plotted in figure (5.6). In this case, the data points should
all be i.i.d. realizations of an S1.8705(1, 0, 0) random variable. This is
clearly not the case because there is a significant amount of volatility
clustering. The qq-plot of the stable random variable versus the aggregated
standardized residuals is found in figure (5.7). This plot appears closer to
linear than the qq-plot versus the standard normal in figure (5.3), which
indicates that the sub-Gaussian distribution provides a better fit than the
multivariate normal; however, neither captures the time-varying nature of the
innovations.

5.1.1 Exponentially Weighted Moving Average Models

To account for the volatility clustering, two types of models are implemented:
the first assumes the innovations are Gaussian with a time-varying covariance
matrix, and the second assumes the innovations are sub-Gaussian with a
time-varying dispersion matrix.
Given a multivariate data set {Eτ , τ = 1, ..., τm } with zero mean, the sample
Figure 5.6: Standardized residuals Q̂^{−1/2} Êτ of the VAR(1) model for Rτ.

Figure 5.7: QQ-plot of the symmetric stable with α = 1.8705 versus the
standardized residuals Q̂^{−1/2} Êτ.

estimate of the covariance is just

    Σ̂ = (1/(τm − 1)) Σ_{τ=1}^{τm} Eτ E′τ.                                (5.5)

Note that equal weight is applied to each observation of the data set. To
allow a time-varying volatility estimate, the covariance estimate at time τ is
allowed to depend on the data before time τ, and the weights are assumed to
decay exponentially from the most recent observation:

    Σ̂τ|τ−1 = (1 − λe) ( Eτ−1 E′τ−1 + λe Eτ−2 E′τ−2 + λe² Eτ−3 E′τ−3 + · · · ),

where 0 < λe < 1 and the weights are chosen so that they sum to one for an
infinite series. The estimate can also be written in the recursive form

    Σ̂τ|τ−1 = (1 − λe) Eτ−1 E′τ−1 + λe Σ̂τ−1|τ−2,                          (5.6)

which is known as the exponentially weighted moving average (EWMA) covariance


model with decay factor λe . In practice, an initial covariance Σ̂0|−1 is needed to
estimate λe , compute standardized residuals, and simulate sample paths. The
approach used is to estimate Σ0|−1 from the sample covariance (5.5) of the initial
10% of the data set.
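A minimal sketch of the recursion (5.6) with the initialization described
above; the residual matrix E is a placeholder.

    # EWMA covariance recursion (5.6) for zero-mean data E.
    import numpy as np

    def ewma_cov(E, lam, init_frac=0.10):
        T, n = E.shape
        k = max(2, int(init_frac * T))
        S = np.cov(E[:k].T)              # Sigma_{0|-1} from (5.5) on the first 10%
        out = np.empty((T, n, n))
        for t in range(T):
            out[t] = S                   # Sigma_{t|t-1} uses only data before t
            S = (1.0 - lam) * np.outer(E[t], E[t]) + lam * S
        return out

    rng = np.random.default_rng(5)
    E = rng.normal(scale=0.02, size=(237, 6))   # placeholder residuals
    print(ewma_cov(E, lam=0.97)[-1].round(6))   # RiskMetrics monthly decay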
RiskMetrics [21] offers an estimation technique for λe based on the root mean
squared prediction error (RMSE) of (eτi)²:

    RMSE2^i(λe) = sqrt( (1/τm) Σ_{τ=1}^{τm} [ (eτi)² − σ̂²τ|τ−1,ii(λe) ]² ),       (5.7)

where σ̂²τ|τ−1,ii(λe) is a diagonal component of Σ̂τ|τ−1 in equation (5.6).
Since the data series is assumed to have zero mean, Eτ−1[(eτi)²] = σ²τ|τ−1,ii,
so the prediction error of (eτi)² is the difference of the terms inside the
square in equation (5.7). A single optimal estimate λe* for the decay factor
is computed from the RMSEs of the univariate series through the formulas:

    λe* = Σ_{i=1}^{n} φi λi*,                                            (5.8)

where

    λi* = argmin_λ RMSE2^i(λ),
    θi = RMSE2^i(λi*) / Σ_{k=1}^{n} RMSE2^k(λk*),
    φi = θi^{−1} / Σ_{k=1}^{n} θk^{−1}.                                  (5.9)

Using this technique, RiskMetrics recommends typical parameter values of
λe = 0.94 for daily data and λe = 0.97 for monthly data.
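A minimal sketch of the decay-factor selection (5.7)-(5.9), reusing ewma_cov
and the placeholder residuals E from the previous sketch.

    # Grid search for lambda_i* per series via (5.7); combination via (5.8)-(5.9).
    import numpy as np

    def select_decay(E, grid):
        T, n = E.shape
        best_lam, best_err = np.empty(n), np.empty(n)
        for i in range(n):
            errs = np.array([np.sqrt(np.mean(
                (E[:, i] ** 2 - ewma_cov(E[:, [i]], lam)[:, 0, 0]) ** 2))
                for lam in grid])                  # RMSE_2^i(lambda), eq. (5.7)
            j = errs.argmin()
            best_lam[i], best_err[i] = grid[j], errs[j]
        theta = best_err / best_err.sum()          # theta_i of (5.9)
        phi = (1.0 / theta) / (1.0 / theta).sum()  # phi_i of (5.9)
        return float((phi * best_lam).sum())       # lambda_e* of (5.8)

    print(select_decay(E, np.arange(0.90, 0.991, 0.005)))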
Stable EWMA

The same ideas are used in the sub-Gaussian case in [19] by allowing a
time-varying dispersion matrix. Similarly to the above, exponential weights
are applied to the moment estimators, yielding the equations:

    q̂^p_{τ|τ−1,jj} = (1 − λe) |e^j_{τ−1}|^p A(p) + λe q̂^p_{τ−1|τ−2,jj},            (5.10)

    B_{τ|τ−1,ij} = (1 − λe) e^i_{τ−1} (e^j_{τ−1})^{⟨q−1⟩} A(q) + λe B_{τ−1|τ−2,ij},   (5.11)

    q̂²_{τ|τ−1,ij} = B_{τ|τ−1,ij} q̂^{2−q}_{τ|τ−1,jj},   i ≠ j,                      (5.12)

and the symmetric estimator for the time-varying dispersion matrix is given by
Q̂_{τ|τ−1,ij} = ( q̂²_{τ|τ−1,ij} + q̂²_{τ|τ−1,ji} )/2. This model is referred to
as the stable exponentially weighted moving average (SEWMA) model. The authors
also extend the estimation technique for the decay factor by considering the
prediction error of |e^i_τ|^p. They note that
Eτ−1(|e^i_τ|^p) = q^p_{τ|τ−1,ii}/A(p) and suggest minimizing the following
RMSE for each univariate series:

    RMSEp^i(λe) = sqrt( (1/τm) Σ_{τ=1}^{τm} [ A(p) |e^i_τ|^p − q̂^p_{τ|τ−1,ii}(λe) ]² ).   (5.13)

The single optimal decay factor λe* is then found by replacing RMSE2^i with
RMSEp^i in equations (5.8)-(5.9). Using the VAR(1) residuals of the ALM data,
this technique is applied in both the normal and sub-Gaussian cases with
p = α/3. A grid for λ was constructed with increments of 0.001, and RMSEp^i(λ)
was minimized over this grid. In both cases, a value of λe = 0.95 for
equations (5.10)-(5.12) is found to be appropriate. The exact values of λe*
are found in table (5.5).
              α         p         λe*
    Normal    2         0.6667    0.9496
    Stable    1.8705    0.6235    0.9494

Table 5.5: Comparison of the optimal decay factor λe* under the normal and
stable assumptions using the selection criterion based on RMSEp^i.

There are difficulties in implementing the SEWMA model for the ALM residuals:
while the estimate Q̂τ|τ−1 is defined to be symmetric, there is no guarantee
that it is positive definite. In the case of the ALM residuals, the
eigenvalues are often negative and often very near zero. The negative
eigenvalues are easily dealt with by using an incomplete Cholesky
decomposition when computing the standardized residuals and generating sample
paths. The eigenvalues very near zero, on the other hand, cause the
standardized residuals to explode beyond any reasonable value. The likely
cause of this inadequate estimate of the dispersion matrix is the short length
of the data series. For this reason, the scenarios generated according to the
SEWMA model were not used as input to the ALM optimization problem.
Stable Subordination EWMA
To overcome the difficulties of the SEWMA model, a more ad hoc approach is
taken by modeling the time-varying sub-Gaussian distribution in terms of a
governing Gaussian distribution and the scale parameters of the individual
univariate series. First, one needs the following result: if

    g ∼ N(0, σg²),   y ∼ Sα(σy, 0, 0),   s ∼ Sα/2( (2σy²/σg²)(cos(πα/4))^{2/α}, 1, 0 ),

and s and g are independent, then

    y =_d √s g.

See [32] and the references therein. If the governing Gaussian distribution Gτ
for the multivariate data has a time-varying covariance matrix
Στ|τ−1 = (σ²τ|τ−1,ij) and each univariate series is modeled with an αi-stable
random variable with time-varying scale parameter qτ|τ−1,i, the previous
result suggests a way to model Eτ with a time-varying sub-Gaussian-like
distribution:

    Eτ =_d ( √(sτ¹) gτ¹, . . . , √(sτⁿ) gτⁿ )′,                           (5.14)

    Gτ = (gτ¹, . . . , gτⁿ)′ ∼ N(0, Στ|τ−1),                              (5.15)

    sτ^i ∼ S_{αi/2}( (2 q²τ|τ−1,i / σ²τ|τ−1,ii)(cos(παi/4))^{2/αi}, 1, 0 ).   (5.16)

When generating a sample for Eτ , the samples of siτ , i = 1, ..., n, are taken from the
same random seed so that the above equations will be close to the sub-Gaussian
representation where the same subordinator multiplies each component of the
normal random vector. In the above equations, the covariance of the governing
Gaussian distribution captures the dependence between the series, and each sub-
ordinator siτ is chosen to give the proper tail index and scale parameter for each of
the univariate series. Recall that for the sub-Gaussian distribution, all marginals
have the same tail index, so the above equations are actually an extension that
allow different tail indexes, αi , for the marginals. The scale parameters and covari-
ance matrix are estimated from EWMA equations already seen. The time-varying
estimate for the scale parameter is given by:

    q̂^{pi}_{τ|τ−1,i} = (1 − λe) |e^i_{τ−1}|^{pi} A(pi) + λe q̂^{pi}_{τ−1|τ−2,i},   (5.17)

which is similar to equation (5.10), and it is reasonable to take pi = αi/3 for financial


series. To obtain the estimate for the covariance of the governing Gaussian, the
data set of Eτ is first truncated at 5% and 95% to remove the effects of extreme
events. The estimate is then obtained from the truncated series Eτ∗ by:

    Σ̂τ|τ−1 = (1 − λe) E*τ−1 (E*τ−1)′ + λe Σ̂τ−1|τ−2.                      (5.18)

The optimal value of λe is best calibrated through backtesting; alternatively,
the RiskMetrics technique based on RMSEp^i can be carried over. The latter
approach is used for the ALM data, which again gives λe = 0.95. This model
will be referred to as the stable subordination exponentially weighted moving
average model (SSEWMA).
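A minimal sketch of drawing one innovation vector from (5.14)-(5.16), assuming
SciPy's levy_stable; Στ|τ−1, qτ|τ−1,i and αi are placeholder values, and
reusing one fixed seed for all subordinators imitates the common random seed
device described above.

    # One draw of E_tau from the SSEWMA representation (5.14)-(5.16).
    import numpy as np
    from scipy.stats import levy_stable

    n = 6
    Sigma = np.eye(n) * 0.02**2                      # placeholder Sigma_{tau|tau-1}
    q = np.full(n, 0.015)                            # placeholder q_{tau|tau-1,i}
    alphas = np.full(n, 1.8)                         # tail indexes alpha_i

    rng = np.random.default_rng(6)
    g = rng.multivariate_normal(np.zeros(n), Sigma)  # governing Gaussian (5.15)

    s = np.empty(n)
    for i in range(n):                               # subordinators (5.16)
        scale = (2.0 * q[i]**2 / Sigma[i, i]) \
                * np.cos(np.pi * alphas[i] / 4.0) ** (2.0 / alphas[i])
        s[i] = levy_stable.rvs(alphas[i] / 2.0, 1.0, loc=0.0, scale=scale,
                               random_state=np.random.default_rng(7))  # shared seed

    print(np.sqrt(s) * g)                            # E_tau as in (5.14)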

5.1.2 VaR Backtesting

The forecasting performances of the EWMA and SSEWMA models are examined
by comparing the predicted VaRs with the observed returns as in [31]. From the
definition of VaR, the null hypothesis to test is:

P (rτ < −VaRβ (τ )) = 1 − β, (5.19)

for a return series {rτ }. This hypothesis is tested for each ALM return series
ri = {rτi , τ = 1, ..., 237}, i = 1, ..., 6, and for various values of β.
In this backtesting analysis, both the VAR(1)-EWMA and VAR(1)-SSEWMA models
are fit to a moving window of 100 data points. Since it is difficult to
estimate the tail index of the stable distribution with such a short time
series, it is assumed that αi = 1.8 for each of the univariate series in the
SSEWMA model. Let VaRβ(τ), for τ = 101, ..., 237, be the VaR estimate from a
model calibrated to {rτ̃, τ̃ = τ − 100, ..., τ − 1}. If equation (5.19) holds,
then

    χτ = 1( rτ < −VaRβ(τ) ) =  { 1  with probability 1 − β,
                                { 0  with probability β,                 (5.20)

where 1(·) is the indicator function, and the total number of VaR exceedings
has a binomial distribution:

    X = Σ_{τ=101}^{237} χτ ∼ Bin(137, 1 − β).                            (5.21)

The testing rule is to reject the null hypothesis at level of significance
100δ% if

    Σ_{k=0}^{X} (137 choose k) (1 − β)^k β^{137−k} ≤ δ/2,                (5.22)

or

    Σ_{k=0}^{X} (137 choose k) (1 − β)^k β^{137−k} ≥ 1 − δ/2.            (5.23)

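A minimal sketch of the exceedance count (5.20)-(5.21) and the two-sided
binomial test (5.22)-(5.23); the return series and the VaR forecasts below are
placeholders.

    # Binomial backtest of VaR exceedings.
    import numpy as np
    from scipy.stats import binom

    def var_exceedance_test(r, var_forecast, beta, delta=0.05):
        chi = r < -var_forecast                 # indicator (5.20)
        X = int(chi.sum())                      # number of exceedings (5.21)
        cdf = binom.cdf(X, len(r), 1.0 - beta)  # Bin(m, 1 - beta) CDF at X
        reject = cdf <= delta / 2.0 or cdf >= 1.0 - delta / 2.0
        return X, cdf, reject

    rng = np.random.default_rng(8)
    r = rng.normal(scale=0.02, size=137)        # placeholder realized returns
    var95 = np.full(137, 0.033)                 # placeholder VaR_0.95 forecasts
    print(var_exceedance_test(r, var95, beta=0.95))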
The number of exceedings and the corresponding p-values for each ALM return
series are contained in tables (5.6-5.7). The conclusions are:

• At level of significance 99%, neither the EWMA nor the SSEWMA model is
rejected for any value of β.

• At level of significance 95%, the EWMA model is rejected three times for
β = 0.99 and once for β = 0.95 while the SSEWMA model is never rejected.
Exceedings and p-values
β r1 r2 r3 r4 r5 r6
0.99 4 (0.0252) 3 (0.0990) 4 (0.0252) 4 (0.0252) 3 (0.0990) 3 (0.0990)
0.95 12 (0.0405) 11 (0.0850) 9 (0.2984) 8 (0.4955) 10 (0.1657) 6 (0.9379)
0.90 16 (0.4168) 13 (0.9851) 14 (0.7920) 16 (0.4168) 14 (0.7920) 12 (0.7586)
0.80 26 (0.8639) 28 (0.7987) 26 (0.8639) 32 (0.2773) 28 (0.7987) 26 (0.8639)

Table 5.6: Number of VaRβ exceedings in 137 data points with corresponding
p-values under the normal assumption.

Exceedings and p-values


β r1 r2 r3 r4 r5 r6
0.99 1 (0.7968) 2 (0.3171) 2 (0.3171) 3 (0.0990) 2 (0.3171) 3 (0.0990)
0.95 11 (0.0850) 11 (0.0850) 9 (0.2984) 9 (0.2984) 10 (0.1657) 8 (0.4955)
0.90 16 (0.4168) 14 (0.7920) 17 (0.2808) 18 (0.1800) 18 (0.1800) 16 (0.4168)
0.80 28 (0.7987) 31 (0.3783) 27 (0.9659) 32 (0.2773) 29 (0.6417) 27 (0.9659)

Table 5.7: Number of VaRβ exceedings in 137 data points with corresponding
p-values under the stable assumption (α = 1.8).

This indicates that the normal distribution is overly optimistic in predicting


the occurrence of the largest losses, and the stable distribution results in a
more reliable forecast.

• For β = 0.90 and 0.80, the EWMA model produces reasonably large p-
values, which just indicates that the normal distribution could be suitable
for forecasting more toward the middle of the distribution.

Overall, the SSEWMA model provides a better fit to the tails and is preferable
based on the examination of the p-values.

5.2 Solving the Optimization Problem

The ALM optimization problem is now solved using scenarios generated from the
time-series models of the previous section. First, efficient frontiers are developed
from the 2-stage problem with scenarios based on the EWMA and SSEWMA
models, and postoptimality analysis is briefly discussed. Then, backtesting is
carried out to compare the performance of the 1-stage problem versus the 2-
stage recourse problem and the normal assumption versus the stable assumption.
The results from varying the distributional assumption are mixed, but the 2-
stage recourse problem outperforms the 1-stage problem. Before presenting these
results, the parameters of the optimization problem are first specified.
For pension funds, decisions are made approximately on an annual basis, so a
stage in the stochastic program should correspond to 12 months. A twelve month
stage left too few data points in the backtesting, so the decision was made to
shorten the stage to cover a six month period. In addition to giving more points
for comparison in the backtesting, the time-series models should generate more
reliable scenarios over the shorter time period.
For the 2-stage problem, a balanced scenario tree is generated with 10⁴ first
stage scenarios and 10⁷ second stage scenarios, giving 10³ second stage nodes
connected to each first stage node. This huge number of scenarios gives fairly
reliable optimal allocations, and memory limitations did not allow much larger
scenario trees to be considered. The first stage scenarios were created by
simulating 10⁴ sample paths of the time-series model out to six months, and
the second stage scenarios were created by simulating another 10³ sample paths
out an additional six months for each of the first stage scenarios. Scenario
reduction and bundling using the methods of probability metrics was also
attempted in order to create a better set of first stage scenarios, but these
methods could not handle this number of sample paths with the given hardware.
It is necessary to convert the generated sample paths of the returns back
to the index values of the benchmarks. This is not a problem when using the
normal distribution, but it does cause some small difficulties when using the stable
distribution. Since the returns have infinite variance under the stable assumption
and are temporally dependent, the sample paths of the corresponding index values
will explode. For this reason, the stable return scenarios are truncated at levels
corresponding to p-values of 0.001 and 0.999 of the estimated distribution. This
eliminates the explosion of the index values while still fitting the tail of the return
distribution better than the normal assumption.
For the efficient frontiers and at the start date of the backtesting, it is as-
sumed that the pension fund is fully funded: the total asset wealth and the lia-
bility obligation are both taken to be $1,000, and because of the structure of the
deterministic equivalent form of the optimization problem, any pension fund that
is fully funded will have the same optimal allocations (as a percent of the asset
wealth). For instance, a fund with an initial $1,000,000 in both asset wealth and
liability obligation has the same optimal allocations as one with $1,000 in both.
Including transaction costs, the optimal allocations depend also on the initial
allocation, not just the generated scenarios and initial wealth. In this case, it is
assumed that the fund initially holds 40% of its wealth in bonds and 60% of its
wealth in equities. A reasonable assumption for the trading costs, as a percent of
wealth traded, is obtained from data on mutual funds in [9]. In our example, the
median trading cost (TC) is 0.70% of fund assets per year:

TC ≈ 0.0070 · Fund Assets.

The turnover, defined as the ratio of annual fund sales to the fund assets, is
determined to have a median of 0.70:

Fund Sales ≈ 0.70 · Fund Assets.


Assuming that the fund buys approximately as much as it sells, then

Traded Wealth ≈ 2 · Fund Sales.

Combining equations yields

0.0070
TC ≈ Traded Wealth,
2 · 0.70

or trading costs are approximately 0.5% of the traded wealth. Additionally as-
suming that the transaction costs are the same for each of the five ALM asset
classes, the values of T CB i = T CS i = 0.005, for i = 1, ..., 5, are used in the
optimization problem.

5.3 Efficient Frontiers

The numerical results of the efficient frontiers for the 2-stage recourse problem
are now given. Recall that the risk measure for the 2-stage problem is:

ρ2 = µ1 CVaRβ (−sw2 ) + µ2 CVaRβ (−sw3 ), (5.24)

where swt+1 is the surplus wealth at the end of stage t (and sw1 = 0 since the
pension fund is initially fully funded). A confidence level of β = 0.95 is used in this
section to emphasize the differences between the normal and stable assumptions.
For the remainder, it is taken that µ1 = µ2 = 0.5; studies assigning different
weights to the CVaR at different stages are saved for a later time. Since the
reward is the expected surplus wealth at the end of the second stage, E(sw3),
the efficient frontier is obtained by varying λ in the minimization objective
λρ2 − (1 − λ)E(sw3).
Figure (5.8) contains three different efficient frontiers:
Figure 5.8: Efficient frontiers under the normal assumption and stable
assumption for β = 0.95.

• Optimization without transaction costs and scenarios generated from the


normal assumption (EWMA model).

• Optimization with transaction costs and the same set of scenarios generated
from the normal assumption.

• Optimization without transaction costs and scenarios generated from the


stable assumption (SSEWMA model) using the tail index estimates from
table (5.2).

The optimal allocations, as percents of the initial wealth, can be found in the
tables (6.1-6.3) in the appendix of the lecture notes.
In all three cases and for any value of λ, the optimal first stage allocations
are some combination of the bond and international equity indexes. The portfolio
that maximizes the expected final surplus wealth (λ = 0) invests entirely in the
international equity index, and the minimum risk portfolio (λ = 1) invests entirely
in the bond index. A couple immediate comments can be made about the figure.
Since the stable distribution has a higher probability of extreme events, the frontier
of the stable distribution lies below that of the normal distribution. The inclusion
of transaction costs also moves the efficient frontier downward, and the distance
it moves for various values of λ depends on the initial allocation.
A few risk-reward points obtained by replacing the surplus wealth with the
wealth in the optimization problem, under the normal assumption, are also in-
cluded in figure (5.8). The optimal allocations, found in table (6.4), are very
different in this case: The minimum risk portfolio has a very large proportion
of wealth invested in the cash index. When the corresponding ρ2 and E(sw3 )
are calculated, the points for the risk-averse portfolios lie far below the efficient
frontier. This illustrates the advantage of considering the liabilities and assets
together in the same optimization problem. Maximizing the expected final wealth
and maximizing the expected final surplus wealth result in the same values of ρ2
and E(sw3 ) because of the linearity of the problem.

5.4 Postoptimality Analysis and Backtesting

5.4.1 Postoptimality Analysis

The basic postoptimality analysis examines how the optimal value of a
stochastic program changes as the initial probability distribution P1 becomes
contaminated with another probability distribution P2. Usually, problems of
the following form are considered:

    φ(P1) = min_{x1∈X} F(x1, P1),                                        (5.25)

where P1 is a discrete probability distribution of scenarios, X does not
depend on P1, x1 are the scenario-independent first stage decision variables,
and F is convex in x1 and linear in P1. The original probability distribution
is assumed to become contaminated through

    Pψ = (1 − ψ)P1 + ψP2,  with 0 < ψ < 1.                               (5.26)

This means that the scenarios of both distributions are aggregated into one
set of scenarios where the probabilities of the scenarios in P1 are weighted
by 1 − ψ and the probabilities of the scenarios in P2 are weighted by ψ. If
the optimal solution is denoted by

    x1(P1) = arg min_{x1∈X} F(x1, P1),                                   (5.27)

a set of bounds for the optimal value of the stochastic program under the
contaminated distribution, φ(Pψ), is given by

    (1 − ψ)φ(P1) + ψφ(P2) ≤ φ(Pψ)
        ≤ min{ (1 − ψ)φ(P1) + ψF(x1(P1), P2), (1 − ψ)F(x1(P2), P1) + ψφ(P2) },   (5.28)

where F(x1(P1), P2) is the value of the objective under distribution P2 when
the first stage decision is x1(P1) (there is still an implicit minimization
over the second stage variables). F(x1(P2), P1) is defined in a similar
manner.
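A minimal sketch of evaluating the bounds (5.28):

    # Contamination bounds (5.28) for the optimal value under P_psi.
    def contamination_bounds(phi1, phi2, F_x1P1_P2, F_x1P2_P1, psi):
        lower = (1.0 - psi) * phi1 + psi * phi2
        upper = min((1.0 - psi) * phi1 + psi * F_x1P1_P2,
                    (1.0 - psi) * F_x1P2_P1 + psi * phi2)
        return lower, upper

    # For the minimum risk portfolio below, both first stage solutions coincide,
    # so F(x1(P1), P2) = phi(P2), F(x1(P2), P1) = phi(P1) and the bounds collapse.
    print(contamination_bounds(246.13, 291.21, 291.21, 246.13, psi=0.5))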
It is not difficult to verify that the ALM problem can be written in the above
form. This contamination method can be easily applied to the situation where P1
corresponds to the set of scenarios generated from the normal assumption and P2
corresponds to the set of scenarios generated from the stable assumption. In the
case of the minimum risk portfolio (λ = 1), the optimal objective value coincides
with the minimum risk value. Let

ρn2 = φ(P1 ), and ρs2 = φ(P2 ), (5.29)

correspond to the 2-stage risk under the normal and stable distributions, respec-
tively. Also, denote the risk under distribution Pψ by ρψ2 . As seen in tables (6.1-
6.2), the optimal allocations under both the normal assumption and the stable as-
sumption invest all the wealth in the bond index. It follows that F (x(P2 ), P1 ) = ρn2
and F (x(P1 ), P2 ) = ρs2 , and the bounds in equation (5.28) produce

    ρψ2 = (1 − ψ)ρn2 + ψρs2 = (1 − ψ) · 246.13 + ψ · 291.21.

The minimum risk in the 2-stage program is then easily calculated when
scenarios under the normal assumption and the stable assumption are combined.
The general contamination technique can also be applied for any value of λ,
but direct information about the risk can then no longer be calculated.

5.4.2 Portfolio Backtesting

Finally, some backtesting results will be presented. The first round includes trans-
action costs, and the initial conditions for each run of the optimization problem
come from the previous period considered. This provides a realistic comparison
for the 1-stage problem versus the 2-stage problem, but it is difficult to calculate
the realized risk using the risk measure that was optimized. In the second round,
the transaction costs are removed and the initial conditions are reset every run of
the optimization problem. This allows the realized risk to be directly calculated
in terms of the optimized risk measure and provides a better comparison for the
distributional assumptions; however, this setup favors the 1-stage problem over
the 2-stage problem because the second stage becomes irrelevant.
Dynamic Backtesting: 1-stage versus 2-stage
This section performs the dynamic backtesting of the minimum risk 1-stage
and 2-stage portfolios with transaction costs. The 2-stage problem finds the op-
timal allocations that minimize ρ2, and the 1-stage problem finds the optimal
allocation that minimizes
ρ1 = CVaRβ (−sw2 ). (5.30)

For a given distributional assumption, the same sets of scenarios are used
when solving the 1-stage and 2-stage problems: the 1-stage problem is just
restricted to considering the 10⁴ first stage scenarios.
The time-series models are fit to a moving window of 100 data points under
both the normal and stable assumptions using the EWMA and SSEWMA models,
respectively. Running the optimization problems with scenarios generated from
the time-series models fit to the first 100 monthly data points gives optimal allo-
cations for the six-month period beginning in July, 1993. It is again assumed that
the pension fund is initially fully funded with 40% of wealth in the bond index
and 60% of wealth in the equity index. The window is then shifted forward by
6 data points, and the optimization problems output optimal allocations for
January, 1994. The asset wealths resulting from the previous allocations, and those
allocations themselves, are used as the initial conditions for the new optimization
problems. This setup means that the 2-stage problem is run on a rolling horizon:
Since new scenarios are generated every 6 months, only the first stage allocations
are actually implemented.
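Schematically, the rolling procedure can be sketched as follows; fit_model,
generate_scenarios and solve_first_stage are hypothetical placeholders for the
EWMA/SSEWMA fitting, scenario generation and stochastic programming steps described
above, and returns is assumed to be a (months x assets) array of historical returns:

    import numpy as np

    WINDOW, STEP, N_RUNS = 100, 6, 22   # window length, shift, number of runs

    def rolling_backtest(returns, wealth, alloc,
                         fit_model, generate_scenarios, solve_first_stage):
        history = []
        for k in range(N_RUNS):
            window = returns[k * STEP : k * STEP + WINDOW]
            model = fit_model(window)              # time-series fit on the window
            scenarios = generate_scenarios(model)  # scenario set for both stages
            # Initial conditions come from the previous period; on the rolling
            # horizon only the first stage allocation is implemented.
            alloc = solve_first_stage(scenarios, wealth, alloc)
            realized = returns[k * STEP + WINDOW : k * STEP + WINDOW + STEP]
            # The allocation is held fixed within the half-year for simplicity.
            wealth = wealth * np.prod(1.0 + realized @ alloc)
            history.append((alloc, wealth))
        return history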
Since it is difficult to obtain a good estimate for the tail index of a stable
distribution with only 100 data points, it is assumed that α = 1.8 in the SSEWMA
model. The backtesting, therefore, gives a comparison of the normal assumption
with the stable assumption for this particular value of the tail index.
The window is shifted 21 times, resulting in a final surplus wealth for July,
2004. Since this results in only 22 values of the surplus wealth for comparison,
the confidence level of CVaR was reduced to β = 0.80 in ρ1 and ρ2 . To measure
the relative performances, it is necessary to calculate the risk of the realized
surplus wealths. However, it is not reasonable to directly calculate the CVaR of
these values because the surplus wealth that is used as the initial condition in
the optimization problems varies over the time horizon and is different for the
different assumptions. It is also not possible to calculate the CVaR of the return
of the surplus wealth because the surplus wealth is not strictly positive. By the
translation invariance property of a coherent risk measure, it is more reasonable
to look at the change in surplus wealth ∆sw = sw2 − sw1:

CVaRβ(−sw2) = −sw1 + CVaRβ(−∆sw),

since −sw2 = −sw1 − ∆sw and sw1 is a fixed initial condition. Therefore, minimizing
the CVaR of the surplus wealth in the next time period has the effect of minimizing
the CVaR of the change, but one still cannot make a direct comparison because the
asset wealth also varies for the
different assumptions over the horizon. The measure of realized risk, ρ̃, used in
the comparison is the CVaR at 80% confidence level of the change in negative
surplus wealth per dollar of asset wealth from the previous period. One can
expect that minimizing ρ1 and ρ2 produces small values of ρ̃, but ρ̃ does not
give a perfect comparison of risk because the resulting optimal allocations depend
on the ratios of assets to liabilities, not just the asset wealths. Values of ρ̃ and
the final surplus wealth are found in table (5.8).

                              ρ̃        final sw
    1-stage Normal            0.0466   1177.29
    1-stage Stable            0.0509   1077.64
    2-stage Normal            0.0456   1209.22
    2-stage Stable            0.0491   1217.92
    Fixed-Mixed 0/40/60/0/0   0.0924    241.04
    Fixed-Mixed 0/100/0/0/0   0.0776   -371.39

Table 5.8: Dynamic backtesting results.

For comparison, this table also
includes values for the fixed-mixed rule of 40% bonds and 60% equity, and the
rule of 100% in bonds. Under both the normal and stable assumptions, the 2-
stage recourse problem outperforms the 1-stage problem by both reducing ρ̃ and
increasing the final surplus wealth. While the 2-stage problem under the stable
assumption results in the highest final surplus wealth, the normal assumption gives
lower values of ρ̃. The fixed-mixed rules are clearly not competitive with the
stochastic programs.
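For transparency, here is a minimal sketch of the realized-risk calculation,
assuming arrays sw and aw hold the realized semi-annual surplus wealth and asset
wealth series (the names and the tail-average CVaR estimator are illustrative
choices, not routines from the text):

    import numpy as np

    def empirical_cvar(losses, beta=0.80):
        # Tail average: mean of the worst (1 - beta) fraction of the losses
        # (one standard empirical CVaR estimator).
        losses = np.sort(np.asarray(losses))
        k = int(np.ceil((1.0 - beta) * len(losses)))
        return losses[-k:].mean()

    def realized_risk(sw, aw, beta=0.80):
        # rho-tilde: CVaR of the change in negative surplus wealth per dollar
        # of asset wealth from the previous period.
        scaled_losses = -np.diff(sw) / aw[:-1]
        return empirical_cvar(scaled_losses, beta)

With the 22 realized surplus wealth values here, the tail average is taken over
roughly the worst fifth of the semi-annual changes.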
Figures (5.9-5.11) show the evolution of the asset wealths and liability value
over the time horizon. One can see that minimizing CVaR does not behave like a
typical index tracking problem because the upside is not penalized. The asset
wealths and the liability values are in table (6.5), and the optimal allocations can
be found in the appendix. These tables also include the percent of asset wealth
loss to transaction costs.
An additional comparison of the performance of the stable and normal dis-
tributions can be obtained by VaR backtesting similar to that of section 5.1.2. Further
material on this issue will be provided in the lecture.
Figure 5.9: Dynamic backtesting: 1-stage versus 2-stage under the normal assumption.
[Line plot of the 2-stage asset wealth, 1-stage asset wealth, and liability value in
dollars, 1994-2004.]
Figure 5.10: Dynamic backtesting: 1-stage versus 2-stage under the stable assumption.
[Line plot of the 2-stage asset wealth, 1-stage asset wealth, and liability value in
dollars, 1994-2004.]
Figure 5.11: Dynamic backtesting: Fixed-mixed rules. [Line plot of the 0/40/60/0/0
asset wealth, 0/100/0/0/0 asset wealth, and liability value in dollars, 1994-2004.]


Chapter 6

Appendix - Tables for Empirical Analysis
Optimal First Stage Allocations
λ E(−sw3 ) ρ2 CVaR1 CVaR2 Cash Bonds Equities Int. Eq. Mortgages
0 72.18 399.46 319.44 479.48 0 0 0 100 0
0.10 71.68 389.07 319.44 458.70 0 0 0 100 0
0.20 70.82 384.04 319.44 448.63 0 0 0 100 0
0.25 64.49 364.44 299.09 429.80 0 10.6483 0 89.3517 0
0.30 42.92 306.50 232.28 380.71 0 48.3706 0 51.6294 0
0.35 32.63 284.80 207.23 362.36 0 64.9963 0 35.0037 0
0.40 25.61 273.09 194.71 351.47 0 75.1793 0 24.8207 0
0.45 19.75 265.12 186.68 343.57 0 82.9291 0 17.0709 0
0.50 15.08 259.96 182.16 337.76 0 88.1928 0 11.8072 0
0.60 6.72 253.12 177.31 328.92 0 95.7591 0 4.2409 0
0.75 -2.74 248.36 175.47 321.24 0 100 0 0 0
1.00 -20.23 246.13 175.47 316.79 0 100 0 0 0

Table 6.1: Efficient frontier under the normal assumption with β = 0.95.
Optimal First Stage Allocations
λ E(−sw3 ) ρ2 CVaR1 CVaR2 Cash Bonds Equities Int. Eq. Mortgages
0 70.47 409.60 321.91 497.29 0 0 0 100 0
0.10 70.08 401.18 321.91 480.45 0 0 0 100 0
0.20 69.36 396.97 321.91 472.03 0 0 0 100 0
0.25 68.92 395.45 321.91 468.99 0 0 0 100 0
0.30 57.09 365.74 288.65 442.84 0 20.8749 0 79.1251 0
0.35 43.60 337.37 256.10 418.64 0 44.3374 0 55.6626 0
0.40 34.79 322.43 239.12 405.73 0 58.8847 0 41.1153 0
0.45 27.67 312.73 228.49 396.97 0 69.8927 0 30.1073 0
0.50 21.83 306.25 221.75 390.75 0 78.1959 0 21.8041 0
0.60 13.85 299.59 216.06 383.12 0 87.1561 0 12.8439 0
0.75 2.97 294.23 212.78 375.68 0 95.7426 0 4.2574 0
1.00 -19.85 291.21 212.08 370.34 0 100 0 0 0

Table 6.2: Efficient frontier under the stable assumption with β = 0.95.
Optimal First Stage Allocations
λ E(−sw3) ρ2 CVaR1 CVaR2 Cash Bonds Equities Int. Eq. Mortgages Trans. Costs
0 59.28 415.61 328.26 502.96 0 0 0 99.0050 0 0.9950
0.10 58.80 405.57 328.26 482.89 0 0 0 99.0050 0 0.9950
0.20 57.83 399.83 328.26 471.41 0 0 0 99.0050 0 0.9950
0.25 37.16 327.52 251.65 403.39 0 40.0000 0 59.4030 0 0.5970
0.30 34.81 321.78 247.18 396.37 0 42.6534 0 56.7496 0 0.5970
0.35 21.80 294.29 217.23 371.35 0 61.8604 0 37.5426 0 0.5970
0.40 13.93 281.12 203.63 358.61 0 72.4813 0 26.9217 0 0.5970
0.45 7.14 271.91 194.45 349.36 0 81.0294 0 18.3736 0 0.5970
0.50 2.07 266.31 189.48 343.14 0 86.5774 0 12.8256 0 0.5970
0.60 -6.25 259.47 184.36 334.58 0 94.1698 0 5.2331 0 0.5971
0.75 -15.27 254.86 181.98 327.75 0 99.4030 0 0 0 0.5970
1.00 -31.96 253.21 181.98 324.45 0 99.4030 0 0 0 0.5970

Table 6.3: Efficient frontier under the normal assumption with transaction costs and β = 0.95.

Optimal First Stage Allocations
λ E(−sw3) ρ2 CVaR1 CVaR2 Cash Bonds Equities Int. Eq. Mortgages
0 72.18 399.46 319.44 479.48 0 0 0 100 0
0.25 56.24 349.48 271.26 427.70 0 25.7090 0 74.2910 0
0.50 8.00 293.29 214.21 372.38 0 0 0 20.8588 79.1412
0.75 -40.48 333.07 247.64 418.49 74.2995 0 0 3.9844 21.7161
1.00 -53.29 353.66 253.02 454.30 84.9059 1.0005 0 2.4420 11.6516

Table 6.4: Wealth optimization under the normal assumption with no transaction costs and β = 0.95.
                         Asset Wealth
Date   Liability Value   1-stage Normal   1-stage Stable   2-stage Normal   2-stage Stable   Fixed-mixed 0/40/60/0/0   Fixed-mixed 0/100/0/0/0
7/93 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00 1000.00
1/94 1067.19 1036.08 1033.60 1034.93 1032.71 1067.69 1034.72
7/94 936.20 1006.58 1003.17 1005.07 1002.12 1031.23 1000.94
1/95 932.01 1015.83 1009.89 1014.77 1009.76 1061.09 1010.78
7/95 1085.82 1119.68 1106.91 1118.45 1108.27 1233.43 1102.13
1/96 1267.52 1230.57 1213.50 1235.10 1219.35 1376.50 1182.06
7/96 1137.78 1245.16 1227.06 1250.82 1234.14 1382.09 1163.18
1/97 1208.93 1545.82 1486.14 1552.85 1532.14 1609.50 1220.62
7/97 1354.74 1894.36 1818.87 1902.97 1877.59 1862.24 1288.37
1/98 1503.39 1961.80 1883.63 1970.72 1944.43 1937.84 1351.51
7/98 1566.35 2259.70 2169.65 2269.97 2239.69 2136.26 1389.74
1/99 1726.53 2599.19 2495.61 2611.01 2576.17 2371.80 1460.65
7/99 1536.38 2531.37 2448.79 2555.91 2543.31 2411.78 1424.34
1/00 1528.22 2568.31 2490.92 2610.18 2603.17 2498.60 1433.68
7/00 1715.38 2694.08 2612.77 2729.57 2724.48 2599.06 1509.31
1/01 1864.70 2884.14 2803.19 2910.55 2916.84 2621.32 1631.85
7/01 1915.56 3003.98 2920.14 3030.85 3038.17 2495.37 1700.87
1/02 1960.23 3100.08 3013.55 3127.81 3135.36 2436.33 1755.28
7/02 2068.61 3230.29 3140.12 3259.18 3267.05 2202.77 1829.00
1/03 2291.06 3393.44 3298.72 3423.79 3432.06 2176.37 1921.38
7/03 2180.78 3405.28 3310.23 3435.73 3444.03 2398.13 1928.08
1/04 2400.67 3558.11 3458.80 3589.93 3598.60 2659.32 2014.61
7/04 2392.77 3570.05 3470.40 3601.98 3610.68 2633.80 2021.38

Table 6.5: Dynamic backtesting: Realized liability value and asset wealths for the
optimal allocations with β = 0.80.
Date   Cash   Bonds   Equities   Intern. Equities   Mortgages   Transaction Costs
(initial) 0 40 60 0 0
7/93 0 89.9043 4.9691 4.5791 0 0.5476
1/94 0 89.7865 0.1678 9.9954 0 0.0503
7/94 0 85.6719 9.9092 4.3211 0 0.0979
1/95 0 89.7313 10.2284 0 0 0.0403
7/95 0 58.9767 40.7269 0 0 0.2964
1/96 0 0 99.4273 0 0 0.5727
7/96 0 0 100 0 0 0
1/97 0 0 100 0 0 0
7/97 0 0 100 0 0 0
1/98 0 0 100 0 0 0
7/98 0 0 100 0 0 0
1/99 0 88.4698 10.6410 0 0 0.8891
7/99 0 82.3947 17.5437 0 0 0.0616
1/00 0 81.7418 18.2582 0 0 0
7/00 0 91.9907 7.9093 0 0 0.1000
1/01 0 99.9294 0 0 0 0.0706
7/01 0 100 0 0 0 0
1/02 0 100 0 0 0 0
7/02 0 100 0 0 0 0
1/03 0 100 0 0 0 0
7/03 0 100 0 0 0 0
1/04 0 100 0 0 0 0

Table 6.6: Dynamic backtesting: Allocations (as a percent of asset wealth) for the
1-stage optimization problem under the normal assumption with β = 0.80.
Date   Cash   Bonds   Equities   Intern. Equities   Mortgages   Transaction Costs
(initial) 0 40 60 0 0
7/93 0 91.4944 3.7318 4.2140 0 0.5599
1/94 0 90.9029 0 9.0523 0 0.0448
7/94 0 86.5537 9.8023 3.5455 0 0.0985
1/95 0 89.8535 10.1135 0 0 0.0330
7/95 0 50.4015 49.2155 0 0 0.3830
1/96 0 0 99.5129 0 0 0.4871
7/96 0 0 100 0 0 0
1/97 0 0 100 0 0 0
7/97 0 0 100 0 0 0
1/98 0 0 100 0 0 0
7/98 0 0 100 0 0 0
1/99 0 82.2585 16.9148 0 0 0.8267
7/99 0 67.1027 32.7497 0 0 0.1477
1/00 0 66.1382 33.8618 0 0 0
7/00 0 89.5188 10.2507 0 0 0.2305
1/01 0 99.9081 0 0 0 0.0919
7/01 0 100 0 0 0 0
1/02 0 100 0 0 0 0
7/02 0 100 0 0 0 0
1/03 0 100 0 0 0 0
7/03 0 100 0 0 0 0
1/04 0 100 0 0 0 0

Table 6.7: Dynamic backtesting: First stage allocations (as a percent of asset
wealth) for the 2-stage optimization problem under the normal assumption with
β = 0.80.
        1-stage Normal   1-stage Stable   2-stage Normal      2-stage Stable
Date    CVaR             CVaR             CVaR1     CVaR2     CVaR1     CVaR2
7/93 139.97 118.33 139.99 248.58 118.35 211.53
1/94 165.24 165.42 166.51 286.72 166.45 285.05
7/94 13.25 24.93 14.91 77.33 26.09 95.20
1/95 -21.85 -2.31 -20.81 25.29 -2.22 52.64
7/95 90.97 112.84 92.62 202.52 111.41 229.59
1/96 165.43 199.19 159.34 217.74 192.03 262.59
7/96 17.94 53.73 11.81 89.41 46.62 139.86
1/97 -294.58 -201.09 -302.18 -335.87 -253.20 -259.12
7/97 -530.24 -415.32 -539.84 -622.72 -481.01 -531.86
1/98 -343.84 -223.30 -353.58 -327.22 -290.62 -217.16
7/98 -603.38 -471.40 -614.36 -624.62 -545.81 -522.57
1/99 -682.94 -569.73 -693.82 -572.10 -652.59 -544.71
7/99 -914.55 -825.38 -937.53 -938.14 -921.85 -929.62
1/00 -971.35 -876.49 -1009.61 -1015.10 -987.07 -971.63
7/00 -839.05 -731.40 -871.71 -770.64 -847.11 -725.54
1/01 -911.81 -782.59 -939.38 -834.34 -904.92 -756.63
7/01 -986.08 -850.41 -1014.32 -934.61 -975.67 -848.00
1/02 -1053.78 -948.54 -1082.97 -1014.99 -1077.36 -991.05
7/02 -1032.56 -886.39 -1063.62 -965.99 -1023.88 -879.87
1/03 -899.44 -761.41 -931.94 -768.19 -904.66 -706.25
7/03 -1013.32 -881.03 -1046.02 -906.55 -1024.45 -858.39
1/04 -933.25 -832.85 -967.31 -802.10 -982.10 -820.57

Table 6.8: Dynamic backtesting: Optimal values of CVaR0.80 for scenarios gener-
ated under the normal and stable assumptions.