
On the coherence of expected shortfall

2012

Expected Shortfall (ES) in several variants has been proposed as a remedy for the deficiencies of Value-at-Risk (VaR), which in general is not a coherent risk measure. In fact, most definitions of ES lead to the same results when applied to continuous loss distributions. Differences may appear when the underlying loss distributions have discontinuities. In this case even the coherence property of ES can be lost unless one takes care of the details in its definition. We compare some of the definitions of Expected Shortfall, pointing out that there is one which is robust in the sense of yielding a coherent risk measure regardless of the underlying distributions. Moreover, this Expected Shortfall can be estimated effectively even in cases where the usual estimators for VaR fail.

arXiv:cond-mat/0104295v5 [cond-mat.stat-mech] 2 May 2002

Carlo Acerbi∗ and Dirk Tasche†

April 19, 2002

Key words: Expected Shortfall; Risk measure; worst conditional expectation; tail conditional expectation; value-at-risk (VaR); conditional value-at-risk (CVaR); tail mean; coherence; quantile; sub-additivity.

1 Introduction

Value-at-Risk (VaR) as a risk measure is heavily criticized for not being sub-additive (see [7] for an overview of the criticism). This means that the risk of a portfolio can be larger than the sum of the stand-alone risks of its components when measured by VaR (cf. [2], [3], [15], or [1]). Hence, managing risk by VaR may fail to stimulate diversification. Moreover, VaR does not take into account the severity of an incurred damage event.

As a response to these deficiencies the notion of coherent risk measures was introduced in [2], [3], and [5]. An important example for a risk measure of this kind is the worst conditional expectation (WCE) (cf. Definition 5.2 in [3]). This notion is closely related to the tail conditional expectation (TCE) from Definition 5.1 in [3], but in general does not coincide with it (see section 5 below).
Unfortunately, a somewhat misleading formulation in [2] suggests this coincidence to be true. Meanwhile, several authors (e.g. [17], [13], or [1]) proposed modifications to TCE, this way increasing confusion since the relation of these modifications to TCE and WCE remained obscure to a certain degree.

∗ Abaxbank, Corso Monforte 34, 20122 Milano, Italy.
† Deutsche Bundesbank, Postfach 10 06 02, 60006 Frankfurt, Germany. The contents of this paper do not necessarily reflect opinions shared by Deutsche Bundesbank.

The identification of TCE and WCE is to a certain degree a temptation, though the authors of [3] actually did their best to warn the reader. WCE is in fact coherent but useful only in a theoretical setting, since it requires the knowledge of the whole underlying probability space, while TCE lends itself naturally to practical applications but is not coherent (see Example 5.4 below). The goal to construct a risk measure which is both coherent and easy to compute and to estimate was however achieved in [1]. The definition of Expected Shortfall (ES) at a specified level α in [1] (Definition 2.6 below) is the literal mathematical transcription of the concept "average loss in the worst 100α% cases". We rely on this definition of Expected Shortfall in the present paper, despite the fact that in the literature this term was already used sometimes in another meaning.

With the paper at hand we strive primarily for making transparent the relations between the notions developed in [5], [3], [13], and [1]. We present four characterizations of Expected Shortfall: as integral of all the quantiles below the corresponding level (eq. (3.3)), as limit in a tail strong law of large numbers (Proposition 4.1), as minimum of a certain functional introduced in [13] (Corollary 4.3 below), and as maximum of WCEs when the underlying probability space varies (Corollary 6.3).
This way, we will show that the ES definition in [1] is complementary and even in some aspects superior to the other notions. Moreover, in a certain sense any law invariant coherent risk measure has a representation with ES as the main building block (see [11]).

Some hints on the organization of the paper: In section 2 we give precise mathematical definitions of the five notions to be discussed. These are WCE, TCE, CVaR (conditional value-at-risk), ES, and its negative, the so-called α-tail mean (TM). Section 3 presents useful properties of α-tail mean and ES, namely the integral representation (3.3), continuity and monotonicity in the level α as well as coherence for ES. In section 4 we show first that α-tail mean arises naturally as limit of the average of the 100α% worst cases in a sample. Then we point out that in fact ES and CVaR are two different names for the same object. Section 5 is devoted to inequalities and examples clarifying the relations between ES, TCE, and WCE. In section 6 we deal with the question how to state a general representation of ES in terms of WCE. Section 7 concludes the paper.

2 Basic definitions

We have to arrange a minimum set of definitions to be consistent with the notions used in [5], [13], and [1]. Fix for this section some real-valued random variable $X$ on a probability space $(\Omega, \mathcal{A}, P)$. $X$ is considered the random profit or loss of some asset or portfolio. For the purpose of this paper, we are mainly interested in losses, i.e. low values of $X$. By $E[\ldots]$ we will denote expectation with respect to $P$. Fix also some confidence level $\alpha \in (0, 1)$. We will often make use of the indicator function

\[
1_A(a) = 1_A = \begin{cases} 1, & a \in A, \\ 0, & a \notin A. \end{cases} \tag{2.1}
\]

Definition 2.1 (Quantiles)
$x_{(\alpha)} = q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}$ is the lower α-quantile of $X$,
$x^{(\alpha)} = q^\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] > \alpha\}$ is the upper α-quantile of $X$.
We use the x-notation if the dependence on $X$ is evident, otherwise the q-notation. □

Note that $x^{(\alpha)} = \sup\{x \in \mathbb{R} : P[X \le x] \le \alpha\}$.
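To make Definition 2.1 concrete, the following minimal sketch computes both quantiles for a random variable with finitely many values. The dictionary representation (value mapped to probability), the function names, and the example distribution are assumptions made for illustration only; exact fractions are used so that the comparisons `>=` and `>` behave as in the definition.

```python
from fractions import Fraction


def lower_quantile(dist, alpha):
    """Lower alpha-quantile: inf{x : P[X <= x] >= alpha}."""
    cumulative = Fraction(0)
    for x, p in sorted(dist.items()):
        cumulative += p
        if cumulative >= alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def upper_quantile(dist, alpha):
    """Upper alpha-quantile: inf{x : P[X <= x] > alpha}."""
    cumulative = Fraction(0)
    for x, p in sorted(dist.items()):
        cumulative += p
        if cumulative > alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


# Hypothetical profit/loss distribution (value -> probability).
example = {-10: Fraction(1, 4), 0: Fraction(1, 2), 5: Fraction(1, 4)}

for alpha in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    print(alpha, lower_quantile(example, alpha), upper_quantile(example, alpha))
```

In this example the distribution function equals 1/4 on the whole interval [-10, 0), so at α = 1/4 the lower quantile is -10 while the upper quantile is 0; at the other two levels the quantiles coincide.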
From $\{x \in \mathbb{R} : P[X \le x] > \alpha\} \subset \{x \in \mathbb{R} : P[X \le x] \ge \alpha\}$ it is clear that $x_{(\alpha)} \le x^{(\alpha)}$. Moreover, it is easy to see that

\[
x_{(\alpha)} = x^{(\alpha)} \quad \text{if and only if} \quad P[X \le x] = \alpha \ \text{for at most one } x, \tag{2.2}
\]

and in case $x_{(\alpha)} < x^{(\alpha)}$

\[
\{x \in \mathbb{R} : \alpha = P[X \le x]\} = \begin{cases} [x_{(\alpha)}, x^{(\alpha)}), & P[X = x^{(\alpha)}] > 0, \\ [x_{(\alpha)}, x^{(\alpha)}], & P[X = x^{(\alpha)}] = 0. \end{cases} \tag{2.3}
\]

(2.2) and (2.3) explain why it is difficult to say that there is an obvious definition for value-at-risk (VaR). We join here [5] taking as $\mathrm{VaR}^\alpha$ the smallest value such that the probability of the absolute loss being at most this value is at least $1 - \alpha$. As this is not really comprehensible when said with words here is the formal definition:

Definition 2.2 (Value-at-risk)
$\mathrm{VaR}^\alpha = \mathrm{VaR}^\alpha(X) = -x^{(\alpha)} = q_{1-\alpha}(-X)$ is the value-at-risk at level α of $X$. □

The definition of tail conditional expectation (TCE) given in [3], Definition 5.1, depends on the choice of quantile taken for VaR (and of some discount factor we neglect here for reasons of simplicity). But as there is a choice for VaR there is also a choice for TCE. That is why we consider a lower and an upper TCE. Denote the positive part of a number $x$ by

\[
x^+ = \begin{cases} x, & x > 0, \\ 0, & x \le 0, \end{cases}
\]

and its negative part by $x^- = (-x)^+$.

Definition 2.3 (Tail conditional expectations)
Assume $E[X^-] < \infty$. Then
$\mathrm{TCE}_\alpha = \mathrm{TCE}_\alpha(X) = -E[X \mid X \le x_{(\alpha)}]$ is the lower tail conditional expectation at level α of $X$,
$\mathrm{TCE}^\alpha = \mathrm{TCE}^\alpha(X) = -E[X \mid X \le x^{(\alpha)}]$ is the upper tail conditional expectation at level α of $X$. □

$\mathrm{TCE}^\alpha$ is (up to a discount factor) the tail conditional expectation from Definition 5.1 in [3]. "Lower" and "upper" here correspond to the quantiles used for the definitions, but not to the proportion of the quantities. In fact,

\[
\mathrm{TCE}_\alpha \ge \mathrm{TCE}^\alpha \tag{2.4}
\]

is obvious. As Delbaen says in the proof of Theorem 6.10 in [5], $\mathrm{TCE}^\alpha$ in general does not define a subadditive risk measure (see Example 5.4 below). For this reason, in [3], Definition 5.2, the worst conditional expectation (WCE) was introduced.
Here is the definition (up to a discount factor) in our terms:

Definition 2.4 (Worst conditional expectation)
Assume $E[X^-] < \infty$. Then $\mathrm{WCE}_\alpha = \mathrm{WCE}_\alpha(X) = -\inf\{E[X \mid A] : A \in \mathcal{A},\ P[A] > \alpha\}$ is the worst conditional expectation at level α of $X$. □

Observe that under the assumption $E[X^-] < \infty$ the value of $\mathrm{WCE}_\alpha$ is always finite since then $\lim_{t \to \infty} P[X \le x_{(\alpha)} + t] = 1$ implies that there is some event $A = \{X \le x_{(\alpha)} + t\}$ with $P[A] > \alpha$ and $E[\,|X|\, 1_A] < \infty$.

We will see in section 5 that Definition 2.4 has to be treated with care nevertheless because the notation $\mathrm{WCE}_\alpha(X)$ hides the fact that it depends not only on the distribution of $X$ but also on the structure of the underlying probability space. From the definition it is clear that for any random variables $X$ and $Y$ on the same probability space

\[
\mathrm{WCE}_\alpha(X + Y) \le \mathrm{WCE}_\alpha(X) + \mathrm{WCE}_\alpha(Y),
\]

i.e. WCE is sub-additive. Moreover, Proposition 5.1 in [3] says $\mathrm{WCE}_\alpha \ge \mathrm{TCE}^\alpha$. Hence $\mathrm{WCE}_\alpha$ is a majorant to $\mathrm{TCE}^\alpha \ge \mathrm{VaR}^\alpha$. It is in fact the smallest coherent risk measure dominating $\mathrm{VaR}^\alpha$ and only depending on $X$ through its distribution if the underlying probability space is "rich" enough (see Theorem 6.10 in [5] for details).

This is a nice result, but to a certain degree unsatisfactory since the infimum does not seem too handy. This observation might have been the reason for introducing the conditional value-at-risk (CVaR) in [17] (see also the references therein) and [13]. CVaR can be used as a base for very efficient optimization procedures. We quote here, up to the sign of the random variable and the corresponding change from α to $1 - \alpha$ (cf. Definition 2.2), equation (1.2) from [13].

Definition 2.5 (Conditional value-at-risk)
Assume $E[X^-] < \infty$. Then $\mathrm{CVaR}^\alpha = \mathrm{CVaR}^\alpha(X) = \inf\left\{ \dfrac{E[(X - s)^-]}{\alpha} - s : s \in \mathbb{R} \right\}$ is the conditional value-at-risk at level α of $X$. □

Note that by Proposition 4.2 and (4.9), CVaR is well-defined. But beware: Pflug states in equation (1.3) of [13] (translated to our setting, i.e.
$-X$ instead of $Y$ and $1 - \alpha$ instead of α) the relation $\mathrm{CVaR}^\alpha(X) = \mathrm{TCE}^\alpha(X)$, without any assumption. Corollary 5.3 in connection with Corollary 4.3 shows that this is only true if $P[X < x_{(\alpha)}] = 0$, $P[X = x^{(\alpha)}] = 0$ or $P[X < x_{(\alpha)}] > 0$, $P[X \le x^{(\alpha)}] = \alpha$ (in particular if the distribution of $X$ is continuous).

The last definition we need is that of α-tail mean from [1]. In order to make it comparable to the risk measures defined so far, we define it in two variants: the tail mean, which is likely to be negative but appears in a statistical context (cf. Proposition 4.1 below), and the Expected Shortfall, representing potential loss as an in most cases positive number. The advantage of tail mean is the explicit representation allowing an easy proof of super-additivity (hence sub-additivity for its negative) independent of the distributions of the underlying random variables (cf. the theorem in the appendix of [1]). We will see below (Corollary 4.3) that the Expected Shortfall is in fact identical with CVaR and enjoys properties such as coherence and continuity and monotonicity in the confidence level (section 3). Moreover, it is in a specific sense the largest possible value WCE can take (Corollary 6.3).

Definition 2.6 (Tail mean and Expected Shortfall)
Assume $E[X^-] < \infty$. Then

\[
\bar{x}_{(\alpha)} = \mathrm{TM}_\alpha(X) = \alpha^{-1} \left( E[X\, 1_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} \big(\alpha - P[X \le x_{(\alpha)}]\big) \right)
\]

is the α-tail mean at level α of $X$. $\mathrm{ES}_\alpha = \mathrm{ES}_\alpha(X) = -\bar{x}_{(\alpha)}$ is the Expected Shortfall (ES) at level α of $X$. □

Note that by Corollary 4.3 α-tail mean and $\mathrm{ES}_\alpha$ only depend on the distribution of $X$ and the level α but not on a particular definition of quantile.

3 Useful properties of tail mean and Expected Shortfall

The most important property of ES (Definition 2.6) might be its coherence.

Proposition 3.1 (Coherence of ES) Let $\alpha \in (0, 1)$ be fixed. Consider a set $V$ of real-valued random variables on some probability space $(\Omega, \mathcal{A}, P)$ such that $E[X^-] < \infty$ for all $X \in V$.
Then $\rho : V \to \mathbb{R}$ with $\rho(X) = \mathrm{ES}_\alpha(X)$ for $X \in V$ is a coherent risk measure in the sense of Definition 2.1 in [5], i.e. it is
(i) monotonous: $X \in V,\ X \ge 0 \ \Rightarrow\ \rho(X) \le 0$,
(ii) sub-additive: $X, Y, X + Y \in V \ \Rightarrow\ \rho(X + Y) \le \rho(X) + \rho(Y)$,
(iii) positively homogeneous: $X \in V,\ h > 0,\ hX \in V \ \Rightarrow\ \rho(hX) = h\,\rho(X)$, and
(iv) translation invariant: $X \in V,\ a \in \mathbb{R} \ \Rightarrow\ \rho(X + a) = \rho(X) - a$.

Proof. See Proposition A.1 in the Appendix for an elementary proof of (ii). To check (i), (iii) and (iv) is an easy exercise (cf. also Proposition 3.2). □

In the financial industry there is a growing necessity to deal with random variables with discontinuous distributions. Examples are portfolios of not-traded loans (purely discrete distributions) or portfolios containing derivatives (mixtures of continuous and discrete distributions). One problem with tail risk measures like VaR, TCE, and WCE, when applied to discontinuous distributions, may be their sensitivity to small changes in the confidence level α. In other words, they are not in general continuous with respect to the confidence level α (see Example 5.4). In contrast, $\mathrm{ES}_\alpha$ is continuous with respect to α. Hence, regardless of the underlying distributions, one can be sure that the risk measured by $\mathrm{ES}_\alpha$ will not change dramatically when there is a switch in the confidence level by, say, some basis points. We are going to derive this insensitivity property in Corollary 3.3 below as a consequence of an alternative representation of tail mean. This integral representation (Proposition 3.2), which was already given in [4] for the case of continuous distributions, might be of interest on its own.

Another, almost self-evident, important property of $\mathrm{ES}_\alpha$ is its monotonicity in α. The smaller the level α the greater is the risk. We show this formally in Proposition 3.4.
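The contrast between VaR and ES described above can be checked on a small discrete example. The sketch below evaluates Definition 2.2, Definition 2.5, and Definition 2.6 literally for a random variable with finitely many values; the dictionary representation, the function names, and the single-loan distribution (loss of 100 with probability 1/20) are hypothetical choices for illustration, not taken from the paper. For CVaR the infimum over $s$ is taken over the atoms only, which suffices here because the objective is convex and piecewise linear with kinks at the atoms.

```python
from fractions import Fraction


def lower_quantile(dist, alpha):
    """inf{x : P[X <= x] >= alpha} for a finite distribution (value -> prob)."""
    cumulative = Fraction(0)
    for x, p in sorted(dist.items()):
        cumulative += p
        if cumulative >= alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def upper_quantile(dist, alpha):
    """inf{x : P[X <= x] > alpha} for a finite distribution (value -> prob)."""
    cumulative = Fraction(0)
    for x, p in sorted(dist.items()):
        cumulative += p
        if cumulative > alpha:
            return x
    raise ValueError("probabilities must sum to 1 and alpha must lie in (0, 1)")


def value_at_risk(dist, alpha):
    """Definition 2.2: negative of the upper alpha-quantile."""
    return -upper_quantile(dist, alpha)


def expected_shortfall(dist, alpha):
    """Definition 2.6: negative of the alpha-tail mean."""
    x_a = lower_quantile(dist, alpha)
    partial_expectation = sum(x * p for x, p in dist.items() if x <= x_a)
    prob_below = sum(p for x, p in dist.items() if x <= x_a)
    return -(partial_expectation + x_a * (alpha - prob_below)) / alpha


def cvar(dist, alpha):
    """Definition 2.5, minimising over the atoms of the distribution only."""
    def objective(s):
        expected_negative_part = sum(max(s - x, 0) * p for x, p in dist.items())
        return expected_negative_part / alpha - s
    return min(objective(s) for s in dist)


# Hypothetical single loan: loss of 100 with probability 1/20, else no loss.
loan = {-100: Fraction(1, 20), 0: Fraction(19, 20)}

for alpha in (Fraction(1, 25), Fraction(1, 20), Fraction(3, 50), Fraction(1, 10)):
    print(alpha, value_at_risk(loan, alpha),
          expected_shortfall(loan, alpha), cvar(loan, alpha))
```

In this example VaR drops from 100 to 0 as soon as α reaches the default probability 1/20, whereas ES equals 100 for α ≤ 1/20 and 5/α above it, so it passes through α = 1/20 without a jump and decreases as α grows. The CVaR column agrees with ES at every level, in line with Corollary 4.3.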
Proposition 3.2 If $X$ is a real-valued random variable on a probability space $(\Omega, \mathcal{A}, P)$ with $E[X^-] < \infty$ and $\alpha \in (0, 1)$ is fixed, then

\[
\bar{x}_{(\alpha)} = \alpha^{-1} \int_0^\alpha x_{(u)}\, du,
\]

with $\bar{x}_{(\alpha)}$ and $x_{(u)}$ as in Definitions 2.6 and 2.1, respectively.

Proof. By switching to another probability space if necessary, we can assume that there is a real random variable $U$ on $(\Omega, \mathcal{A}, P)$ that is uniformly distributed on $(0, 1)$, i.e. $P[U \le u] = u$, $u \in (0, 1)$. It is well-known that then the random variable $Z = x_{(U)}$ has the same distribution as $X$. Since $u \mapsto x_{(u)}$ is non-decreasing we have

\[
\{U \le \alpha\} \subset \{Z \le x_{(\alpha)}\} \tag{3.1}
\]

and

\[
\{U > \alpha\} \cap \{Z \le x_{(\alpha)}\} \subset \{Z = x_{(\alpha)}\}. \tag{3.2}
\]

By (3.1) and (3.2) we obtain

\[
\begin{aligned}
\int_0^\alpha x_{(u)}\, du &= E[Z\, 1_{\{U \le \alpha\}}] \\
&= E[Z\, 1_{\{Z \le x_{(\alpha)}\}}] - E[Z\, 1_{\{U > \alpha\} \cap \{Z \le x_{(\alpha)}\}}] \\
&= E[X\, 1_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} \big(\alpha - P[X \le x_{(\alpha)}]\big).
\end{aligned}
\]

Dividing by α now yields the assertion. □

Note that by definition of Expected Shortfall, Proposition 3.2 implies the representation

\[
\mathrm{ES}_\alpha(X) = -\alpha^{-1} \int_0^\alpha q_u(X)\, du. \tag{3.3}
\]

Eq. (3.3) shows that ES is the coherent risk measure used in [11] as main building block for the representation of law invariant coherent risk measures.

Corollary 3.3 If $X$ is a real-valued random variable with $E[X^-] < \infty$, then the mappings $\alpha \mapsto \bar{x}_{(\alpha)}$ and $\alpha \mapsto \mathrm{ES}_\alpha$ are continuous on $(0, 1)$.

Proof. Immediate from Proposition 3.2 and (3.3). □

For some of the results below and in particular the subsequent proposition on monotonicity of the tail mean and ES, a further representation for $\bar{x}_{(\alpha)}$ is useful (cf. Appendix in [1]). Let for $x \in \mathbb{R}$

\[
1^{(\alpha)}_{\{X \le x\}} = \begin{cases} 1_{\{X \le x\}}, & \text{if } P[X = x] = 0, \\[4pt] 1_{\{X \le x\}} + \dfrac{\alpha - P[X \le x]}{P[X = x]}\, 1_{\{X = x\}}, & \text{if } P[X = x] > 0. \end{cases} \tag{3.4}
\]

Then a short calculation shows

\[
1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \in [0, 1], \tag{3.5}
\]

\[
E\big[1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\big] = \alpha, \tag{3.6}
\]

and

\[
\alpha^{-1} E\big[X\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\big] = \bar{x}_{(\alpha)}. \tag{3.7}
\]

Proposition 3.4 If $X$ is a real-valued random variable with $E[X^-] < \infty$, then for any $\alpha \in (0, 1)$ and any $\epsilon > 0$ with $\alpha + \epsilon < 1$ we have the following inequalities:

\[
\bar{x}_{(\alpha + \epsilon)} \ge \bar{x}_{(\alpha)} \quad \text{and} \quad \mathrm{ES}_{\alpha + \epsilon}(X) \le \mathrm{ES}_\alpha(X).
\]

Proof. We adopt the representation (3.7).
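This yields

\[
\begin{aligned}
\bar{x}_{(\alpha+\epsilon)} - \bar{x}_{(\alpha)}
&= E\Big[X \Big( (\alpha+\epsilon)^{-1}\, 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - \alpha^{-1}\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big)\Big] \\
&= (\alpha(\alpha+\epsilon))^{-1}\, E\Big[X \Big( \alpha\, 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big)\Big] \\
&\ge (\alpha(\alpha+\epsilon))^{-1}\, E\Big[x_{(\alpha)} \Big( \alpha\, 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big)\Big] \\
&= \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \Big( \alpha\, E\big[1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}}\big] - (\alpha+\epsilon)\, E\big[1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\big] \Big) \\
&= \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \big( \alpha(\alpha+\epsilon) - (\alpha+\epsilon)\alpha \big) \\
&= 0.
\end{aligned}
\]

The inequality is due to the fact that by (3.5)

\[
\alpha\, 1^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ \begin{cases} \le 0, & \text{if } X < x_{(\alpha)}, \\ \ge 0, & \text{if } X > x_{(\alpha)}. \end{cases}
\]

□

4 Motivation for tail mean and Expected Shortfall

Assume that we want to estimate the lower α-quantile $x_{(\alpha)}$ of some random variable $X$. Let some sample $(X_1, \ldots, X_n)$, drawn from independent copies of $X$, be given. Denote by $X_{1:n} \le \ldots \le X_{n:n}$ the components of the ordered n-tuple $(X_1, \ldots, X_n)$. Denote by $\lfloor x \rfloor$ the integer part of the number $x \in \mathbb{R}$, hence

\[
\lfloor x \rfloor = \max\{n \in \mathbb{Z} : n \le x\}.
\]

Then the order statistic $X_{\lfloor n\alpha \rfloor : n}$ appears as natural estimator for $x_{(\alpha)}$. Nevertheless, it is well known that in case of a non-unique quantile (i.e. $x_{(\alpha)} < x^{(\alpha)}$) the quantity $X_{\lfloor n\alpha \rfloor : n}$ does not converge to $x_{(\alpha)}$. This follows for instance from Theorem 1 in [8] which says that

\[
1 = P[X_{\lfloor n\alpha \rfloor : n} \le x_{(\alpha)} \text{ infinitely often}] = P[X_{\lfloor n\alpha \rfloor : n} \ge x^{(\alpha)} \text{ infinitely often}].
\]

Surprisingly, we get a well-determined limit when we replace the single order statistic by an average over the left tail of the sample. Recall the definition (2.1) of an indicator function.

Proposition 4.1 Let $\alpha \in (0, 1)$ be fixed, $X$ a real random variable with $E[X^-] < \infty$ and $(X_1, X_2, \ldots)$ an independent sequence of random variables with the same distribution as $X$. Then with probability 1

\[
\lim_{n \to \infty} \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor} = \bar{x}_{(\alpha)}. \tag{4.1}
\]

If $X$ is integrable, then the convergence in (4.1) holds in $L^1$, too.

Proof. Due to Proposition 3.2, the "with probability 1" part of Proposition 4.1 is essentially a special case of Theorem 3.1 in [18] with $0 = t_0 < \alpha = t_1 < t_2 = 1$, $J(t) = 1_{(0, \alpha]}(t)$, $J_N(t) = 1_{(0, \frac{\lfloor N\alpha \rfloor + 1}{N}]}(t)$, $g(t) = F^{-1}(t)$, and $p_1 = p_2 = \infty$. Concerning the $L^1$-convergence note that

\[
\Big| \sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n} \Big| \le \sum_{i=1}^{n} |X_i|.
\]

By the strong law of large numbers $n^{-1} \sum_{i=1}^{n} |X_i|$ converges in $L^1$. This implies uniform integrability for $n^{-1} \sum_{i=1}^{n} |X_i|$ and for $n^{-1} \sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}$. Together with the already proven almost sure convergence this implies the assertion. □

To see how a direct proof of the almost sure convergence in Proposition 4.1 would work consider the following heuristic computation. Observe first that

\[
\begin{aligned}
\frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor}
&= \frac{1}{\lfloor n\alpha \rfloor} \Big( \sum_{i=1}^{n} X_{i:n}\, 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} + \sum_{i=1}^{n} X_{i:n} \big( 1_{\{1, \ldots, \lfloor n\alpha \rfloor\}}(i) - 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \big) \Big) \\
&= \frac{1}{\lfloor n\alpha \rfloor} \Big( \sum_{i=1}^{n} X_{i}\, 1_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \sum_{i=1}^{n} \big( 1_{\{1, \ldots, \lfloor n\alpha \rfloor\}}(i) - 1_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \big) \Big) \\
&= \frac{1}{\lfloor n\alpha \rfloor} \Big( \sum_{i=1}^{n} X_{i}\, 1_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \Big( \lfloor n\alpha \rfloor - \sum_{i=1}^{n} 1_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} \Big) \Big).
\end{aligned} \tag{4.2}
\]

If we now had

\[
\lim_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)}, \quad \text{with probability 1}, \tag{4.3}
\]

in connection with $\lim_{n \to \infty} n / \lfloor n\alpha \rfloor = 1/\alpha$ it would be plausible to obtain (4.1). Unfortunately (4.3) is not true in general, but only

\[
\liminf_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)} \quad \text{and} \quad \limsup_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x^{(\alpha)}. \tag{4.4}
\]

Nevertheless the proof could be completed on the base of (4.2) by using (4.4) together with the Glivenko-Cantelli theorem and Corollary 4.3 below.

Proposition 4.1 validates the interpretation given to α-tail mean in [1] as mean of the worst 100α% cases. This concept, which seems very natural from an insurance or risk management point of view, has so far appeared in the literature in the form of different kinds of conditional expectation beyond VaR, which is a different concept for discrete distributions. "Tail conditional expectation", "worst conditional expectation", "conditional value at risk" all bear also in their name the fact that they are conditional expected values of the random variable $X$ (note that concerning CVaR, by Corollary 4.3 below this is a misinterpretation).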
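The estimator on the left-hand side of (4.1) is simple to implement. The sketch below is one possible rendering under stated assumptions: the function names are invented for illustration, the sample is a plain list of floats, and the simulated single-loan distribution (loss 100 with probability 0.05) is a hypothetical example chosen so that α equals the default probability and the α-quantile is non-unique. For comparison it also includes the conditional-mean estimator that the text turns to next, labeled (4.5) there.

```python
import math
import random


def tail_mean_estimate(sample, alpha):
    """Average of the floor(n * alpha) smallest observations, as in (4.1)."""
    k = math.floor(len(sample) * alpha)
    if k < 1:
        raise ValueError("sample too small for this alpha")
    return sum(sorted(sample)[:k]) / k


def tce_estimate(sample, alpha):
    """Negative mean of all observations not exceeding the order statistic
    X_{floor(n * alpha):n}; this is the estimator labeled (4.5) in the text."""
    k = math.floor(len(sample) * alpha)
    if k < 1:
        raise ValueError("sample too small for this alpha")
    threshold = sorted(sample)[k - 1]
    tail = [x for x in sample if x <= threshold]
    return -sum(tail) / len(tail)


def simulate_defaults(n, p, loss, seed):
    """Hypothetical loan P&L: -loss with probability p, otherwise 0."""
    rng = random.Random(seed)
    return [-loss if rng.random() < p else 0.0 for _ in range(n)]


if __name__ == "__main__":
    alpha = 0.05  # equals the default probability: non-unique quantile
    for n in (1_000, 10_000, 100_000):
        sample = simulate_defaults(n, p=0.05, loss=100.0, seed=n)
        print(n, -tail_mean_estimate(sample, alpha), tce_estimate(sample, alpha))
```

In this setting the negative tail-mean estimate settles near 100, the value of ES at this level, while the conditional-mean estimate returns either 100 or roughly 5 depending on whether the sample happens to contain at least ⌊nα⌋ defaults, which illustrates the convergence problem in the non-unique quantile case.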
For $\mathrm{TCE}_\alpha$, for instance, the natural estimator is not given by the one analyzed in (4.1) or its negative, but rather by

\[
-\frac{\sum_{i=1}^{n} X_i\, 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}}{\sum_{i=1}^{n} 1_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}} \tag{4.5}
\]

which however has problems of convergence in case $x_{(\alpha)} < x^{(\alpha)}$. This is the reason why we avoid the term "conditional" in our definition of α-tail mean. In fact, it is not very hard to see (cf. Example 5.4 below) that α-tail mean does not admit a general representation in terms of a conditional expectation of $X$ given some event $A \in \sigma(X)$ (i.e. some event only depending on $X$). Hence it is not possible to give a definition of the type

\[
\bar{x}_{(\alpha)} = E[X \mid A] \quad \text{for some } A \in \sigma(X), \tag{4.6}
\]

unless the event $A$ is chosen in a σ-algebra $\mathcal{A} \supset \sigma(X)$ on an artificial new probability space (see Corollary 6.2 below).

In order to make visible the coincidence of CVaR and tail mean, the following proposition collects some facts on quantiles which are well-known in probability theory (cf. Exercise 3 in ch. 1 of [9] or Problem 25.9 of [10] for the here cited version):

Proposition 4.2 Let $X$ be a real integrable random variable on some probability space $(\Omega, \mathcal{A}, P)$. Fix $\alpha \in (0, 1)$ and define the function $H_\alpha : \mathbb{R} \to [0, \infty)$ by

\[
H_\alpha(s) = \alpha\, E[(X - s)^+] + (1 - \alpha)\, E[(X - s)^-]. \tag{4.7}
\]

Then the function $H_\alpha$ is convex (and hence continuous) with $\lim_{|s| \to \infty} H_\alpha(s) = \infty$. The set $M_\alpha$ of minimizers to $H_\alpha$ is a compact interval, namely

\[
M_\alpha = [x_{(\alpha)}, x^{(\alpha)}] = \{s \in \mathbb{R} : P[X < s] \le \alpha \le P[X \le s]\}. \tag{4.8}
\]

□

Note the following equivalent representations for $H_\alpha$:

\[
H_\alpha(s) = \alpha\, E[X] + \alpha \left( \frac{E[(X - s)^-]}{\alpha} - s \right) \tag{4.9}
\]

\[
\phantom{H_\alpha(s)} = \alpha\, E[X] - \alpha \left( \frac{E[X\, 1_{\{X \le s\}}]}{\alpha} + s\, \frac{\alpha - P[X \le s]}{\alpha} \right). \tag{4.10}
\]

From Definitions 2.5 and 2.6 for CVaR and ES, respectively, and by (4.10), in connection with Proposition 4.2, we obtain the following corollary to the proposition.

Corollary 4.3 Let $X$ be a real integrable random variable on some probability space $(\Omega, \mathcal{A}, P)$ and $\alpha \in (0, 1)$ be fixed. Then

\[
\mathrm{ES}_\alpha(X) = \mathrm{CVaR}^\alpha(X) = -\alpha^{-1} \left( E[X\, 1_{\{X \le s\}}] + s\, (\alpha - P[X \le s]) \right), \quad s \in [x_{(\alpha)}, x^{(\alpha)}]. \tag{4.11}
\]
□

A further representation of ES or CVaR, respectively, as expectation of a suitably modified tail distribution is given in the recent research report [14] (cf. Def. 3 therein).

In Definitions 2.5 and 2.6 only $E[X^-] < \infty$ is required for $X$. Indeed, this integrability condition would suffice to guarantee (4.11). We formulated Corollary 4.3 with full integrability of $X$ because we wanted to rely on Proposition 4.2 for the proof.

Note that by a simple calculation one can show that (4.11) is equivalent to

\[
\mathrm{ES}_\alpha(X) = -\alpha^{-1} \left( E[X\, 1_{\{X < s\}}] + s\, (\alpha - P[X < s]) \right), \quad s \in [x_{(\alpha)}, x^{(\alpha)}]. \tag{4.12}
\]

By (4.12) we see that ES coincides with the coherent risk measure considered in Example 4 of [6] (already mentioned in Example 4.2 of [5]).

5 Inequalities and counter-examples

In this section we compare the Expected Shortfall with the risk measures TCE and WCE defined in section 2. Moreover, we present an example showing that VaR and TCE are not sub-additive in general. By the same example we show that there is not a clear relationship between WCE and lower TCE. We start with a result in the spirit of the Neyman-Pearson lemma.

Proposition 5.1 Let $\alpha \in (0, 1)$ be fixed and $X$ be a real-valued random variable on some probability space $(\Omega, \mathcal{A}, P)$. Suppose that there is some function $f : \mathbb{R} \to \mathbb{R}$ such that $E[(f \circ X)^-] < \infty$, $f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$, and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$. Let $A \in \mathcal{A}$ be an event with $P[A] \ge \alpha$ and $E[\,|f \circ X|\, 1_A] < \infty$. Then
(i) $\mathrm{TM}_\alpha(f \circ X) \le E[f \circ X \mid A]$,
(ii) $\mathrm{TM}_\alpha(f \circ X) = E[f \circ X \mid A]$ if $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and

\[
P[X < x_{(\alpha)}] = 0 \tag{5.1}
\]

or

\[
P[X < x_{(\alpha)}] > 0, \quad P[(\Omega \setminus A) \cap \{X < x_{(\alpha)}\}] = 0, \quad \text{and} \quad P[A] = \alpha, \tag{5.2}
\]

(iii) if $f(x) < f(x_{(\alpha)})$ for $x < x_{(\alpha)}$ and $f(x) > f(x_{(\alpha)})$ for $x > x_{(\alpha)}$, then $\mathrm{TM}_\alpha(f \circ X) = E[f \circ X \mid A]$ implies $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and either (5.1) or (5.2).

Proof. Note that by assumption

\[
\{X \le x_{(\alpha)}\} \subset \{f \circ X \le f(x_{(\alpha)})\} \quad \text{and} \quad \{X < x_{(\alpha)}\} \supset \{f \circ X < f(x_{(\alpha)})\}.
\]
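Hence we see from (4.8) that $P[f \circ X \le f(x_{(\alpha)})] \ge \alpha$ and $P[f \circ X < f(x_{(\alpha)})] \le \alpha$ and therefore

\[
q_\alpha(f \circ X) \le f(x_{(\alpha)}) \le q^\alpha(f \circ X). \tag{5.3}
\]

Moreover, the assumption implies

\[
\{f \circ X \le f(x_{(\alpha)})\} \setminus \{X \le x_{(\alpha)}\} \subset \{f \circ X = f(x_{(\alpha)})\}. \tag{5.4}
\]

By Corollary 4.3, (5.3), (5.4), (3.6), and (3.7), we can calculate similarly to the proof of Proposition 3.4

\[
\begin{aligned}
E[f \circ X \mid A] - \mathrm{TM}_\alpha(f \circ X)
&= E\Big[f \circ X \Big( P[A]^{-1}\, 1_A - \alpha^{-1}\, 1^{(\alpha)}_{\{f \circ X \le f(x_{(\alpha)})\}} \Big)\Big] \\
&= E\Big[f \circ X \Big( P[A]^{-1}\, 1_A - \alpha^{-1}\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big)\Big] \\
&= (\alpha\, P[A])^{-1} \Big( f(x_{(\alpha)})\, E\Big[\alpha\, 1_A - P[A]\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\Big] \\
&\qquad\qquad + E\Big[\big(f \circ X - f(x_{(\alpha)})\big) \Big(\alpha\, 1_A - P[A]\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\Big)\Big] \Big) \\
&= (\alpha\, P[A])^{-1}\, E\Big[\big(f \circ X - f(x_{(\alpha)})\big) \Big(\alpha\, 1_A - P[A]\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\Big)\Big] \\
&\ge 0.
\end{aligned} \tag{5.5}
\]

Here, we obtain inequality (5.5) from the assumption on $f$ since

\[
\alpha\, 1_A - P[A]\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ \begin{cases} \le 0, & \text{if } X < x_{(\alpha)}, \\ \ge 0, & \text{if } X > x_{(\alpha)}. \end{cases} \tag{5.6}
\]

This proves (i). The sufficiency and necessity respectively of the conditions in (ii) and (iii) for equality in (5.5) are easily obtained by careful inspection of (5.5). □

Note that the condition

\[
P[A \cap \{X > x_{(\alpha)}\}] = 0 \quad \text{and} \quad P[(\Omega \setminus A) \cap \{X < x_{(\alpha)}\}] = 0, \tag{5.7}
\]

appearing in (ii) and (iii) of Proposition 5.1, means up to set differences of probability 0 that

\[
\{X < x_{(\alpha)}\} \subset A \subset \{X \le x_{(\alpha)}\}. \tag{5.8}
\]

In particular, (5.7) is implied by (5.8).

The proof of Proposition 5.1 is the hardest work in this section. Equipped with its result we are in a position to derive without effort a couple of conclusions pointing out the relations between TCE, WCE and ES. Recall $\mathrm{ES}_\alpha = -\mathrm{TM}_\alpha$.

Corollary 5.2 Let $\alpha \in (0, 1)$ and $X$ a real-valued random variable on some probability space $(\Omega, \mathcal{A}, P)$ with $E[X^-] < \infty$. Then

\[
\mathrm{TCE}^\alpha(X) \le \mathrm{TCE}_\alpha(X) \le \mathrm{ES}_\alpha(X), \tag{5.9}
\]

and

\[
\mathrm{TCE}^\alpha(X) \le \mathrm{WCE}_\alpha(X) \le \mathrm{ES}_\alpha(X). \tag{5.10}
\]

Proof. The first inequality in (5.9) is obvious (formally it follows from Lemma 5.1 in [16]). The second follows from Proposition 5.1 (i) by setting $f(x) = x$, $A = \{X \le x_{(\alpha)}\}$, and observing $P[X \le x_{(\alpha)}] \ge \alpha$. The first inequality in (5.10) was proven in Proposition 5.1 of [3].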
The second follows again from Proposition 5.1 (i) since all the events in the definition of WCE have probabilities $> \alpha$. □

The following corollary to Proposition 5.1 presents in particular in (i) a first sufficient condition for WCE and ES to coincide, namely continuity of the distribution of $X$.

Corollary 5.3 Let α and $X$ be as in Corollary 5.2. Then
(i)

\[
P[X \le x^{(\alpha)}] = \alpha,\ P[X < x_{(\alpha)}] > 0 \quad \text{or} \quad P[X \le x^{(\alpha)},\ X \ne x_{(\alpha)}] = 0 \tag{5.11}
\]

if and only if

\[
\mathrm{ES}_\alpha(X) = \mathrm{WCE}_\alpha(X) = \mathrm{TCE}^\alpha(X) = \mathrm{TCE}_\alpha(X).
\]

In particular, (5.11) holds if the distribution of $X$ is continuous, i.e. $P[X = x] = 0$ for all $x \in \mathbb{R}$.
(ii) $P[X \le x_{(\alpha)}] = \alpha$ or $P[X < x_{(\alpha)}] = 0$ if and only if $\mathrm{ES}_\alpha(X) = \mathrm{TCE}_\alpha(X)$.

Proof. Concerning (i) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x^{(\alpha)}\}$ and Corollary 5.2. In order to obtain (ii) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x_{(\alpha)}\}$. □

Corollary 5.2 leaves open the relation between $\mathrm{TCE}_\alpha(X)$ and $\mathrm{WCE}_\alpha(X)$. The implication

\[
P[X \le x_{(\alpha)}] > \alpha \ \Rightarrow\ \mathrm{TCE}_\alpha(X) \le \mathrm{WCE}_\alpha(X) \tag{5.12}
\]

is obvious. Corollary 5.3 (ii) shows that

\[
P[X \le x_{(\alpha)}] = \alpha \ \Rightarrow\ \mathrm{TCE}_\alpha(X) \ge \mathrm{WCE}_\alpha(X). \tag{5.13}
\]

The following example shows that all the inequalities between TCE, WCE, and ES in (5.9), (5.10), (5.12), and (5.13) can be strict. Moreover, it shows that none of the quantities $-q_\alpha$, $\mathrm{VaR}^\alpha$, $\mathrm{TCE}^\alpha$, or $\mathrm{TCE}_\alpha$ defines a sub-additive risk measure in general.

Example 5.4 Consider the probability space $(\Omega, \mathcal{A}, P)$ with $\Omega = \{\omega_1, \omega_2, \omega_3\}$, $\mathcal{A}$ the set of all subsets of Ω and $P$ specified by

\[
P[\{\omega_1\}] = P[\{\omega_2\}] = p, \quad P[\{\omega_3\}] = 1 - 2p,
\]

and choose $0 < p < \frac{1}{3}$. Fix some positive number $N$ and let $X_i$, $i = 1, 2$, be two random variables defined on $(\Omega, \mathcal{A}, P)$ with values

\[
X_i(\omega_j) = \begin{cases} -N, & \text{if } i = j, \\ 0, & \text{otherwise.} \end{cases}
\]

Choose α such that $0 < \alpha < 2p$. Then it is straightforward to obtain Table 1 with the values of the risk measures interesting to us.
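Because the probability space of Example 5.4 has only three points, these values can also be checked by brute force. The following sketch enumerates all events to evaluate WCE and applies Definitions 2.1 to 2.6 directly; the function name, the dictionary keys, and the concrete numbers N = 100, p = 1/4, α = 3/8 (one admissible choice with p < α < 2p) are assumptions made for illustration. It assumes every outcome has positive probability.

```python
from fractions import Fraction
from itertools import combinations


def risk_measures(X, P, alpha):
    """Risk measures of X on a finite space with outcome probabilities P.

    X and P are lists indexed by outcome; all probabilities must be positive.
    """
    outcomes = range(len(P))

    def prob(event):
        return sum(P[i] for i in event)

    def cond_exp(event):
        return sum(X[i] * P[i] for i in event) / prob(event)

    def below(x):
        return [i for i in outcomes if X[i] <= x]

    values = sorted(set(X))
    q_low = min(x for x in values if prob(below(x)) >= alpha)  # lower quantile
    q_up = min(x for x in values if prob(below(x)) > alpha)    # upper quantile

    # All non-empty events of the finite space, for the infimum in WCE.
    events = [e for r in range(1, len(P) + 1) for e in combinations(outcomes, r)]
    wce = -min(cond_exp(e) for e in events if prob(e) > alpha)

    low_set = below(q_low)
    tail_mean = (sum(X[i] * P[i] for i in low_set)
                 + q_low * (alpha - prob(low_set))) / alpha

    return {
        "-q": -q_low,                          # negative lower quantile
        "VaR": -q_up,                          # Definition 2.2
        "TCE_upper": -cond_exp(below(q_up)),   # based on the upper quantile
        "TCE_lower": -cond_exp(below(q_low)),  # based on the lower quantile
        "WCE": wce,                            # Definition 2.4
        "ES": -tail_mean,                      # Definition 2.6
    }


# Hypothetical numbers: N = 100, p = 1/4, alpha = 3/8.
N, p, alpha = 100, Fraction(1, 4), Fraction(3, 8)
P = [p, p, 1 - 2 * p]
X1, X2 = [-N, 0, 0], [0, -N, 0]
X_sum = [a + b for a, b in zip(X1, X2)]

for name, X in (("X1", X1), ("X2", X2), ("X1+X2", X_sum)):
    print(name, risk_measures(X, P, alpha))
```

With these numbers the single positions have VaR 0, both TCEs equal to Np = 25, WCE equal to N/2 = 50 and ES equal to Np/α = 200/3, while every measure equals N = 100 for the sum. VaR and both TCEs therefore violate sub-additivity, whereas WCE and ES do not.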
                    p < α < 2p              p = α                   p > α
Risk measure    X_{1,2}    X_1 + X_2    X_{1,2}    X_1 + X_2    X_{1,2}    X_1 + X_2
−q_α            0          N            N          N            N          N
VaR^α           0          N            0          N            N          N
TCE^α           Np         N            Np         N            N          N
TCE_α           Np         N            N          N            N          N
WCE_α           N/2        N            N/2        N            N          N
ES_α            Np/α       N            N          N            N          N

Table 1: Values of risk measures for Example 5.4.

In case $p < \alpha < 2p$ we see from Table 1 that

\[
\begin{aligned}
-q_\alpha(X_1) - q_\alpha(X_2) &< -q_\alpha(X_1 + X_2), \\
\mathrm{VaR}^\alpha(X_1) + \mathrm{VaR}^\alpha(X_2) &< \mathrm{VaR}^\alpha(X_1 + X_2), \\
\mathrm{TCE}^\alpha(X_1) + \mathrm{TCE}^\alpha(X_2) &< \mathrm{TCE}^\alpha(X_1 + X_2), \\
\mathrm{TCE}_\alpha(X_1) + \mathrm{TCE}_\alpha(X_2) &< \mathrm{TCE}_\alpha(X_1 + X_2).
\end{aligned}
\]

These inequalities show that none of the notions $-q_\alpha$, $\mathrm{VaR}^\alpha$, $\mathrm{TCE}^\alpha$, or $\mathrm{TCE}_\alpha$ can be used to define a sub-additive risk measure. In case $p < \alpha < 2p$ we have also

\[
\mathrm{TCE}_\alpha(X_1) < \mathrm{ES}_\alpha(X_1),
\]

\[
\mathrm{TCE}_\alpha(X_1) = \mathrm{TCE}^\alpha(X_1) < \mathrm{WCE}_\alpha(X_1),
\]

\[
\mathrm{WCE}_\alpha(X_1) < \mathrm{ES}_\alpha(X_1). \tag{5.14}
\]

Hence the second inequalities in (5.9), (5.10), and (5.12) may be strict, as can be the first inequality in (5.10). In case $p = \alpha$ we have from Table 1 that

\[
\mathrm{TCE}^\alpha(X_1) < \mathrm{TCE}_\alpha(X_1) \quad \text{and} \quad \mathrm{TCE}_\alpha(X_1) > \mathrm{WCE}_\alpha(X_1).
\]

Thus, also the first inequality in (5.9) and the inequality in (5.13) can be strict. In particular, we see that there is not any clear relationship between $\mathrm{TCE}_\alpha$ and WCE. Beside the inequalities, from the comparison with the results in the region $p > \alpha$, we get an example for the fact that all the measures but ES may have discontinuities in α. Moreover, in case $p < \alpha$ we have a stronger version of (5.14), namely

\[
-\inf\{E[X_1 \mid A] : A \in \mathcal{A},\ P[A] \ge \alpha\} < \mathrm{ES}_\alpha(X_1),
\]

which shows that even if one replaces "$>$" by "$\ge$" in Definition 2.4, strict inequality may appear in the relation between WCE and ES. □

We finally observe that Example 5.4 is not so academic as it may seem at first glance since the $X_i$'s may be interpreted as two risky bonds of nominal $N$ with non-overlapping default states $\omega_i$ of probability $p$.

6 Representing ES in terms of WCE

By Example 5.4 we know that WCE and ES may differ in general.
Nevertheless, we are going to show in the last part of the paper that this phenomenon can only occur when the underlying probability space is too "small" in the sense of not allowing a suitable representation of the random variable under consideration as function of a continuous random variable. Moreover, as long as only finitely many random variables are under consideration it is always possible to switch to a "larger" probability space in order to make WCE and ES coincide. Finally, we state a general representation of ES in terms of related WCEs.

Proposition 6.1 Let $X$ and $Y$ be real-valued random variables on a probability space $(\Omega, \mathcal{A}, P)$ such that $E[Y^-] < \infty$. Fix some $\alpha \in (0, 1)$. Assume that $Y$ is given by $Y = f \circ X$ where $f$ satisfies $f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$, and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$.
(i) If $P[X \le x_{(\alpha)}] = \alpha$ then

\[
\mathrm{ES}_\alpha(Y) = -\inf_{A \in \mathcal{A},\ P[A] \ge \alpha} E[Y \mid A].
\]

(ii) If the distribution function of $X$ is continuous then also $\mathrm{ES}_\alpha(Y) = \mathrm{WCE}_\alpha(Y)$.

Proof. Concerning (i), by Proposition 5.1 (i) we only have to show

\[
\mathrm{TM}_\alpha(Y) = E[Y \mid X \le x_{(\alpha)}]. \tag{6.1}
\]

With the choice $A = \{X \le x_{(\alpha)}\}$ this follows from Proposition 5.1 (ii). Concerning Proposition 6.1 (ii), by (6.1), we have to show that there is a sequence $(A_n)_{n \in \mathbb{N}}$ in $\mathcal{A}$ with $P[A_n] > \alpha$ for all $n \in \mathbb{N}$ such that

\[
\lim_{n \to \infty} E[Y \mid A_n] = E[Y \mid X \le x_{(\alpha)}].
\]

By continuity of the distribution of $X$ and integrability of $Y^-$ we obtain such a sequence with the definition $A_n = \{X \le x_{(\alpha)} + 1/n\}$. □

Corollary 6.2 Let $(X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued random vector on a probability space $(\Omega, \mathcal{A}, P)$ such that $E[X_i^-] < \infty$, $i = 1, \ldots, d$. Fix $\alpha \in (0, 1)$. Then there is a random vector $(X_1', \ldots, X_d')$ on some probability space $(\Omega', \mathcal{A}', P')$ with the following two properties:
(i) The distributions of $(X_1, \ldots, X_d)$ and $(X_1', \ldots, X_d')$ are equal, i.e.

\[
P[X_1 \le x_1, \ldots, X_d \le x_d] = P'[X_1' \le x_1, \ldots, X_d' \le x_d]
\]

for all $(x_1, \ldots, x_d) \in \mathbb{R}^d$.
(ii) Worst conditional expectation and Expected Shortfall coincide for all $i = 1, \ldots, d$, i.e.

\[
\mathrm{WCE}_\alpha(X_i') = \mathrm{ES}_\alpha(X_i'), \quad i = 1, \ldots, d.
\]

□

Proof. By Sklar's theorem (cf. Theorem 2.10.9 in [12]) we get the existence of a random vector $(U_1, \ldots, U_d)$ where each $U_i$ is uniformly distributed on $(0, 1)$ such that (i) holds with $X_i' = q_{U_i}(X_i)$, $i = 1, \ldots, d$. Since $q_\alpha$ is non-decreasing in α the assertion now follows from Proposition 6.1. □

Corollary 6.2 yields another proof for the sub-additivity of Expected Shortfall: in order to prove $\mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) \ge \mathrm{ES}_\alpha(X + Y)$ apply the corollary to the underlying random vector $(X, Y, X + Y)$. As a final consequence of Corollary 5.2 and Corollary 6.2 we note:

Corollary 6.3 Let $X$ be a real-valued random variable on some probability space $(\Omega, \mathcal{A}, P)$ with $E[X^-] < \infty$. Fix $\alpha \in (0, 1)$. Then

\[
\mathrm{ES}_\alpha(X) = \max\Big\{ \mathrm{WCE}_\alpha(X') : X' \text{ random variable on } (\Omega', \mathcal{A}', P') \text{ with } P'[X' \le x] = P[X \le x] \text{ for all } x \in \mathbb{R} \Big\},
\]

where the maximum is taken over all random variables $X'$ on probability spaces $(\Omega', \mathcal{A}', P')$ such that the distributions of $X$ and $X'$ are equal. □

Corollary 6.3 shows that Expected Shortfall in the sense of Definition 2.6 may be considered a robust version of worst conditional expectation (Definition 2.4), making the latter insensitive to the underlying probability space.

7 Conclusion

In the paper at hand we have shown that simply taking a conditional expectation of losses beyond VaR can fail to yield a coherent risk measure when there are discontinuities in the loss distributions. Already existing definitions for some kind of expected shortfall, redressing this drawback, as those in [3] or [13], did not provide representations suitable for efficient computation and estimation in the general case. We have clarified the relations between these definitions and the explicit one from [1], thereby pointing out that it is the definition which is most appropriate for practical purposes.
A Appendix: Subadditivity of Expected Shortfall

We give here for the sake of completeness the proof of subadditivity for expected shortfall which was originally given in Appendix A of [1]. For the proof it is convenient to adopt the representation of eq. (3.7) for the Tail Mean and write the Expected Shortfall as

\[
\mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\, E\Big[ X\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big] \tag{A.1}
\]

with the function $1^{(\alpha)}_{\{X \le s\}}$ defined in (3.4).

Proposition A.1 (Subadditivity of Expected Shortfall) Given two random variables X and Y with E[X⁻] < ∞ and E[Y⁻] < ∞, the following inequality holds:

\[
\mathrm{ES}_\alpha(X + Y) \le \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) \quad \text{for any } \alpha \in (0, 1]. \tag{A.2}
\]

Proof. Defining Z = X + Y, we obtain by virtue of (3.6)

\[
\begin{aligned}
\alpha \big( \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) - \mathrm{ES}_\alpha(Z) \big)
&= E\Big[ Z\, 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - X\, 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} - Y\, 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \Big] \\
&= E\Big[ X \big( 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \big) + Y \big( 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \big) \Big] \\
&\ge x_{(\alpha)}\, E\Big[ 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Big] + y_{(\alpha)}\, E\Big[ 1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \Big] \\
&= x_{(\alpha)} (\alpha - \alpha) + y_{(\alpha)} (\alpha - \alpha) = 0,
\end{aligned}
\tag{A.3}
\]

which proves the thesis. In the inequality above we used the fact that

\[
\begin{cases}
1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ge 0 & \text{if } X > x_{(\alpha)}, \\[4pt]
1^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - 1^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \le 0 & \text{if } X < x_{(\alpha)},
\end{cases}
\tag{A.4}
\]

which in turn is a consequence of (3.4) and (3.5). Consequently the product of $X - x_{(\alpha)}$ with the difference of the two indicator functions is non-negative pointwise, and the analogous statement holds for Y; taking expectations gives the inequality. ✷

References

[1] Acerbi, C., Nordio, C., Sirtori, C. (2001) Expected Shortfall as a Tool for Financial Risk Management. Working paper. http://www.gloriamundi.org/var/wps.html
[2] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. (1997) Thinking coherently. RISK 10(11).
[3] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. (1999) Coherent measures of risk. Math. Fin. 9(3), 203–228.
[4] Bertsimas, D., Lauprete, G.J., Samarov, A. (2000) Shortfall as a risk measure: properties, optimization and applications. Working paper, Sloan School of Management, MIT, Cambridge.
[5] Delbaen, F. (1998) Coherent risk measures on general probability spaces. Working paper, ETH Zürich. http://www.math.ethz.ch/~delbaen/
[6] Delbaen, F. (2001) Coherent risk measures.
Lecture Notes, Scuola Normale Superiore di Pisa.
[7] Embrechts, P. (2000) Extreme Value Theory: Potential and Limitations as an Integrated Risk Management Tool. Working paper, ETH Zürich. http://www.math.ethz.ch/~embrechts/
[8] Feldman, D., Tucker, H.G. (1966) Estimation of non-unique quantiles. Ann. Math. Stat. 37, 451–457.
[9] Ferguson, T.G. (1967) Mathematical Statistics: a Decision Theoretic Approach. Academic Press: London.
[10] Hinderer, K. (1972) Grundbegriffe der Wahrscheinlichkeitstheorie (Fundamental terms of probability theory). Springer: Berlin.
[11] Kusuoka, S. (2001) On law invariant coherent risk measures. In Advances in Mathematical Economics, volume 3, pages 83–95. Springer: Tokyo.
[12] Nelsen, R.B. (1999) An introduction to copulas. Springer: New York.
[13] Pflug, G. (2000) Some remarks on the value-at-risk and the conditional value-at-risk. In: Uryasev, S. (ed.), Probabilistic Constrained Optimization: Methodology and Applications. Kluwer Academic Publishers. http://www.gloriamundi.org/var/pub.html
[14] Rockafellar, R.T., Uryasev, S. (2001) Conditional value-at-risk for general loss distributions. Research report 2001-5, ISE Dept., University of Florida. http://www.ise.ufl.edu/uryasev/pubs.html
[15] Rootzén, H., Klüppelberg, C. (1999) A single number can’t hedge against economic catastrophes. Ambio 28(6), 550–555. Royal Swedish Academy of Sciences.
[16] Tasche, D. (2000) Conditional expectation as quantile derivative. Working paper, TU München. http://www.ma.tum.de/stat/
[17] Uryasev, S. (2000) Conditional Value-at-Risk: Optimization Algorithms and Applications. Financial Engineering News 2(3). http://www.gloriamundi.org/var/pub.html
[18] Van Zwet, W.R. (1980) A strong law for linear functions of order statistics. Ann. Probab. 8, 986–990.