Philippe Jorion*
Abstract
I. Introduction
(2)  max_q E[ U(Z) | θ = θ̂(y) ].
This approach obviously ignores the issue of estimation risk, or parameter uncertainty.
The Bayesian solution to this problem, as first suggested by Zellner and Chetty [33], is to define optimal portfolio choice in terms of the predictive density function. The latter is obtained after integrating out the unknown parameter θ, which explicitly takes into account uncertainty about θ. The investor's problem can be described as the maximization of the unconditional expected utility of his portfolio,

(3)  max_q E_θ[ E_{z|θ}[ U(Z) | θ ] ] = ∫ U(Z) [ ∫ p(z | θ) p(θ | y, I₀) dθ ] dz,

where p(θ | y, I₀) is the posterior density function of θ, given the data and the prior information I₀.
As Klein and Bawa [22] have shown, this approach is optimal under the von Neumann-Morgenstern expected utility axioms.
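To make (3) concrete, here is a minimal Monte Carlo sketch in Python. It assumes exponential (CARA) utility, a known covariance matrix, and a diffuse prior on μ, so that the posterior of μ is normal; the data, the risk-aversion value, and the function names are illustrative rather than the paper's specification.

import numpy as np

rng = np.random.default_rng(0)

def predictive_expected_utility(q, y, Sigma, gamma=3.0, n_draws=20000):
    # Monte Carlo version of (3): draw mu from its posterior (diffuse prior,
    # known Sigma), then draw a future return given mu, and average U(Z).
    T, N = y.shape
    ybar = y.mean(axis=0)
    mu_draws = rng.multivariate_normal(ybar, Sigma / T, size=n_draws)
    shocks = rng.multivariate_normal(np.zeros(N), Sigma, size=n_draws)
    z = (mu_draws + shocks) @ q                 # portfolio return in each draw
    return np.mean(-np.exp(-gamma * z))         # CARA utility, averaged over draws

# toy example: 60 observations on 2 assets, equally weighted portfolio
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
y = rng.multivariate_normal([0.08, 0.12], Sigma, size=60)
print(predictive_expected_utility(np.array([0.5, 0.5]), y, Sigma))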
In a mean-variance framework, the objective function can be reduced to a derived utility function

(4)  EU(Z) = F(μ_z, σ_z²),

where μ_z = q'μ and σ_z² = q'Σq are the expected return and variance of the portfolio. The control problem is to choose the vector of investment proportions q so as to maximize expected utility, subject, for instance, to the constraint that the weights must sum to one.

If the distribution moments θ = (μ, Σ) are known, the choice must be optimal:

(5)  F(q*(θ) | μ, Σ) = F(q*'μ, q*'Σq*) = F_MAX.
On the other hand, if the parameters θ are unknown, the portfolio choice q̂ will be made on the basis of the sample estimate θ̂(y). The expected utility, measured in terms of the true underlying distribution, will necessarily be lower than before:

(6)  F(q̂(θ̂(y)) | μ, Σ) = F(q̂'μ, q̂'Σq̂) ≤ F_MAX.
Figure 1 illustrates the utility loss due to estimation risk. If the investor knew the true parameters, he would choose the portfolio represented by point A, where the weights q* are optimal relative to the true frontier (the solid line). Unfortunately, he only observes sample estimates, and selects portfolio B, with composition q̂, which is optimal relative to the estimated frontier (the dashed line). However, this choice q̂ is suboptimal relative to the true parameters: for point C, F(q̂) ≤ F_MAX. The difference between these values can be attributed to estimation risk; it can also be expressed in risk-free equivalent return, by transforming the level of expected utility F into a risk-free rate R,

(7)  F(μ_z, σ_z²) = F(R, 0).
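A small numerical sketch of (5)-(7) in Python follows. It assumes a quadratic derived utility F(μ_z, σ_z²) = μ_z − (γ/2)σ_z² and a known covariance matrix; the parameter values are arbitrary and none of these choices is taken from the paper.

import numpy as np

def F(mean_p, var_p, gamma=3.0):
    # derived utility of (4); a quadratic form is assumed here for illustration
    return mean_p - 0.5 * gamma * var_p

def optimal_weights(mu, Sigma, gamma=3.0):
    # maximize F(q'mu, q'Sigma q) subject to 1'q = 1, via the KKT linear system
    N = len(mu)
    A = np.block([[gamma * Sigma, np.ones((N, 1))],
                  [np.ones((1, N)), np.zeros((1, 1))]])
    return np.linalg.solve(A, np.append(mu, 1.0))[:N]

rng = np.random.default_rng(0)
mu_true = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])

q_star = optimal_weights(mu_true, Sigma)                 # point A, true frontier
F_max = F(q_star @ mu_true, q_star @ Sigma @ q_star)     # eq. (5)

y = rng.multivariate_normal(mu_true, Sigma, size=25)     # a short sample
q_hat = optimal_weights(y.mean(axis=0), Sigma)           # point B, estimated frontier
F_hat = F(q_hat @ mu_true, q_hat @ Sigma @ q_hat)        # eq. (6), point C

loss = F_max - F_hat     # utility loss due to estimation risk
# eq. (7): with this particular F, a utility level equals its risk-free
# equivalent rate, so the loss reads directly as a risk-free return difference.
print(F_max, F_hat, loss)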
Various estimators θ̂(y) imply various portfolio choices q̂(θ̂(y)) and, thus, various losses L(θ, θ̂(y)). This leads to a well-defined loss function for the estimator θ̂(y), viewed as a function τ(·) of the data.
FIGURE 1: Portfolio Choice with Estimation Error (expected return vs. standard deviation)
In sampling theory, the risk function for an estimator τ(·) is defined as the expected loss over repeated samples,

(8)  R_τ(θ) = ∫ L(θ, θ̂(y)) f(y | θ) dy.
A decision rule τ₀(·) is said to be inadmissible if there exists another rule τ₁(·) whose risk is no higher for any value of the true unknown parameter θ and strictly lower for some value.
Thus, a reasonable minimum requirement for any estimator is admissibility.
The central thesis of this paper is that the usual sample mean is not admissible for
portfolio estimation.
where the covariance matrix is assumed known. For N greater than 2, Efron and Morris [16], generalizing Stein's [29] result, showed that the maximum-likelihood estimator μ̂_ML(y), which is also the vector of sample means Ȳ, is inadmissible relative to a quadratic loss function

(9)  L(μ, μ̂) = (μ̂ − μ)' Σ⁻¹ (μ̂ − μ).
The use of this loss function is relatively widespread because it leads to tractable results. It is interesting to study because, in the univariate case, the optimal estimator is the sample mean. Also, a quadratic loss is generally a good local approximation of a more general loss function expanded in a Taylor series. For repeated observations, the so-called James-Stein estimator

(10)  μ̂_JS = (1 − w) Ȳ + w Y₀ 1,

where w is defined as

(11)  w = min[ 1, ((N − 2)/T) / ((Ȳ − Y₀1)' Σ⁻¹ (Ȳ − Y₀1)) ],

has uniformly lower risk than the sample mean Ȳ. This estimator is also called a shrinkage estimator, since the sample means are multiplied by a coefficient (1 − w) lower than one. Further, the estimator can be shrunk toward any point Y₀ and still have lower risk than the sample mean, but the gains are greater when Y₀ is close to the true value. For negative values of (1 − w), setting the coefficient equal to zero leads to an improved estimator: this is the positive-part rule.¹ Note that this estimator is biased and nonlinear, since the shrinkage factor is itself a function of the data.
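A minimal Python sketch of (10)-(11), with Σ treated as known and Y₀ an arbitrary shrinkage target. The cap at one in (11) already implements the positive-part rule, since the coefficient (1 − w) then never turns negative; all names and data below are illustrative.

import numpy as np

def james_stein_means(y, Sigma, Y0=0.0):
    # Shrink the sample means toward the common value Y0, eqs. (10)-(11).
    T, N = y.shape
    ybar = y.mean(axis=0)
    d = ybar - Y0 * np.ones(N)
    quad = d @ np.linalg.solve(Sigma, d)    # (Ybar - Y0 1)' Sigma^{-1} (Ybar - Y0 1)
    w = min(1.0, ((N - 2) / T) / quad)      # eq. (11); the cap at 1 is the positive-part rule
    return (1.0 - w) * ybar + w * Y0 * np.ones(N), w

# toy example with N = 4 assets and T = 30 observations
rng = np.random.default_rng(1)
Sigma = 0.05 * np.eye(4)
y = rng.multivariate_normal([0.06, 0.10, 0.14, 0.22], Sigma, size=30)
mu_js, w = james_stein_means(y, Sigma, Y0=y.mean())
print(w, mu_js)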
The superiority of the James-Stein estimator is a startling result. Indeed, statisticians have been slow to recognize this powerful new statistical technique, in spite of Lindley's [23] early description of it as "one of the most important statistical ideas of the decade."²
Basically, the result stems from the summation of the components of the loss function: Stein's estimators achieve uniformly lower risk than the maximum-likelihood estimator while allowing increased risk for some individual components of the vector μ. As a result, the inadmissibility of the sample mean has been extended by Brown ([8], [9], and [10]) to other loss functions under surprisingly weak conditions.
Unfortunately, proof of inadmissibility does not necessarily lead to the construction of better estimators, and the computation of the risk function is an arduous task. Berger [3], however, developed an approach that leads to improved estimators for polynomial loss functions. For a loss function that is the square of the usual quadratic loss, he finds that a shrinkage factor of the form

(12)  w = c / [ d + (Ȳ − Y₀1)' T Σ⁻¹ (Ȳ − Y₀1) ],

where c and d are constants, leads to an improved estimator.
The surprising results found by Stein have been given an interesting Bayesian interpretation. Consider the following informative conjugate prior for μ,

(13)  p(μ | η, λ) ∝ exp[ −(λ/2) (μ − η1)' Σ⁻¹ (μ − η1) ],

which could also be derived from a random means model. In purely Bayesian inference, such as in [34], the prior grand mean η and prior precision λ are assumed known a priori.
Instead, an empirical Bayes approach would let the parameters η and λ be derived directly from the data. Therefore, this approach will outperform the classical sample mean because it relies on a richer model and includes the sample mean as the special case λ = 0.³ The inadmissibility result found by Stein can be explained in a Bayesian framework by the fact that the sample mean corresponds to a diffuse prior, which is improper since it does not integrate to one. In that case, the Bayes rule need not be admissible.⁴
As discussed in Section II, optimal portfolio choice should be based on the predictive density function of the vector of future rates of return r. With the informative prior (13), this predictive density function p(r | y, Σ, λ), conditional on Σ and λ, is multivariate normal, with mean

(14)  E[r] = (1 − w) Ȳ + w Y₀ 1,

and covariance matrix

(15)  V[r] = Σ (1 + 1/(T + λ)) + [λ / (T (T + 1 + λ))] (1 1') / (1' Σ⁻¹ 1).

For very large values of T, the correction due to estimation risk disappears: the moments E[r] and V[r] tend to the usual values Ȳ and Σ, used in the certainty equivalence approach. But the richness of the empirical Bayes approach is that λ can be estimated directly from the data.

³ Efron and Morris [14] and Morris [28] describe this rationale in further detail.
⁴ See, for instance, [4].
⁵ Although not directly derived from a portfolio optimization process, the weights x minimize the variance of the portfolio subject to the condition that they sum to one.

The probability density function p(λ | μ, η, Σ) is a gamma distribution with mean (N + 2)/d, where d is defined as (μ − η1)' Σ⁻¹ (μ − η1) and is replaced by its sample estimate (Ȳ − Y₀1)' Σ⁻¹ (Ȳ − Y₀1). The shrinkage coefficient is then
(17)  ŵ = (N + 2) / [ (N + 2) + T (Ȳ − Y₀1)' Σ⁻¹ (Ȳ − Y₀1) ],

with the covariance matrix estimated as

(18)  Σ̂ = [(T − 1) / (T − N − 2)] S,

where S is the usual unbiased sample covariance matrix. Substitution into (15) yields V[r].
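Putting (14)-(18) together, the following sketch computes the Bayes-Stein predictive moments entirely from a sample y of T observations on N assets. It follows the equations as reconstructed above; function and variable names are illustrative.

import numpy as np

def bayes_stein_moments(y):
    # Predictive mean (14) and covariance (15), with w from (17), Sigma from (18),
    # and the grand mean Y0 set to the minimum-variance portfolio's expected return.
    T, N = y.shape
    ybar = y.mean(axis=0)
    S = np.cov(y, rowvar=False, ddof=1)              # unbiased sample covariance
    Sigma = (T - 1) / (T - N - 2) * S                # eq. (18)
    ones = np.ones(N)
    Sig_inv_ones = np.linalg.solve(Sigma, ones)
    x = Sig_inv_ones / (ones @ Sig_inv_ones)         # min-variance weights (footnote 5)
    Y0 = x @ ybar                                    # shrinkage target: grand mean
    d = ybar - Y0 * ones
    lam = (N + 2) / (d @ np.linalg.solve(Sigma, d))  # empirical-Bayes estimate of lambda
    w = lam / (T + lam)                              # equivalent to eq. (17)
    mean = (1.0 - w) * ybar + w * Y0 * ones          # eq. (14)
    cov = (Sigma * (1.0 + 1.0 / (T + lam))
           + lam / (T * (T + 1.0 + lam)) * np.outer(ones, ones) / (ones @ Sig_inv_ones))  # eq. (15)
    return mean, cov, w

rng = np.random.default_rng(2)
y = rng.multivariate_normal([0.06, 0.10, 0.14], 0.05 * np.eye(3), size=60)
mean, cov, w = bayes_stein_moments(y)
print(w, mean)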
V. Practical Applications
Four estimators of the moments (E[r], V[r]) are compared:

1) Certainty Equivalence: Ȳ, S;
2) Bayes Diffuse Prior: Ȳ, V[r, λ = 0];
3) Minimum Variance (w = 1): Y₀1, V[r, λ → ∞];
4) Bayes-Stein (w = ŵ(y, T)): (1 − ŵ)Ȳ + ŵ Y₀1, V[r, λ̂(y)].
Brown [11] has found the second estimator to be generally superior to the first one. The third estimator, advocated by Jobson et al. [19], is an extreme case of shrinkage, and has no formal justification in this context because the system is forced to be stationary. This choice, however, yields a particularly simple portfolio selection rule: for all utility functions, the optimal weights will be those of the minimum variance portfolio.
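For reference, the weights behind estimator 3 (and behind the grand mean Y₀ used by the Bayes-Stein estimator, footnote 5) depend only on the covariance matrix; a minimal sketch with an illustrative covariance matrix:

import numpy as np

def min_variance_weights(Sigma):
    # weights minimizing q' Sigma q subject to the weights summing to one
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)      # proportional to Sigma^{-1} 1
    return w / (ones @ w)                 # normalize by 1' Sigma^{-1} 1

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
print(min_variance_weights(Sigma))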
⁶ Returns were generated by the IMSL subroutine GGNSM. All computations were performed in double-precision FORTRAN.
Risk̂ = F_MAX − (1/K) ∑_{k=1}^{K} F(q̂(θ̂(y_k)) | θ),

where K is the number of simulated samples.
Expected utility was also expressed in risk-free equivalent return, as in (7). Finally, to study the effect of sample size, the previous operations were repeated for various values of T ranging from 25 to 200.
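The simulation design just described can be sketched as follows. The quadratic derived utility and the risk-aversion value are illustrative stand-ins for the paper's choices, and each estimator is represented as a function mapping a sample to a pair (mean vector, covariance matrix).

import numpy as np

def _opt_q(mu, Sigma, gamma):
    # maximize q'mu - (gamma/2) q'Sigma q subject to 1'q = 1 (KKT linear system)
    N = len(mu)
    A = np.block([[gamma * Sigma, np.ones((N, 1))],
                  [np.ones((1, N)), np.zeros((1, 1))]])
    return np.linalg.solve(A, np.append(mu, 1.0))[:N]

def _F(q, mu, Sigma, gamma):
    # derived utility evaluated at the true parameters
    return q @ mu - 0.5 * gamma * (q @ Sigma @ q)

def empirical_risk(mu, Sigma, T, estimators, gamma=3.0, n_rep=500, seed=0):
    # average utility loss F_MAX - F(q_hat | true parameters) over repeated samples
    rng = np.random.default_rng(seed)
    q_star = _opt_q(mu, Sigma, gamma)
    f_max = _F(q_star, mu, Sigma, gamma)
    losses = dict.fromkeys(estimators, 0.0)
    for _ in range(n_rep):
        y = rng.multivariate_normal(mu, Sigma, size=T)
        for name, estimate in estimators.items():
            mu_hat, Sigma_hat = estimate(y)
            q_hat = _opt_q(mu_hat, Sigma_hat, gamma)
            losses[name] += f_max - _F(q_hat, mu, Sigma, gamma)
    return {name: total / n_rep for name, total in losses.items()}

# example: certainty equivalence vs. minimum variance; for the latter, any
# constant mean vector yields the minimum-variance weights under the constraint
mu = np.array([0.06, 0.10, 0.14])
Sigma = 0.05 * np.eye(3) + 0.01 * np.ones((3, 3))
estimators = {
    "certainty equivalence": lambda y: (y.mean(axis=0), np.cov(y, rowvar=False, ddof=1)),
    "minimum variance":      lambda y: (np.zeros(3), np.cov(y, rowvar=False, ddof=1)),
}
print(empirical_risk(mu, Sigma, T=25, estimators=estimators))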
Figure 2 depicts the empirical risk functions, also reported in Table 2. Several features are apparent. First, the Bayes-Stein estimator always has lower risk than both the certainty equivalence and the Bayes diffuse prior estimators. The improvement is noticeable and significant. In risk-free equivalent, the gain over the diffuse prior case ranges from 8 percent (T = 25) to 2 percent (T = 50) to 0.2 percent (T = 200) per annum. The reason behind the superiority of the Bayes-Stein estimator is that the shrinkage factor w is directly derived from the data. For small sample sizes, large values of w indicate that portfolio analysis should not rely too much on the observed dispersion in sample means, given the large coefficients of variation of stock returns. But, of course, as the sample size
increases, so does the estimated precision of sample means, and the shrinkage factor decreases, thus putting less weight on the informative prior relative to the data.

⁷ The absolute risk aversion was chosen to be 1/(52.2% p.a.), as in Brown [11]. In annual terms, this implies that a 1 percent increase in the variance (10 percent in standard deviation) must be accompanied by an increase of about 1 percent in expected return.

FIGURE 2: Empirical Risk Functions

TABLE 2: Empirical Risk Functions and Shrinkage Factors
Next, the minimum-variance estimator performs well for small sample sizes, but is dominated for higher sample sizes. This is not astonishing: this strategy completely disregards any information contained in the sample means, which produces very good results for small samples, but is otherwise clearly inappropriate. For higher sample sizes, expected returns are more accurately estimated, and utility could be increased by taking into account the expected return of the portfolio. But this estimator would be particularly robust to nonstationarity. Finally, comparisons of the certainty equivalence and Bayes diffuse prior rules confirm the conclusions of Brown's study [11]: the Bayes diffuse prior uniformly dominates the classical rule.
Sections III and IV indicated that Bayes-Stein estimation should outperform the sample mean, whatever the true parameter values, and the simulation analysis indicated that the gains were substantial. Surely, these gains must be sensitive to the choice of the "true" parameter values, but it seems that these results provide conservative estimates of the gains from the Bayes-Stein estimator. Consider how different the means are in Table 1: expressed on a per annum basis, they vary from 6 percent to 22 percent. Insofar as this dispersion might be considered unrealistic, the simulation analysis will be biased against Bayes-Stein estimation. To take the case even further, the analysis was repeated with the highest mean changed from 22 percent to 44 percent per annum. The gains from shrinkage estimation were, on average, cut in half, but the Bayes-Stein estimator still uniformly dominated the sample mean.
VI. Conclusions
This paper studied the effect of estimation error on portfolio choice. Since parameter uncertainty implies a loss of investor utility, decision theory should be based on this loss viewed as a function of the estimator and of the true parameter values. A fundamental result of statistical theory is that the sample mean is an inadmissible estimator when the number of parameters is greater than two. (There exists another estimator that always yields lower expected loss in repeated samples.) This result stems from the summation of the effect of estimation error for each component into one single loss measure. Thus, the portfolio context is central to this result. The issue was also analyzed in an empirical Bayes framework, and this paper presented a simple Bayes-Stein estimator that should improve on the classical sample mean. Next, the extent of gains from Bayes-Stein estimation was illustrated by simulation analysis. The classical rule was always outperformed, and the gains were often substantial, in the range of a few percent per annum in risk-free equivalent return.
Numerous other applications of this technique are possible in finance. For instance, extensions to improved estimation of beta coefficients are straightforward. Also, Jorion [21] evaluated the out-of-sample performance of various estimators, based on actual stock return data, and found that shrinkage estimators significantly outperform the classical sample mean.
Appendix
Bayes-Stein Estimation
The problem is to find, as in Zellner and Chetty [33], the predictive distribution of future returns r, conditional on the prior (13), the data y = (y₁, . . ., y_T), the covariance matrix Σ, and the scale factor λ. Given the parameters, returns are multivariate normal,

(A.2)  f(r | μ, Σ) ∝ exp[ −(1/2) (r − μ)' Σ⁻¹ (r − μ) ],

and similarly for each observation y_t.
The density function of μ, given η and λ, is given by the informative prior

(A.3)  p(μ | η, λ) ∝ exp[ −(λ/2) (μ − η1)' Σ⁻¹ (μ − η1) ].
Here, the λ parameter is a measure of the tightness of the prior; for λ tending to zero, the prior tends to a diffuse prior. The parameter η represents the unknown grand mean, which is given a diffuse prior. Instead of an informative prior on a model with constant means μ, (A.3) could also represent a model where the means vary randomly around a common grand mean.
The predictive density function can then be written as a double integral, over μ and η, of a multivariate normal kernel exp(−G/2), where

G = (r − μ)' Σ⁻¹ (r − μ) + ∑_{t=1}^{T} (y_t − μ)' Σ⁻¹ (y_t − μ) + λ (μ − η1)' Σ⁻¹ (μ − η1).
After integration over η and μ, the predictive density can be shown to be normal, with mean vector and covariance matrix as follows:

(A.4)  E[r] = (1 − w) Ȳ + w Y₀ 1,

where
  w = shrinkage factor: w = λ / (T + λ),
  Ȳ = vector of sample means: Ȳ = (1/T) ∑_{t=1}^{T} y_t,
  Y₀ = grand mean: Y₀ = x'Ȳ,
  x' = weights of the minimum-variance portfolio: x' = 1' Σ⁻¹ / (1' Σ⁻¹ 1);

(A.5)  V[r] = Σ (1 + 1/(T + λ)) + [λ / (T (T + 1 + λ))] (1 1') / (1' Σ⁻¹ 1).
This covariance matrix has the following interpretation. The first term, Σ, represents the variation of y_t around the mean μ. The second term, Σ/(T + λ), is due to the uncertainty in the measure of the sample average Ȳ, whereas the third term corresponds to uncertainty in the common factor.
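As a small numerical check of this decomposition, the three terms can be formed separately and compared with (A.5); the covariance matrix and the values of T and λ below are arbitrary.

import numpy as np

def predictive_cov_terms(Sigma, T, lam):
    # the three pieces of (A.5): return variation, sample-average uncertainty,
    # and uncertainty about the common grand mean
    ones = np.ones(Sigma.shape[0])
    term1 = Sigma
    term2 = Sigma / (T + lam)
    term3 = (lam / (T * (T + 1 + lam))) * np.outer(ones, ones) / (ones @ np.linalg.solve(Sigma, ones))
    return term1, term2, term3

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
t1, t2, t3 = predictive_cov_terms(Sigma, T=60, lam=10.0)
print(t1 + t2 + t3)   # equals V[r] in (A.5); as T grows, this tends to Sigma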
References