1.1 EXAMPLES
1.1.1 SWISS MARKET INDEX
1.1.2 CHF/USD EXCHANGE RATE
1.1.3 THE GOOGLE STOCK
1.2 WHAT IS A TIME SERIES?
1.2.1 THE DEFINITION
1.2.2 STATIONARITY
1.3 SIMPLE RETURNS AND LOG RETURNS
1.4 GOALS IN SAFD
2 BASIC MODELS
4 VOLATILITY MODELS
1 Introduction
This course is about the statistical analysis of financial time series. These can,
among other sources, stem from individual stocks’ prices or stock indices, from
foreign exchange rates or interest rates. All these series are subject to random
variation. While this offers opportunities for profit, it also bears a serious risk of
losing capital.
The aim of this document is to present some basics for dealing with financial time
series. We first introduce a statistical notion of financial time series and point out
some of their characteristic properties that require special attention. Later, we
provide several statistical models for financial data, with a focus on how to fit them
and what their implications for everyday practice are. Finally, we turn our attention to
measuring the risk of serious loss with an investment.
1.1 Examples
We start out by presenting some financial data. There are various sources from
which they can be obtained. While some built-in R datasets will be used
throughout this course, others were acquired from non-commercial websites.
> data(EuStockMarkets)
> EuStockMarkets
Time Series:
Start = c(1991, 130)
End = c(1998, 169)
Frequency = 260
DAX SMI CAC FTSE
1991.496 1628.75 1678.1 1772.8 2443.6
1991.500 1613.63 1688.5 1750.5 2460.2
1991.504 1606.51 1678.6 1718.0 2448.2
1991.508 1621.04 1684.1 1708.1 2470.4
1991.512 1618.16 1686.6 1723.1 2484.7
1991.515 1610.61 1671.6 1714.3 2466.8
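Individual series can be extracted from the multivariate time series and plotted; a minimal sketch (our own code, the object name smi is our choice):
> smi <- EuStockMarkets[, "SMI"]    # extract the Swiss Market Index
> plot(smi, ylab="SMI")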
We observe that the series has a trend, i.e. the mean is obviously non-constant
over time. This is typical for financial time series. As we will see, such trends are
nearly impossible to predict and difficult to characterize mathematically. We will
not dwell much on this, but instead try to understand other important aspects of
financial time series.
> library(Ecdat)                        # provides the Garch dataset
> data(Garch)
> dat <- as.Date(as.character(Garch$date), format="%y%m%d")
> chf.usd <- ts(1/Garch$sf)             # invert to obtain the CHF/USD rate
> plot(dat, chf.usd, type="l", ...)
As can be seen in the resulting plot, the mean of this series is obviously non-constant
over time; hence it is to be considered a non-stationary time series.
Again, the evolution of the trend will be near-impossible to predict. The best we
can do in that situation is to understand the conditional variability of the series, i.e.
the day-to-day changes in the CHF/USD rates.
As is easily visible, an investment in the Google stock at its IPO paid off very well.
An observed time series, on the other hand, is seen as a realization of the random
vector $X = (X_1, X_2, \ldots, X_n)$ and is denoted with small letters $x = (x_1, x_2, \ldots, x_n)$. It is
important to note that in a multivariate sense, a time series is only one single
realization of the $n$-dimensional random variable $X$ with its multivariate, $n$-dimensional
distribution function $F$. As we all know, we cannot do statistics with a
single observation. As a way out of this situation, we need to impose some
conditions on the joint distribution function $F$.
1.2.2 Stationarity
The aforementioned condition on the joint distribution $F$ is the concept of
stationarity. In colloquial language this means that the probabilistic character of the
series must not change over time, i.e. that any section of the time series is "typical"
for every other section of the same length. More mathematically, we require that
for any $s$, $t$ and $k$, the observations $x_t, \ldots, x_{t+k}$ could have just as easily occurred at
times $s, \ldots, s+k$.
$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}.$$
$R_t$ is called the simple return of the asset with price series $P_t$. However, in the
statistical analysis of financial data, we usually consider log returns $r_t$, which are
defined as:
$$r_t = \log\frac{P_t}{P_{t-1}} = \log(P_t) - \log(P_{t-1}) = \log(1 + R_t)$$
Simple and log returns are not one and the same, although for small relative
changes in the price series $P_t$, they do not differ much. For example, if $R_t = 0.00\%$,
then $r_t = 0.00\%$; if $R_t = 1.00\%$, then $r_t = 0.995\%$; and if $R_t = 5.00\%$, then $r_t = 4.88\%$.
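This relation is quickly verified in R (a quick check of our own, not part of the original script):
> log(1 + c(0, 0.01, 0.05))    # log returns for simple returns of 0%, 1%, 5%
[1] 0.000000000 0.009950331 0.048790164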
Limited Liability: for many financial assets, the maximum that one can
lose is the amount that was put into them. I.e., if one holds a stock that is
worth $49, the most one can lose is $49. This is also called limited liability.
In that case, the simple return would be $R_{\min} = -100\%$. On the other hand,
the maximum possible simple return is $R_{\max} = +\infty$. This asymmetry does not
exist with log returns: a complete loss yields $r_{\min} = -\infty$, and the maximum is
$r_{\max} = +\infty$.
$$R_{2,t} = \frac{P_t - P_{t-2}}{P_{t-2}} = (1 + R_{1,t})(1 + R_{1,t-1}) - 1$$
Please note that the first subscript is for the horizon of the return
computation, and the second is for time. With log returns, calculating multi-period
returns is much simpler:
$$r_{k,t} = \log\frac{P_t}{P_{t-k}} = r_{1,t} + r_{1,t-1} + \cdots + r_{1,t-k+1},$$
i.e. log returns are additive, while simple returns are not.
Another very important reason to use log returns is the compatibility with the
Random Walk model. This will be discussed in section 2, but first, we study goals
and purpose of the statistical analysis of financial data.
The previous returns do not contain much (if any) information about tomorrow's
expected market movements. Thus, we expect a time- and history-independent
return of $\mu$. This can be a very small positive value, e.g. due to general economic
growth and/or inflation, but it does not tell us whether it is a good time to invest in
that financial instrument or not.
What else remains to do for statisticians in the analysis of financial data? Well,
future log returns $r_{t+k}$ are random variables. Their expectation does not depend
(much) on previous returns, but we still need to understand their distribution. That
includes location, scale and shape. The purpose of doing so is to specify which
profits and losses can appear, i.e. to assign probabilities to scenarios. This will
make for much of the content of chapters 2 and 3 on Random Walk models.
Finally, it will turn out that while there is hardly any difference between the
expectation of the unconditional distribution of $r_{t+k}$ and its conditional counterpart
$r_{t+k} \mid r_t, r_{t-1}, \ldots$, other aspects of these two distributions will not be identical.
Namely, this is the scale parameter. Financial data exhibit periods of high and low
volatility, i.e. the conditional log return distribution can be less or more dispersed,
meaning that previous log returns predict how wide the actual distribution needs
to be. Thus, we discuss different volatility models in section 4.
The scope of this document solely encompasses univariate financial time series,
i.e. considers individual assets. When analyzing portfolios with several
instruments, several novel aspects will come into play.
2 Basic Models
In this section, we will present some basic models for the analysis of financial
data. Historically, the Random Walk plays a special role, as in 1905, it was
introduced as the first stochastic model for representing stock prices.
In words: the $k$-period return at time $t$ is the sum of all single-period returns
back to time $t-k+1$. If one (conveniently) further assumes that the log returns
follow a Gaussian distribution, we have $r_{1,t} \sim \mathcal{N}(\mu, \sigma^2)$. Because sums of
independent normal random variables are themselves normal, it is obvious that
$$r_{k,t} \sim \mathcal{N}(k\mu, k\sigma^2)$$
It is now relatively simple to rearrange terms and formulate the random walk model
for the price process. It is:
$$\frac{P_t}{P_{t-k}} = \exp(r_{1,t} + \cdots + r_{1,t-k+1}).$$
> set.seed(23)
> lret <- rnorm(1000, 0, 0.03)          # simulate Gaussian log returns
> price <- 100*exp(cumsum(lret))        # reconstruct the price path
> plot(price, type="l", xlab="Time", main="...")
As an exercise, using many realizations, one can show that the $k$-period return is
indeed Gaussian with (here) expectation zero and variance $1000 \cdot 0.03^2$, or that the
price at time $t = 1000$ follows a lognormal distribution.
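A minimal sketch of this exercise (our own illustration, with hypothetical object names):
> set.seed(7)
> r1000 <- replicate(10000, sum(rnorm(1000, 0, 0.03)))   # 10000 1000-period returns
> var(r1000)                      # should be close to 1000*0.03^2 = 0.9
> qqnorm(r1000); qqline(r1000)    # check Gaussianity of the k-period return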
Please note that the log return series has one record less than the original price
series. In particular, $r_1$ is not available, since there is no $P_0$ to compare to. Often, it
proves beneficial to enhance the time series plot with a horizontal line indicating
zero. Moreover, because log returns are by construction symmetrical, it makes
sense to force the $y$-range to be symmetrical around zero. The code for the SMI
log returns is:
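A plausible reconstruction (the object name smi is an assumption, taken from our earlier sketch):
> lr.smi <- diff(log(smi))               # SMI log returns
> plot(lr.smi, ylim=c(-max(abs(lr.smi)), max(abs(lr.smi))), main="SMI Log-Returns")
> abline(h=0, col="grey")                # horizontal zero line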
[Figure: time series plot of the SMI log returns with a horizontal zero line; the y-axis is symmetric around zero]
The biggest loss occurred on August 19, 1991, the date of the Soviet
August Coup, when a group of communist hardliners tried to take control of the
government from the reform-friendly Mikhail Gorbachev. Next, we display the log
returns of the CHF/USD exchange rate.
In absolute value, the Google returns show more extreme behavior than the index
or the exchange rate. That does not come as a surprise due to the nature of
Google, a single (though big) company operating in the rather volatile ICT
business. Despite some differences, the three plots show some common features
that are very typical for financial data. While perhaps not extremely obvious to a
non-expert, a skilled eye clearly detects:
- clusters of volatility, i.e. periods where the magnitude of the log returns is persistently larger or smaller
- some extreme spikes, i.e. outliers that correspond to very big/small returns
We try to better visualize these points by some dedicated plots. First, the
autocorrelation function (ACF) of the log returns addresses the issue of
uncorrelatedness. Second, the dependency in the conditional variance of the
process can be captured by showing the ACF of the squared log returns. In
particular, whenever volatility clusters do exist, the squared log returns will show
autocorrelation. Finally, histograms and normal quantile-quantile plots serve for
verifying the (Gaussian) distributional assumption.
> acf(lr.google)                 # ACF of the log returns
> acf(lr.google^2)               # ACF of the squared log returns
> qqnorm(lr.google)              # normal quantile-quantile plot
> qqline(lr.google)
> hist(lr.google, freq=FALSE)    # histogram on the density scale
[Figure: ACF of the Google log returns and of the squared log returns (top row); normal QQ-plot and histogram with density estimate (bottom row)]
We observe that there is hardly any autocorrelation in the log returns. But since
the squared instances show clearly significant ACF estimates, the log returns are
not independent, which is principally due to volatility clustering. Furthermore, the
normal plot clearly shows that the assumption of a Gaussian distribution is off the
mark. The log returns are prominently long-tailed – a property which needs to be
taken into account for proper modeling of financial time series. Hence, the
Gaussian Random Walk cannot be considered a good model for the type of
financial data that we consider.
Another important issue is stationarity: for the prices, it is clearly rejected, but what
about the log returns? In this regard, the long-tailed distribution is not an obstacle;
the series' mean seems constant, but what about the variance? We will see later
that the most fruitful approach is to regard log returns as stationary and to employ
GARCH-type models that allow for dependence in the conditional variance of the
series.
3 Distributions for Financial Data
3.1.1 Skewness
The third central moment tells us how symmetrically a distribution gathers around its
mean. Rather than working with the third central moment directly, it is, by
convention, standardized. The definition of skewness is as follows:
$$\text{Skew} = \frac{E[(X - \mu)^3]}{\sigma^3}.$$
Any random variable with a symmetric distribution has $\text{Skew} = 0$. Values
greater than zero indicate positive skewness, i.e. distributions that have a heavy
tail on the right hand side. Conversely, $\text{Skew} < 0$ indicates a left-skewed
distribution. The skewness is estimated from data by:
$$\widehat{\text{Skew}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\hat{\sigma}}\right)^3$$
In R, several extension packages (e.g. e1071, timeDate, TSA) hold functions for
estimating the skewness. We here rely on the one from timeDate and use (the
default) method="moment" for correspondence to the formula given above:
> library(timeDate)      # provides skewness() and kurtosis()
> skewness(lr.google)
[1] 0.4340404
attr(,"method")
[1] "moment"
> skewness(lr.smi)
[1] -0.6316853
attr(,"method")
[1] "moment"
> skewness(lr.fex)
[1] -0.3421189
attr(,"method")
[1] "moment"
The results confirm what is visible in the plots from section 2.2: the Google log
returns are right-skewed, those of the SMI and the CHF/USD exchange rate are
left-skewed – though all of them only moderately so. However, it is important to
keep in mind that the skewness estimator is (and needs to be) very sensitive to
outliers. That is fine as long as the outliers are not "bad" (i.e. wrong) data.
3.1.2 Kurtosis
The kurtosis is the standardized fourth central moment. Similar to the variance, it
measures how spread out a distribution is, but it puts more weight on the tails. The
exact definition is:
$$\text{Kurt} = \frac{E[(X - \mu)^4]}{\sigma^4}$$
It is important to note that the kurtosis is not very meaningful for skewed
distributions, because it then measures both asymmetry and tail weight. Hence, it is
an indicator that is aimed at symmetric distributions. Its minimal value is 1, and it is
achieved for any random variable that takes only two distinct values, each with
probability $1/2$. The normal distribution has $\text{Kurt} = 3$; that value is independent of
the location and scale parameters $\mu$ and $\sigma^2$. Due to the popularity of the
Gaussian, it is common to compute the excess kurtosis, which is simply:
$$\text{Kurt}_{Ex} = \text{Kurt} - 3.$$
Distributions with heavier tails than the Gaussian, and thus $\text{Kurt}_{Ex} > 0$, are called
leptokurtic. An important example falling into this class are the $t$-distributions. Their
kurtosis depends on the shape parameter, the degrees of freedom $\nu$, i.e.:
$$\text{Kurt}(\nu) = 3 + \frac{6}{\nu - 4} \quad \text{for } \nu > 4.$$
This also shows that the maximum value the kurtosis can take is $+\infty$. In
financial analysis, an asset with leptokurtic log returns needs to be taken seriously.
It means that big losses (as well as big gains) can occur, and one should be
prepared for them. The kurtosis is estimated by:
$$\widehat{\text{Kurt}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\hat{\sigma}}\right)^4$$
> kurtosis(lr.google)
[1] 7.518994
attr(,"method")
[1] "excess"
> kurtosis(lr.smi)
[1] 5.72665
attr(,"method")
[1] "excess"
> kurtosis(lr.fex)
[1] 1.683527
attr(,"method")
[1] "excess"
Again, the estimate is (and needs to be) very sensitive to outliers. That is not
problematic as long as they are correct, but false values can have a big impact.
We observe that Google has the most heavy-tailed log returns, while the changes
in the CHF/USD rate show relatively mild behavior, with not much heavier tails
than the Gaussian.
So far, normality was assessed visually, by inspecting time series plots or (much
more powerfully) the normal plot. Also, the above-introduced measures skewness
and kurtosis can help. In some cases, it may nevertheless be desirable to formally
test the hypothesis that the data stem from a Gaussian distribution. There is a
battery of tests available for this task. We cannot present all of them here, but
focus on the Jarque-Bera test, which is based on the skewness and kurtosis
estimates.
$$JB = \frac{n}{24}\left(4\,\widehat{\text{Skew}}^2 + \widehat{\text{Kurt}}_{Ex}^2\right) \sim \chi_2^2$$
> library(tseries)       # provides jarque.bera.test()
> jarque.bera.test(lr.google)
data: lr.google
X-squared = 5040.39, df = 2, p-value < 2.2e-16
> jarque.bera.test(lr.smi)
data: lr.smi
X-squared = 2672.383, df = 2, p-value < 2.2e-16
> jarque.bera.test(lr.fex)
data: lr.fex
X-squared = 258.1409, df = 2, p-value < 2.2e-16
3.3.1 t-Distributions
To construct $t$-distributed random variables, we need a $Z \sim \mathcal{N}(0,1)$ and a $W \sim \chi_\nu^2$.
Then, if we take the standardized quotient of the two, the result follows a $t$-distribution
with $\nu$ degrees of freedom:
$$T = \frac{Z}{\sqrt{W/\nu}} \sim t_\nu.$$
The degrees of freedom $\nu$ are a shape parameter: the lower they are, the heavier
the tails. While in classical statistics $\nu$ is a positive integer, it can take any
positive value in financial data analysis. The density function of the $t$-distribution
is defined as:
$$f_t(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \cdot \frac{1}{(1 + x^2/\nu)^{(\nu+1)/2}}$$
The first term is just a normalizing constant, though quite a complicated one. The
symbol $\Gamma(\cdot)$ stands for the Gamma function, which is defined as:
$$\Gamma(t) = \int_0^\infty x^{t-1}e^{-x}\,dx \quad \text{for } t > 0.$$
From a naïve point of view, i.e. just visually, the difference to the Gaussian bell
curve does not seem that big. However, that is deceptive: the variance only exists
if $\nu > 2$; in that case, it equals $\nu/(\nu-2)$. The mean only exists if $\nu > 1$, and then
takes the value 0. The higher moments require more degrees of freedom for
existence, i.e. for the skewness we need $\nu > 3$ and for the kurtosis $\nu > 4$.
[Figure: densities of the standard normal distribution and of t-distributions with 1, 2 and 4 degrees of freedom]
The plot shows that the higher $\nu$, the closer the $t$-distribution is to the Gaussian.
In fact, we have convergence $t_\nu \to \mathcal{N}(0,1)$ for $\nu \to \infty$, which is also apparent from
the probability density function.
While it seems as if the $t$-distributions could be very useful for financial analysis
because we can adapt to the tail behavior of the data, the pure form is not a
very flexible model. The reason is the absence of a location and/or scale
parameter. It is thus attractive and popular to enhance the definition: for $T \sim t_\nu$,
the shifted and scaled variable $X = \mu + \sigma T$ is said to follow a $t_\nu(\mu, \sigma^2)$-distribution,
the notation that will be used throughout this document.
What are the consequences for the moments of such a mixture distribution? Due
to symmetry, the mean is still zero. For the variance of such a random variable M ,
we have a linear combination:
[Figure: density of the Gaussian vs. the two-component normal mixture distribution]
Strikingly, the mixture also has more mass in the center. But since both
distributions have equal variance, there must be a much higher chance of outliers,
i.e. more mass in the extreme tails, too. Indeed, we can quantify the ratio of
extreme events by comparing the probabilities of observations that are further than
three standard deviations away from the mean:
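The computation itself is not shown in this excerpt; a sketch under the assumption that the mixture of section 3.3.2 is 90% N(0,1) plus 10% N(0,25) (which indeed has kurtosis 16.45, matching the value quoted below):
> sd.m <- sqrt(0.9*1 + 0.1*25)             # sd of the mixture, sqrt(3.4)
> p.mix <- 2*(0.9*pnorm(-3*sd.m, 0, 1) + 0.1*pnorm(-3*sd.m, 0, 5))
> p.gauss <- 2*pnorm(-3)                   # Gaussian with matching variance
> p.mix/p.gauss                            # ratio of roughly 10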
We learn that the mixture distribution produces about 10 times more extreme events.
This also translates into the kurtosis of $M$, which takes the value 16.45. Thus, there
is a lot of mass in the tails. However, the downside of this relatively simple mixture
model is that the large values do not come in sequence, but independently. That is
not fully appropriate for real-world financial time series, where extreme returns
seem to cluster. We can expect more realistic behavior from the GARCH models
that will be discussed later.
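The fit referenced below is presumably produced by maximum likelihood; a sketch under that assumption, using fitdistr() from the MASS package:
> library(MASS)
> fitdistr(lr.google, "t")    # ML estimates of location, scale and df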
The output shows $\hat{\mu}$, $\hat{\sigma}$ and $\hat{\nu}$. The numbers in parentheses below them are standard
errors obtained from the maximum likelihood approach. Mathematical
statistics tells us that the estimates are asymptotically normal, and thus we can
construct approximate 95% confidence intervals for the parameters by taking the
estimate plus/minus twice the standard error. For evaluating how well the model
fits the data, we can draw a quantile-quantile plot versus the estimated distribution.
This is more laborious than the normal plot, for which the R code already exists.
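A possible way to produce such a plot (our own sketch, with the parameter values estimated above):
> n <- length(lr.google)
> th.q <- 0.0009455952 + 0.0133499234*qt(ppoints(n), df=2.9431358498)
> plot(th.q, sort(lr.google), xlab="Theoretical Quantiles", ylab="Sample Quantiles")
> abline(0, 1)                             # line of perfect fit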
We observe that the fit is really good, especially in the upper tail. On the left hand
side, the smallest true negative returns are not quite as extreme as the model
suggests. But still, we judge the model as adequate.
Example:
> 0.0009455952+0.0133499234*qt(0.05,2.9431358498)
[1] -0.03072064
Thus, until tomorrow, with a probability of 95%, we will not lose more than 3.07% of
our money if we invest in the Google stock today. What is the respective value for
a horizon of the next 20 trading days? There is no closed-form solution for that,
and we have to resort to computations in R:
> set.seed(23)
> res <- numeric(100000)                  # preallocate results vector
> for (i in 1:100000)
+ {
+   lretr <- rt(20, 2.9431358498)         # 20 standard t variates
+   lret <- 0.0009455952+0.0133499234*lretr   # shift and scale
+   res[i] <- sum(lret)                   # 20-day log return
+ }
What we did is 100'000 simulation runs where, for each of the next 20 trading days, a log
return was drawn independently from the respective $t$-distribution. The continuous
compounding property can then be used to determine the net return over the next
20 trading days. Finally, we just compute the empirical 5%-quantile of these
100'000 results:
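A one-liner does the job:
> quantile(res, 0.05)    # yields approximately -0.1454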
It turns out that with a probability of 95%, we will not lose more than 14.54% of our
money if we invest in the Google stock today. With respect to the distribution that
was used, this result is trustworthy. However, it also includes the assumption of
independence, which is clearly violated. There are apparent periods of high and
low volatility, and we could certainly produce a more exact result if we managed to
incorporate these into our model. The volatility models will do so.
4 Volatility Models
Throughout this document, we have collected quite a bit of empirical evidence that
financial log returns feature volatility clustering: there are periods where the prices
change more substantially, and others where the market is quieter and the
movements are relatively small. Hence, the magnitude of financial log returns is
usually serially correlated. The Random Walk model cannot accommodate
time-varying volatility, neither in its Gaussian formulation nor with heavy-tailed
increments. Hence, there is a need for novel approaches. Here, we will present
the GARCH class of models.
$$X_t = \mu_t + E_t.$$
Hereby, $\mu_t$ is the conditional mean of the series, i.e. $\mu_t = E[X_t \mid X_{t-1}, X_{t-2}, \ldots]$, and $E_t$
is a disturbance term. In traditional time series analysis, e.g. $AR(p)$-modeling, the
disturbance term is usually assumed to be a White Noise innovation, and the
conditional mean is expressed as a function of past observations:
$$X_t = \mu_t + \sigma_t W_t$$
Here, $\mu_t$ is still the conditional mean, $\sigma_t$ is the conditional standard deviation and
$W_t$ is a White Noise innovation. If we assume that $\sigma_t$ is a function of previous
instances, we obtain a process that also has short-term memory in the variance
and hence can be used for volatility modeling. Because $\sigma_t$ is a (conditional)
standard deviation, we need to make sure that the function of previous instances
is non-negative. That is cumbersome to achieve with linear combinations, because
coefficient restrictions are always awkward. Instead, it is more popular to work
with non-linear variance function models. Examples include the ARCH/GARCH
class discussed below.
$$E_t = \sigma_t W_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1 E_{t-1}^2$$
$W_t$ is a White Noise innovation process with mean zero and unit variance. The
two parameters $\alpha_0, \alpha_1$ are the model coefficients. By construction, the conditional
variance of an $ARCH(1)$ process behaves just like an $AR(1)$ model. Later, we will
exploit the ACF of squared log returns for deciding the order of ARCH models.
The marginal variance of the process is
$$Var(E_t) = \frac{\alpha_0}{1 - \alpha_1}.$$
Also, the conditional mean and all autocorrelations of $ARCH(1)$ models are zero:
Thus, an $ARCH(1)$ model is completely useless for predicting whether a stock's value will
increase or decrease on the following day. It just allows for a more precise
understanding of the width (i.e. the dispersion parameter) of the return distribution,
and thus for locally adaptive risk management, i.e. better control of the exposure
to big losses. If such predictions are accurate, institutional investors are able to
profit from the size of the expected market movements.
$$E_t = \sigma_t W_t,$$
again for all $t = 1, \ldots, 1100$. We treat the first 100 observations as a burn-in period
of this simulation, owing to the arbitrary (but realistic) choice of the initial value
$E_0 = 0$. Thus, we discard them, so that we are left with a series of 1000
instances. The goal was to generate a result that matches the properties of
true log returns. The simulation code is:
> set.seed(4659346)
> wn <- rnorm(1101, 0, 1)                 # Gaussian White Noise innovations
> et <- st <- c()
> et[1] <- 0                              # arbitrary starting value
> for (i in 2:1101)
+ {
+   st[i] <- sqrt(0.0001+0.9*et[i-1]^2)   # conditional standard deviation
+   et[i] <- st[i]*wn[i]                  # ARCH(1) log return
+ }
The results for the $ARCH(1)$ simulated log returns $E_t$ and the conditional standard
deviation $\sigma_t$ are displayed in two time series plots, see below. We observe
that the simulated log returns reflect some of the typical features of real-world
financial log returns: there are some outliers, and there is volatility, though it does
not seem to persist very long after a burst, i.e. the market returns to its normal
state in short time. But still, the $ARCH(1)$ process seems like a pretty good
generator for log returns. It is also apparent that the conditional standard deviation
of the process shows some huge fluctuations. From the setup, it is clear that the
minimal value of $\sigma_t$ is 0.01, but the most extreme value is close to 0.15. On
average, as an estimate of the marginal standard deviation, we have:
> mean(st[102:1101])
[1] 0.01742821
This is a value that matches true data well, e.g. 0.0216 for Google.
[Figure: simulated ARCH(1) log returns (top panel) and conditional standard deviation (bottom panel) over 1000 time units]
We will now put some more emphasis on the marginal distribution of the $ARCH(1)$
process $E_t$. From the time series plot, we can guess that it is heavy-tailed, but a
normal plot is more instructive:
[Figure: normal QQ-plot of the simulated ARCH(1) log returns]
This confirms that despite the use of Gaussian White Noise for the innovation
term, the resulting $ARCH(1)$ process is heavy-tailed. After some deeper mathematical
study, this does not come as a surprise: the marginal distribution of $E_t$ is a mixture
of Gaussian distributions. It is not just two components with fixed coefficients that
play a role here; rather, the mixture is continuous. Nevertheless, what happens is the
very same as in the illustrative example from section 3.3.2: we obtain heavy tails.
We can of course use the $ARCH(1)$ simulated log returns to reconstruct the
associated price process. We assume an initial value of $P_0 = 100$ and display the
prices over the following 1000 time units:
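A minimal sketch of this reconstruction (our own code, using the simulated series from above):
> price <- 100*exp(cumsum(et[102:1101]))   # prices from the ARCH(1) log returns
> plot(price, type="l", xlab="Time")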
Certainly, we could not dismiss the price trajectory as totally unrealistic for
true data. It is eye-catching, though, that even after some very big losses or profits,
the price adjustments seem to be fairly small. As we will see later, such behavior
can be better captured by ARCH models of higher order, or with the Generalized
ARCH approach from section 4.3.
However, as a next step, we will verify that the theoretically claimed dependency
structure is indeed present in the simulated data and matches what we observed
in true log returns. Therefore, we show the ACF/PACF for both the plain and the
squared $E_t$:
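A plausible set of commands (our own sketch):
> par(mfrow=c(2,2))
> acf(et[102:1101]); pacf(et[102:1101])        # plain series
> acf(et[102:1101]^2); pacf(et[102:1101]^2)    # squared series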
The result (see the figure below) shows that there is hardly any
correlation in the plain $E_t$. Theory says it is exactly nil; the sample estimates of
course are not, but they mostly fall within the confidence bounds. The situation is
different for $E_t^2$: we have an exponential decay in the ACF and a clear cut-off at
lag 1 in the PACF, which is a typical $AR(1)$ structure.
[Figure: ACF and PACF of the simulated ARCH(1) series (top row) and of its squares (bottom row)]
4.2.3 ARCH(p)
We could now ask the question "what do we do if squared financial log returns
show an ACF that resembles the one from an $AR(p)$ with $p > 1$?". The answer is
obvious: we fit an $ARCH(p)$, where the conditional variance follows an
autoregressive structure of order $p$. The definition is:
$$E_t = \sigma_t W_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i E_{t-i}^2$$
Again, $W_t$ is a (typically Gaussian) White Noise innovation with zero mean and unit
variance. The process $E_t$ has mean zero and no correlation, but shows volatility
and a heavy-tailed marginal distribution, very much like the $ARCH(1)$ process. The
difference is that $ARCH(p)$ can reflect more complex dependency in the
conditional variance and, by including more terms from the past, also achieves
somewhat higher volatility persistence. However, accurate modeling of real-world
data usually requires quite a large $p$, and thus many parameters need to be
estimated. Often, the $GARCH(p,q)$ models discussed below allow for a more
parsimonious representation. However, we will first shed some light on how to fit
$ARCH(p)$ models to financial data.
[Figure: ACF of Squared SMI Log Returns (left) and PACF of Squared SMI Log Returns (right)]
For the squared SMI log returns, up to some random variation, the ACF shows a
behavior that is compatible with the exponential decay that theory suggests. The
PACF has significant values at lags 1 and 2 but none thereafter, thus a cut-off
seems plausible. As a conclusion, fitting an ARCH (2) to these data is a valid
choice.
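The fit whose summary is shown below can be produced as follows (a sketch; tseries::garch() reports an ARCH(2) as GARCH(0,2), and lr.smi is assumed to hold the SMI log returns):
> library(tseries)
> fit <- garch(lr.smi, order=c(0,2), trace=FALSE)
> summary(fit)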
Model:
GARCH(0,2)
Residuals:
Min 1Q Median 3Q Max
-9.867029 -0.518727 0.006919 0.584446 5.697691
Coefficient(s):
    Estimate  Std. Error  t value Pr(>|t|)
a0 6.326e-05   1.862e-06   33.976  < 2e-16 ***
a1 1.441e-01   2.392e-02    6.021 1.73e-09 ***
a2 1.203e-01   2.200e-02    5.471 4.48e-08 ***
---
Diagnostic Tests:
Box-Ljung test
data: Squared.Residuals
X-squared = 0.183, df = 1, p-value = 0.6688
The summary provides the point estimates for $\alpha_0$, $\alpha_1$ and $\alpha_2$, along with
asymptotic standard errors. All coefficients are significantly different from zero, but
from that alone it is not clear that the model provides a good fit. The diagnostic
tests in the second half of the summary aim at answering that latter question.
Both tests are based on examining the residuals. Remember that the ARCH
residuals are estimates of the White Noise innovation $W_t$. Thus, they should
show neither dependence, nor volatility, nor (here, because garch() works under
the normality assumption) heavy tails.
Volatility in the residuals would be present if the model were not powerful enough to
capture all of what is present in the SMI log returns. The Box-Ljung test evaluates
whether the autocorrelation at lag 1 of the squared residuals is significantly
different from zero. With a p-value of 0.67, this is clearly not the case. However,
the normality of the residuals is strongly rejected: the Jarque-Bera statistic takes a
very large value, and hence the model that we fitted here is not fully appropriate.
We try to illustrate this by showing QQ-plots versus the Gaussian distribution, and
versus a $t_\nu(\mu, \sigma^2)$-distribution with 4.81 degrees of freedom that was fitted to the
$ARCH(2)$ residuals by the maximum-likelihood-based R function fitdistr(), as
discussed in section 3.3.1.
> qqnorm(resid(fit))
> qqline(resid(fit))
> qqt(resid(fit))    # custom helper for the QQ-plot vs. the fitted t-distribution
[Figure: normal QQ-plot of the ARCH(2) residuals (left) and QQ-plot versus the fitted t-distribution (right)]
We observe that the residuals are heavy-tailed. When doing a QQ-plot versus a
$t_\nu(\mu, \sigma^2)$-distribution with, as it turns out, 4.81 degrees of freedom, things look
better than when using the Gaussian. Nevertheless, there is an outlier that is far
beyond the envelope of even the $t$-distribution, and some left-skewness is also
present. The question is whether ARCH models can be adjusted to accommodate
that. The answer is yes, as will be shown in the following chapters.
> par(mfrow=c(1,2))
> acf(lr.google^2, ylim=c(-0.1,1), main="...")
> pacf(lr.google^2, ylim=c(-0.1,1), main="...")
[Figure: ACF (left) and PACF (right) of the squared Google log returns]
There is clear dependence present. However, deciding on the correct model
order is by no means easy here, because a relatively large number of coefficients
differ significantly from zero. Which $ARMA(p,q)$ could have generated that ACF
and PACF? As it is hard to give an educated guess from the plots, common sense
says that we should start with a simple model. A reasonable standard choice is a
$GARCH(1,1)$, for which we repeat the fitting process from section 4.2.4.
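A plausible call (our sketch, analogous to the ARCH(2) fit above):
> fit <- garch(lr.google, order=c(1,1), trace=FALSE)
> summary(fit)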
All coefficients are highly significant, suggesting that they are all needed. The
Box-Ljung test has a p-value of 0.586, indicating that there is no "underfit" of
the $GARCH(1,1)$. On the other hand, the null hypothesis of Gaussian innovations is
strongly rejected by the Jarque-Bera test. We display the ACF and PACF of the
estimated innovation terms, as well as a normal plot of the residuals, plus a QQ-plot
where we evaluate against a $t_\nu(\mu, \sigma^2)$-distribution with 3.88 degrees of freedom.
[Figure: ACF and PACF of the GARCH(1,1) residuals (top row); normal QQ-plot and QQ-plot versus the fitted t-distribution (bottom row)]
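The output below stems from a fit with t-distributed innovations; a plausible call (a sketch, assuming the fGarch package) is:
> library(fGarch)
> gfit <- garchFit(~ garch(1,1), data=lr.google, cond.dist="std", trace=FALSE)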
Coefficient(s):
mu omega alpha1 beta1 shape
1.1503e-03 2.5905e-06 3.7093e-02 9.5797e-01 3.9148e+00
The output is pretty overwhelming! First of all, we note that garchFit() allows
for simultaneous estimation of $\mu$, of which we make use here. The result is given
as mu in the output, and is similar (but not equal) to the arithmetic
mean of the Google log returns. As it turns out, the other coefficient estimates
are also not very different from the first fit with the garch() procedure (that
assumes Gaussian innovations), except for $\alpha_0$, which is termed omega in the output.
The shape parameter of the $t_\nu(\mu, \sigma^2)$-distribution is estimated as 3.92, and is thus in
the region of what we had found previously when we fitted a $t_\nu(\mu, \sigma^2)$-distribution
to the residuals from a $GARCH(1,1)$ that was fitted under the assumption of
Gaussian innovations.
Furthermore, a battery of tests is carried out, both on the residuals and the squared
residuals. The Box-Ljung tests show that there is neither autocorrelation nor
volatility in the residuals. Thus, the $GARCH(1,1)$ model seems reasonable. Note
that the Jarque-Bera test still tests whether the innovations are Gaussian. They are
not, a matter which we already incorporated into the model by specifying
$t$-distributed innovations. There is no test (in the summary output) which evaluates
the adequacy of the chosen distributional model. What we can do is compare the
AIC values under the heavy-tailed innovations versus the Gaussian
innovations. They are -4.98 for the Gaussian and -5.18 for the heavy-tailed model. We
take this as further evidence that the $t_\nu(\mu, \sigma^2)$-innovations yield a better fit.
$$(r_t =)\; X_t = \mu_t + \sigma_t W_t$$
Here, $\mu_t$ is the conditional expectation, which is not constant anymore. That part
of the time dependence in the data is accounted for by an $ARMA(p,q)$:
$$\mu_t = \sum_{i=1}^{p}\phi_i r_{t-i} + \sum_{i=1}^{q}\theta_i E_{t-i}, \quad \text{where } E_t = \sigma_t W_t.$$
Since the data are also assumed to have volatility, $\sigma_t$ is the conditional standard
deviation of the data. It is not constant, and its fluctuations are addressed by a
$GARCH(p,q)$ model, defined as:
$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i E_{t-i}^2 + \sum_{i=1}^{q}\beta_i \sigma_{t-i}^2$$
The typical assumption for financial log returns is (direct) uncorrelatedness and
constant conditional mean. That is appropriate for all our examples. Just for
illustration, we fit a combined ARMA(1,1) / GARCH (1,1) to the Google log returns.
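A plausible call (our sketch, again with fGarch):
> gfit2 <- garchFit(~ arma(1,1) + garch(1,1), data=lr.google, cond.dist="std", trace=FALSE)
> summary(gfit2)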
The most important section of the lengthy summary output shows that neither the
AR-parameter $\phi_1$ nor the MA-parameter $\theta_1$ is significantly different from zero.
That does not come as a surprise, because the tests on the residuals above had
already clearly indicated correlation-free residuals.
Finally, garchFit() also allows fitting APARCH models. They are based on the
notion that large negative returns sometimes seem to increase volatility more than
large positive returns of the same magnitude do. The issue is addressed with
asymmetric power ARCH models, hence the name. We will not discuss these
models more deeply here.
5 Risk Management
One of the principal goals in the statistical analysis of financial data is to
understand the risk that is associated with an investment. We argued above
that asset prices are non-stationary, so we must base our considerations on
their log returns. As it turned out, they hardly show dependence; thus it is
generally considered impossible to predict whether future returns will be
positive or negative. However, there are other important aspects of the return
distribution that need to be understood:
For addressing the first two points, we introduced the Random Walk model, which
operates under the assumption of independent increments that are either Gaussian or
heavy-tailed. In this setup, we can specify the return distribution either analytically
(Gaussian case) or at least using Monte Carlo simulations (heavy-tailed case).
Doing so opens the door to calculating arbitrary loss or profit quantiles.
Unfortunately, true financial data mostly show volatility. Thus, they are not
independent, and the Random Walk model can at best be seen as a rough
approximation to the truth. We addressed the issue by introducing the
$GARCH(p,q)$ technique, which allows for modeling the conditional variance. The
output can be used to derive the return distribution at a specific time $t$, but it will
change over time. Logically, the loss/profit quantiles will also evolve over time. Here,
we take the opportunity to give formal definitions of the risk management
terms that are most widely used in financial analysis. Then, we will also discuss
how they are implemented in the context of our examples.
The $k$-day Value at Risk at confidence level $(1-\alpha)$, denoted $VaR_{k;1-\alpha}$, is the
$\alpha$-quantile of the $k$-day log return distribution. It could for example turn out that
$VaR_{1;0.95} = -0.038$, which means that with 95% probability, the 1-day log return will
not fall below -3.8%. Or in other words, with a chance of only 5%, we will face a
loss that exceeds the $VaR_{1;0.95}$, i.e. -3.8%. As the definition shows, the VaR concept
always requires a confidence level $(1-\alpha)$ and a time horizon $k$. It is typical to set
$(1-\alpha) = 0.95$ or $0.99$, as well as $k = 1$, but longer horizons are also common. The
following examples illustrate the concept.
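The model-free (empirical) 1-day 95%-VaR for Google is just the 5%-quantile of the observed log returns; a sketch:
> quantile(lr.google, 0.05)    # empirical 1-day 95%-VaR, roughly -0.0313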
Thus, on our observed data, we lost more than 3.13% on at most 5% of the
trading days since the IPO in 2004. While this model-free VaR computation can
work quite well with enough data, the disadvantage is that we cannot account for
volatility, and it is impossible to determine the VaR of a portfolio of assets. These
two cases, which are realistic and important in practice, can only be dealt with
when working with at least the Random Walk, or better, with GARCH models.
Because the Google log returns are clearly long-tailed, we introduced a heavy-tailed
Random Walk model in section 3.4. The 1-day 95%-VaR according to
that model turned out to be:
> 0.0009455952+0.0133499234*qt(0.05,2.9431358498)
[1] -0.03072064
Thus, the heavy-tailed model is less conservative than the Gaussian Random
Walk. That may seem surprising, but since the $t$-distribution has more mass in the
tails, it has less in the center. As mentioned above, the VaR for longer horizons
can only be computed using Monte Carlo simulations, because the sum of
independent $t_\nu(\mu, \sigma^2)$ random variables is no longer in the same family. We do not
repeat the code here; the result was -14.54%.
Suppose we have $n$ observations of daily log returns $r_1, \ldots, r_n$ and want to compute
the 95%-VaR for the next day. For Google, we have $n = 2106$. The current
conditional standard deviation $\hat{\sigma}_n$ can be taken from the $n$th fitted value of
the GARCH model. However, we require the one for the next day, which is $\hat{\sigma}_{n+1|n}$.
This can be obtained by a (time series) forecast, which is easy to produce from the
garchFit() output. We just apply the predict() command to the fitted object:
> predict(gfit.nd,1)
meanForecast meanError standardDeviation
1 0 0.01474702 0.01474702
Thus, the predicted log return standard deviation for the next trading day is
0.0147. That is comparatively low relative to the marginal standard deviation of:
> sd(lr.google)
[1] 0.02164966
However, a closer look at the data (not shown here) reveals that at the end of
2012, we are in a period of low volatility. The strong point of GARCH models
is that they can adapt to such behavior. The actual $VaR_{t=2106;1;0.95}$ for the next trading
day is computed as the 5%-quantile of the respective Gaussian distribution:
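A sketch of this computation (the mean value is our assumption, read off the Gaussian garchFit() output; the result matches the one quoted below):
> 0.00112 + 0.01474702*qnorm(0.05)    # yields approximately -0.02314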
We predict, with 95% probability, not to lose more than 2.314% tomorrow. That is
much more optimistic than what we had computed above under independence.
Please note that we here use the estimate from the garchFit() procedure as the
mean of the log return distribution.
The $VaR_{t=2106;k=20;0.95}$ for a 20-day horizon according to the Gaussian $GARCH(1,1)$ is
only -9.39%. This is a much less extreme loss than the -14.07% we had computed from
the Gaussian Random Walk, owing to the fact that we are in a low-volatility period.
As we had seen above, there is little credibility for a $GARCH(1,1)$ with Gaussian
innovations. Instead, the innovations seem long-tailed, and we should use that better model
for risk management. The procedure is not too different; we again create a
forecast of $\hat{\sigma}_{n+1|n}$.
We have to convert that result into a forecast for the conditional scale parameter of
the $t_\nu(\mu, \sigma^2)$-distribution via
$$\hat{\sigma}^{scale}_{n+1|n} = \hat{\sigma}_{n+1|n}\sqrt{(\nu-2)/\nu},$$
since a $t_\nu$ random variable has variance $\nu/(\nu-2)$.
The degrees of freedom can be taken from the $GARCH(1,1)$ output and are 3.915.
Thus, tomorrow's return follows a $t_{3.915}(0.0012, 0.00963^2)$-distribution. The mean was
again taken from the garchFit() procedure. The 1-day 95%-VaR is:
> 0.0012+0.00963*qt(0.05,3.915)
[1] -0.01945842
The numerical result is -1.95%, which is less conservative than the $GARCH(1,1)$
result of -2.31% under the Gaussian distribution. We made a very similar observation
when comparing the Random Walk results. Again, the reason is that the
leptokurtic distribution has heavier tails and thus more (very) extreme events.
Conversely, this also means that there is more mass in the center of the distribution,
which leads to a less negative 5%-quantile.
For computing a multi-period VaR from the $GARCH(1,1)$ model with heavy-tailed
innovations, we again need to run Monte Carlo simulations. The basis is to
determine the (conditional) return distribution, very much like above, for each of
the next 20 trading days. However, since the sum of these distributions is no
longer a $t_\nu(\mu, \sigma^2)$, there is no way around drawing random variates from the
respective distributions and determining the resulting 20-day return.
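A minimal sketch of such a simulation, under the fitted t-GARCH(1,1) from above (the parameter values are from the garchFit() output; the starting values for the last residual and conditional standard deviation are hypothetical placeholders):
> mu <- 1.1503e-03; omega <- 2.5905e-06
> alpha1 <- 3.7093e-02; beta1 <- 9.5797e-01; shape <- 3.9148
> e.last <- 0.005; s2.last <- 0.0138^2       # assumed E_n and sigma_n^2
> set.seed(23)
> res <- numeric(10000)
> for (i in 1:10000)
+ {
+   e <- e.last
+   s2 <- omega + alpha1*e^2 + beta1*s2.last  # sigma_{n+1}^2
+   ret <- 0
+   for (k in 1:20)
+   {
+     w <- rt(1, shape)*sqrt((shape-2)/shape) # standardized t innovation
+     e <- sqrt(s2)*w                         # innovation E_{n+k}
+     ret <- ret + mu + e                     # accumulate the log return
+     s2 <- omega + alpha1*e^2 + beta1*s2     # update conditional variance
+   }
+   res[i] <- ret
+ }
> quantile(res, 0.05)    # Monte Carlo 20-day 95%-VaR, roughly -8%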
The process is repeated many times, and so the Monte Carlo distribution of the
20-day return is obtained. Its 5%-quantile is the VaR that we are looking for. The
result lies in the range of -8.1% and is again less conservative than the result
under the assumption of Gaussian innovations.
The Expected Shortfall at level $(1-\alpha)$ for horizon $k$ is defined as
$ES_{k;1-\alpha} = E[r_k \mid r_k \le VaR_{k;1-\alpha}]$. In words, it is the expected loss given that the
return violates the VaR. Of course, this typically depends on the time $t$, the horizon
$k$ and the level $(1-\alpha)$. Again, the most typical case is the 1-day 95%-ES.
> mean(lr.google[lr.google<quantile(lr.google,0.05)])
[1] -0.04978501
As it turns out, 106 of the 2106 log returns lie below the empirical 95%-VaR. Their
average is -4.98%; that is our empirical estimate of the 1-day 95%-ES.
While this is by no means required, one can also use a pre-existing R function
from library(PerformanceAnalytics):
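A plausible call (assuming the ES() function from PerformanceAnalytics; method="historical" reproduces the empirical computation):
> library(PerformanceAnalytics)
> ES(lr.google, p=0.95, method="historical")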
When the returns are Gaussian with mean $\mu$ and standard deviation $\sigma$, the ES can be computed in closed form:
$$ES = \mu - \sigma \cdot \frac{f(\Phi^{-1}(\alpha))}{\alpha},$$
where $\alpha = 0.05$, $f(\cdot)$ is the density and $\Phi^{-1}(\cdot)$ is the quantile function of the
standard normal distribution. For example, when computing the 95%-ES:
$$\frac{f(\Phi^{-1}(0.05))}{0.05} = 2.063.$$
Thus, the ES is always a bit more than two standard deviations below the mean.
Please also note that the 95%-VaR lies 1.645 standard deviations below the
mean. Thus, when working with the Gaussian distribution, both risk measures are
fixed multiples of the standard deviation below the mean, so it makes no essential
difference which of the two is employed. However, that is not the
case for all distributional models. Of course, the mean $\mu$ and the standard
deviation $\sigma$ need to be determined according to the horizon $k$.
Under the Random Walk model, there is no volatility, and hence the ES is
time-invariant. For the 1-day 95%-ES in the case of Google, we obtain:
> mean(lr.google)-sd(lr.google)*dnorm(qnorm(0.05))/0.05
[1] -0.04372968
Again, there is a dedicated R function that does the job. It is again ES(); only
method="gaussian" needs to be set.
> 20*mean(lr.google)-sqrt(20)*sd(lr.google)*2.063
[1] -0.1811931
The result turns out to be very similar to what we had computed empirically. The
downside of the 20-day empirical computation is that it is based on relatively few
observations, though it is methodologically sound. Here, in contrast, we can rely on
more than 2000 observations for estimating the mean and standard deviation.
However, the computations are based on assuming a Gaussian Random Walk,
and there is plenty of evidence in the data that this model is not 100% accurate.
For the $t_\nu(\mu, \sigma^2)$-distribution, an analogous closed-form expression exists:
$$ES = \mu - \sigma \cdot \frac{f_\nu(F_\nu^{-1}(\alpha))}{\alpha}\cdot\frac{\nu + (F_\nu^{-1}(\alpha))^2}{\nu - 1},$$
where $f_\nu(\cdot)$ and $F_\nu^{-1}(\cdot)$ are the density and quantile function of the $t_\nu$-distribution.
Not surprisingly, the multiplier for the scale parameter depends on the degrees of
freedom: the heavier the tails of the distribution, the further away from the
mean the ES is. The diagram below shows the relation between the multiplier and
the degrees of freedom, assuming $\alpha = 0.05$.
[Figure: ES multiplier as a function of the degrees of freedom, for α = 0.05]
The function converges to 2.063, the multiplier value for the Gaussian. The grey
line corresponds to the multiplier for the 95%-VaR of the respective distribution.
We observe that the ratio between ES and VaR is not constant, but depends on
the shape parameter $\nu$. Let us now implement the ES computation for the case of
Google. We had estimated 2.943 degrees of freedom for the log returns. Hence:
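A sketch of the computation (our own code, with the fitdistr() estimates from section 3.4):
> nu <- 2.9431358498; tq <- qt(0.05, nu)
> 0.0009455952 - 0.0133499234*(dt(tq, nu)/0.05)*((nu + tq^2)/(nu - 1))   # approx. -0.0516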
Under this model, the ES is estimated as -5.16%. This is now more conservative
than for the Gaussian Random Walk, owing to the fact that the heavy tail has more
effect in computing the ES than in computing the VaR. We conclude this section
by noting that for the multi-period ES under heavy tails, Monte Carlo simulations are
again required for determining the multi-period log return distribution.
The empirical result was -4.98%, the Gaussian Random Walk yielded -4.37%, and
here we only have -2.93%. The reason is again the fact that at the end of 2012,
we are in a very low-volatility period. If computing the multi-period ES is the goal, this
works analogously to what we had shown for the VaR in section 5.1.3: a multi-step
prediction of the conditional standard deviation is produced, and these values are
converted into a scale parameter for the multi-period log return.
The result is -3.00%. As usual, when we want multi-period risk measures from the
heavy-tailed approach, we have to employ Monte Carlo simulation. This works exactly the
same as demonstrated in the VaR chapter, except that we do not determine the
5%-quantile of the Monte Carlo distribution, but instead compute its empirical ES.
The result turns out to be -11.13%.
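The empirical ES of the Monte Carlo distribution can be computed just like the model-free estimate above; a sketch (res holding the simulated 20-day returns):
> mean(res[res < quantile(res, 0.05)])    # Monte Carlo 20-day 95%-ES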