FRM Bionic Turtle T2-Quantitative
Stock, Chapter 2:
Review of Probability
In this chapter
Define random variables, and distinguish between continuous and discrete random variables.
Define the probability of an event.
Define, calculate, and interpret the mean, standard deviation, and variance of a random variable.
Define, calculate, and interpret the skewness and kurtosis of a distribution.
Describe joint, marginal, and conditional probability functions.
Explain the difference between statistical independence and statistical dependence.
Calculate the mean and variance of sums of random variables.
Describe the key properties of the normal, standard normal, multivariate normal, chi-squared, Student t, and F distributions.
Define and describe random sampling and what is meant by i.i.d.
Define, calculate, and interpret the mean and variance of the sample average.
Describe, interpret, and apply the Law of Large Numbers and the Central Limit Theorem.
Discrete versus continuous random variables:

Discrete: the probability function (pmf) is $P(X = x_k) = f(x_k)$; e.g., $\Pr(X = 3)$. The CDF gives, e.g., $\Pr(X \le 3)$.
Continuous: the probability function (pdf) satisfies $P(a \le X \le b) = \int_a^b f(x)\,dx$. The CDF gives $\Pr(Z \le c) = \Phi(c)$ and $\Pr(c_1 \le Z \le c_2) = \Phi(c_2) - \Phi(c_1)$.
Examples of a discrete random variable include: a coin toss (heads or tails, nothing in between); a roll of the dice (1, 2, 3, 4, 5, 6); and did the fund beat the benchmark? (yes or no). In risk, common discrete random variables are default/no default (0/1) and loss frequency.
Note the similarity between the summation ($\Sigma$) under the discrete variable and the integral ($\int$) under the continuous variable. The summation ($\Sigma$) of all discrete outcomes must equal one. Similarly, the integral ($\int$) captures the area under the continuous distribution function. The total area under this curve, from $(-\infty)$ to $(+\infty)$, must equal one.
Summary

Discrete: are counted; finite. Examples in finance: default (1,0); frequency of loss. Example distributions: Bernoulli (0/1), Binomial (series of i.i.d. Bernoullis), Poisson, Logarithmic.

Continuous: are measured; infinite. Examples in finance: distance, time; severity of loss; asset returns. Example distributions: normal; sampling distributions (Student's t, chi-square, F distribution); lognormal; exponential; gamma, beta; EVT distributions (GPD, GEV).
$P(A) = \frac{\text{number of outcomes favorable to } A}{\text{total number of possible outcomes}}$
For example, consider a craps roll of two six-sided dice. What is the probability of rolling a
seven; i.e., P[X=7]? There are six outcomes that generate a roll of seven: 1+6, 2+5, 3+4, 4+3, 5+2,
and 6+1. Further, there are 36 total outcomes. Therefore, the probability is 6/36.
In this case, the outcomes need to be mutually exclusive, equally likely, and collectively exhaustive (i.e., all possible outcomes included in total). A key property of a probability is that the sum of the probabilities for all (discrete) outcomes is 1.0.
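For a quick check (an illustration added here, not from the assigned reading), a few lines of Python can enumerate the craps outcomes:

    # Enumerate all 36 outcomes of two six-sided dice and count the sevens.
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))    # 36 equally likely pairs
    sevens = [o for o in outcomes if sum(o) == 7]      # (1,6), (2,5), ..., (6,1)
    print(len(sevens), "/", len(outcomes))             # 6 / 36
    print(len(sevens) / len(outcomes))                 # 0.1666... = 1/6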
Empirical Distribution

Roll     Freq.    %
1        11       11%
2        17       17%
3        18       18%
4        21       21%
5        18       18%
6        15       15%
Total    100      100%
But the empirical frequency of rolling a three, based on this sample, is 18%. If we generate another sample, we will produce a different empirical frequency.
This relates also to sampling variation. The a priori probability is based on population
properties; in this case, the a priori probability of rolling any number is clearly 1/6th.
However, a sample of 100 trials will exhibit sampling variation: the number of threes (3s)
rolled above varies from the parametric probability of 1/6th. We do not expect the
sample to produce 1/6th perfectly for each outcome.
Expected value

The expected value is the probability-weighted average of the possible outcomes. For a discrete random variable:

$E(X) = \sum_{i=1}^{k} y_i p_i = y_1 p_1 + y_2 p_2 + \dots + y_k p_k$

For a continuous random variable:

$E(X) = \int x f(x)\,dx$
Variance
Variance and standard deviation are the second moment measures of dispersion. The variance of
a discrete random variable Y is given by:
$\sigma_Y^2 = \text{variance}(Y) = E\left[(Y - \mu_Y)^2\right] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$
i 1
Variance is also expressed as the difference between the expected value of X^2 and the square
of the expected value of X. This is the more useful variance formula:
$\text{variance}(X) = E(X^2) - [E(X)]^2$

For example, for a single six-sided die:

$E[X^2] = (1^2)\tfrac{1}{6} + (2^2)\tfrac{1}{6} + (3^2)\tfrac{1}{6} + (4^2)\tfrac{1}{6} + (5^2)\tfrac{1}{6} + (6^2)\tfrac{1}{6} = \tfrac{91}{6}$
Then, we need to square the expected value of X, [E(X)]2. The expected value of a single six-sided
die is 3.5 (the average outcome). So, the variance of a single six-sided die is given by:
$\text{Variance}(X) = E(X^2) - [E(X)]^2 = \frac{91}{6} - (3.5)^2 \approx 2.92$
[Table: the same derivation of the variance of a single six-sided die (which has a uniform distribution) in tabular format]
What is the variance of the total of two six-sided dice cast together? It is simply Variance(X) plus Variance(Y), or about 5.83. The reason we can simply add them together is that they are independent random variables.
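As a sketch (an added illustration, not from the source), the same arithmetic in Python, using Var(X) = E[X^2] − [E(X)]^2:

    # Variance of one fair die, then the variance of the total of two independent dice.
    faces = range(1, 7)
    e_x = sum(faces) / 6                       # 3.5
    e_x2 = sum(x * x for x in faces) / 6       # 91/6
    var_one = e_x2 - e_x ** 2                  # ~2.92
    print(var_one, 2 * var_one)                # 2.9167, and ~5.83 for two dice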
Sample variance:

The unbiased estimate of the sample variance is given by:

$s_x^2 = \frac{1}{k-1} \sum_{i=1}^{k} \left(y_i - \bar{Y}\right)^2$
Properties of variance

1. $\sigma^2_{\text{constant}} = 0$
2a. $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$ only if independent
2b. $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$ only if independent
3. $\sigma^2_{X+b} = \sigma^2_X$
4. $\sigma^2_{aX} = a^2 \sigma^2_X$
5. $\sigma^2_{aX+b} = a^2 \sigma^2_X$
6. $\sigma^2_{aX+bY} = a^2\sigma^2_X + b^2\sigma^2_Y$ only if independent
7. $\sigma^2_X = E(X^2) - [E(X)]^2$
Standard deviation:
Standard deviation is given by:

$\sigma_Y = \sqrt{\text{var}(Y)} = \sqrt{E\left[(Y - \mu_Y)^2\right]} = \sqrt{\sum_i (y_i - \mu_Y)^2 p_i}$

The sample standard deviation is:

$s_X = \sqrt{\frac{1}{k-1}\sum_{i=1}^{k}\left(y_i - \bar{Y}\right)^2}$
This is merely the square root of the sample variance. This formula is important because
this is the technically precise way to calculate volatility.
Skewness

$\text{Skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$
For example, the gamma distribution has positive skew (skew > 0):
[Chart: Gamma distribution with positive (right) skew, plotted for (alpha=1, beta=1), (alpha=2, beta=0.5), and (alpha=4, beta=0.25)]
Kurtosis
Kurtosis measures the degree of peakedness of the distribution, and consequently of
heaviness of the tails. A value of three (3) indicates normal peakedness. The normal
distribution has kurtosis of 3, such that excess kurtosis equals (kurtosis − 3).

$\text{Kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4}$
Note that technically skew and kurtosis are not, respectively, equal to the third and fourth
moments; rather they are functions of the third and fourth moments.
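For example (an added sketch, assuming scipy is available), sample skew and kurtosis can be computed as follows; note scipy's fisher=False returns ordinary kurtosis (normal ≈ 3.0) rather than excess kurtosis:

    import numpy as np
    from scipy.stats import skew, kurtosis

    np.random.seed(42)
    x = np.random.standard_normal(100_000)     # a normal sample
    print(skew(x))                             # ~0 for a normal sample
    print(kurtosis(x, fisher=False))           # ~3.0 (i.e., excess kurtosis ~0)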
A normal distribution has relative skewness of zero and kurtosis of three (or the same
idea put another way: excess kurtosis of zero). Relative skewness > 0 indicates positive
skewness (a longer right tail) and relative skewness < 0 indicates negative skewness (a
longer left tail). Kurtosis greater than three (>3), which is the same thing as saying
excess kurtosis > 0, indicates high peaks and fat tails (leptokurtic). Kurtosis less than
three (<3), which is the same thing as saying excess kurtosis < 0, indicates lower peaks.
Kurtosis is a measure of tail weight (heavy, normal, or light-tailed) and peakedness:
kurtosis > 3.0 (or excess kurtosis > 0) implies heavy-tails.
Financial asset returns are typically considered leptokurtic (i.e., heavy or fat-tailed).
For example, the logistic distribution exhibits leptokurtosis (heavy-tails; kurtosis > 3.0):
[Chart: Logistic distribution with heavy tails (excess kurtosis > 0), plotted for (alpha=0, beta=1), (alpha=2, beta=1), and (alpha=0, beta=3), versus N(0,1)]
Univariate density: $f(x) = P(X = x)$; univariate cumulative: $F(x) = P(X \le x)$.
Bivariate density: $f(x, y) = P(X = x, Y = y)$; bivariate cumulative: $F(x, y) = P(X \le x, Y \le y)$.
The example (from Stock & Watson) involves two variables: the age of the computer (A), a Bernoulli such that the computer is old (0) or new (1), and the number of times the computer crashes (M). The marginal probability is obtained by summing the joint probabilities over all values of the other variable:

$\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)$

For example, $\Pr(A = 1) = 0.50$.
Joint distribution of computer age (A) and number of crashes (M):

            M=0     M=1     M=2     M=3     M=4     Total
Old (A=0)   0.35    0.065   0.05    0.025   0.01    0.50
New (A=1)   0.45    0.035   0.01    0.005   0.00    0.50
Total       0.80    0.100   0.06    0.030   0.01    1.00
The joint probability function is written $\Pr(X = x, Y = y)$. For example, the joint probability that the computer is old and does not crash is $\Pr(A = 0, M = 0) = 0.35$.
The conditional probability is the joint probability divided by the marginal (unconditional) probability:

$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$

For example, the probability of exactly one crash, conditional on the computer being old, is $\Pr(M = 1 \mid A = 0) = 0.065 / 0.50 = 0.13$.

Equivalently:

$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, such that $P(A)\,P(B \mid A) = P(A \cap B)$
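To make the joint/marginal/conditional mechanics concrete, here is a minimal Python sketch of the computer-crash table above (an added illustration, not from the source):

    import numpy as np

    # Joint distribution: rows are age A (old=0, new=1), columns are crashes M (0..4).
    joint = np.array([[0.35, 0.065, 0.05, 0.025, 0.01],
                      [0.45, 0.035, 0.01, 0.005, 0.00]])
    p_a = joint.sum(axis=1)                # marginal of A: [0.50, 0.50]
    p_m = joint.sum(axis=0)                # marginal of M: [0.80, 0.10, 0.06, 0.03, 0.01]
    cond_m_given_old = joint[0] / p_a[0]   # Pr(M = m | A = 0); e.g., 0.065/0.50 = 0.13
    print(p_a, p_m, cond_m_given_old)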
The conditional expectation of Y, conditional on X = x, is given by $E(Y \mid X = x)$.

The conditional variance of Y, conditional on X = x, is given by $\text{var}(Y \mid X = x)$.

The two-variable regression is an important conditional expectation. In this case, we say the expected Y is conditional on X:

$E(Y \mid X_i) = B_1 + B_2 X_i$
          T=$15   T=$20   T=$30   Total
S=$10     0       3       3       6
S=$15     2       4       6       12
S=$20     2       3       3       8
Total     4       10      12      26

$P(S = \$20 \mid T = \$20) = \frac{P(S = \$20, T = \$20)}{P(T = \$20)} = \frac{3}{10}$
In summary:

$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$
This independence implies that the probability of rolling double-sixes is equal to the product of P(rolling one six) and P(rolling one six). If two dice are independent, then P(first roll = 6, second roll = 6) = P(rolling a six) × P(rolling a six). And, indeed: 1/36 = (1/6) × (1/6).
$E(a + bX + cY) = a + b\mu_X + c\mu_Y$
Variance
In regard to the sum of correlated variables, the variance of correlated variables is given by the
following (note the two expressions; the second merely substitutes the covariance with the
product of correlation and volatilities. Please make sure you are comfortable with this
substitution).
$\text{variance}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\,\text{cov}(X,Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\sigma_X\sigma_Y$
The normal probability density function is given by:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

[Chart: normal density curve]
Parsimony: Only requires (is fully described by) two parameters: mean and variance
The central limit theorem (CLT) says that the sampling distribution of sample means tends to be normal (i.e., converges toward a normally shaped distribution) regardless of the shape of the underlying distribution; this explains much of the popularity of the normal distribution.
The normal is economical (elegant) because it only requires two parameters (mean
and variance). The standard normal is even more economical: it requires no
parameters.
No parameters required!
This unit or standardized variable is normally distributed with zero mean and variance of
one (1.0). Its standard deviation is also one (variance = 1.0 and standard deviation = 1.0). This is
written as: Variable Z is approximately (asymptotically) normally distributed: Z ~ N(0,1)
Memorize two common critical values: 1.65 and 2.33. These correspond to confidence
levels, respectively, of 95% and 99% for a one-tailed test. For VAR, the one-tailed test is
relevant because we are concerned only about losses (left-tail) not gains (right-tail).
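These critical values can be verified with the inverse CDF (an added sketch, assuming scipy):

    from scipy.stats import norm

    print(norm.ppf(0.95))    # ~1.645: the 95% one-tailed critical value
    print(norm.ppf(0.99))    # ~2.326: the 99% one-tailed critical value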
Chi-squared distribution

[Chart: chi-square density for k = 2, k = 5, and k = 29 degrees of freedom]
For the chi-square distribution, we observe a sample variance and compare it to the hypothetical population variance. This variable has a chi-square distribution with (n − 1) d.f.:

$\frac{s^2}{\sigma_0^2}(n-1) \sim \chi^2_{(n-1)}$
Chi-squared distribution is the sum of m squared independent standard normal random
variables. Properties of the chi-squared distribution include:
Nonnegative (>0)
Sample variance (s²)                 0.0263%
Degrees of freedom (n − 1)           29
Hypothesized population variance     0.0200%
Chi-square statistic                 38.14   = 0.0263%/0.02% × 29
p value                              11.93%
1 − p                                88.07%
Lookup: at 29 d.f., the 10% critical value is 39.0875.
With 29 degrees of freedom (d.f.), 38.14 corresponds to roughly 10% (i.e., to left of 0.10 on the
lookup table). Therefore, we can reject the null with only 88% confidence; i.e., we are likely to
accept the probability that the true variance is 0.02%.
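A sketch of the same chi-square variance test in Python (added here; assumes scipy):

    from scipy.stats import chi2

    s2, sigma2_0, df = 0.000263, 0.000200, 29   # sample var 0.0263%, hypothesized 0.02%
    stat = s2 / sigma2_0 * df                   # ~38.14
    p_value = 1 - chi2.cdf(stat, df)            # ~11.9%: cannot reject at 10%
    print(stat, p_value, chi2.ppf(0.90, df))    # 10% critical value ~39.09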
Student's t distribution

[Chart: Student's t density (e.g., 20 d.f.) versus the normal]
The Student's t distribution (t distribution) is among the most commonly used distributions. As the degrees of freedom (d.f.) increase, the t-distribution converges to the normal distribution. It is similar to the normal, except it exhibits slightly heavier tails (the lower the d.f., the heavier the tails). The Student's t variable is given by:

$t = \frac{\bar{X} - \mu}{S_x / \sqrt{n}}$
Its variance = k/(k−2) where k = degrees of freedom. Note, as k increases, the variance approaches 1.0. Therefore, as k increases, the t-distribution approximates the standard normal distribution.

The t-distribution is always slightly heavy-tailed (kurtosis > 3.0) but converges to the normal. The Student's t is not considered a really heavy-tailed distribution.

In practice, the Student's t is the most commonly used distribution. When we test the significance of regression coefficients, the central limit theorem (CLT) justifies the normal distribution (because the coefficients are effectively sample means). But we rarely know the population variance, such that the Student's t is the appropriate distribution.

When the d.f. is large (e.g., sample over ~30), as the Student's t approximates the normal, we can use the normal as a proxy. In the assigned Stock & Watson, the sample sizes are large (e.g., 420 students), so they tend to use the normal.
Sample mean              0.02%
Sample std deviation     1.54%
Sample size (n)          10
Confidence               95%
Significance (α)         5%
Critical t               2.262
Lower limit              -1.08%
Upper limit              1.12%
The sample mean is a random variable. If we know the population variance, we assume the sample mean is normally distributed. But if we do not know the population variance (typically the case!), the sample mean is a random variable following a Student's t distribution.

In the Google example above, we can use this to construct a confidence (random) interval:

$\bar{X} \pm t \frac{s}{\sqrt{n}}$
We need the critical (lookup) t value. The critical t value is a function of the significance level and the degrees of freedom.
The 95% confidence interval can be computed. The upper limit is given by:

$\bar{X} + (2.262)\frac{1.54\%}{\sqrt{10}} \approx 1.12\%$

And the lower limit is given by:

$\bar{X} - (2.262)\frac{1.54\%}{\sqrt{10}} \approx -1.08\%$
Please make sure you can take a sample standard deviation, compute the critical t value
and construct the confidence interval.
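A sketch of this confidence interval in Python (an added illustration; assumes scipy):

    import math
    from scipy.stats import t

    mean, s, n = 0.0002, 0.0154, 10        # sample mean 0.02%, std dev 1.54%
    t_crit = t.ppf(0.975, n - 1)           # ~2.262 at 9 d.f. (95%, two-tailed)
    half = t_crit * s / math.sqrt(n)       # ~1.10%
    print(mean - half, mean + half)        # ~ -1.08% and +1.12%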
Both the normal (Z) and Student's t (t) distributions characterize the sampling distribution of the sample mean. The difference is that the normal is used when we know the population variance; the Student's t is used when we must rely on the sample variance. In practice, we don't know the population variance, so the Student's t is typically appropriate:

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \qquad t = \frac{\bar{X} - \mu}{S_X / \sqrt{n}}$
F-Distribution

[Chart: F density for (19,19) and (9,9) degrees of freedom]
The F distribution is also called the variance ratio distribution (it may be helpful to think of it as
the variance ratio!). The F ratio is the ratio of sample variances, with the greater sample variance
in the numerator:
$F = \frac{s_x^2}{s_y^2}$
Properties of F distribution:
Nonnegative (≥ 0)
Skewed right
The square of a t-distributed r.v. with k d.f. has an F distribution with (1, k) d.f.
As n grows large, $m \cdot F(m, n)$ approaches the chi-squared distribution with m d.f.: $m \cdot F(m,n) \to \chi^2_m$
              GOOG      YHOO
=VAR()        0.0237%   0.0084%
=COUNT()      10        10
F ratio       2.82
Confidence    90%
Significance  10%
=FINV()       2.44
At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value is 2.44.
Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e., that the population
variances are the same). We conclude the population variances are different.
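The same F test in a short Python sketch (added here; assumes scipy):

    from scipy.stats import f

    var_goog, var_yhoo, n = 0.000237, 0.000084, 10
    f_ratio = var_goog / var_yhoo               # ~2.82 (larger variance on top)
    f_crit = f.ppf(0.90, n - 1, n - 1)          # ~2.44 at 10% significance, (9, 9) d.f.
    print(f_ratio, f_crit, f_ratio > f_crit)    # True: reject equal variances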
Moments of a distribution
The k-th moment about the mean (μ) is given by:

$\text{k-th moment} = \frac{\sum_{i=1}^{n}(x_i - \mu)^k}{n}$
In this way, the difference of each data point from the mean is raised to a power (k=1, k=2, k=3, and k=4). These are the four moments of the distribution:

If k=1, this refers to the first moment: the mean.
If k=2, this refers to the second moment about the mean: the variance.
If k=3, this refers to the third moment about the mean: skew (asymmetry).
If k=4, this refers to the fourth moment about the mean: tail density and peakedness.
Random sampling produces independent and identically distributed (i.i.d.) random variables:

Independent: the draws are mutually independent.
Identical: each random variable has the same (identical) probability distribution (PDF/PMF, CDF); i.e., same mean and same variance (homoskedastic).
Define, calculate, and interpret the mean and variance of the sample
average.
The sample mean is given by:

$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$

The variance of the sample mean is given by:

$\text{variance}(\bar{Y}) = \frac{\sigma_Y^2}{n}$

$\text{Std Dev}(\bar{Y}) = \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$
$E(\bar{Y}) = \mu_{\bar{Y}} = \mu_Y$

This formula says: we expect the average of our sample will equal the average of the population (the over-bar signifies the sample; Greek mu signifies the population mean).

$\sigma_{\bar{Y}}^2 = E\left[(\bar{Y} - \mu_Y)^2\right] = \frac{\sigma_Y^2}{n}$

This says: the variance of the sample mean is equal to the population variance divided by the sample size. For example, the (population) variance of a single six-sided die is 2.92. If we roll three dice (i.e., sampling with replacement), then the variance of the sampling distribution = (2.92 / 3) ≈ 0.97.
If the population is size (N), if the sample size n ≤ N, and if sampling is conducted without replacement, then the variance of the sampling distribution of means is given by:

$\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n}\left(\frac{N - n}{N - 1}\right)$

The standard error of the sample mean is:

$se(\bar{Y}) = \sqrt{\frac{\sigma_Y^2}{n}} = \frac{\sigma_Y}{\sqrt{n}}$
If the population is distributed with mean μ and variance σ² but the distribution is not a normal distribution, then the standardized variable given by Z below is asymptotically normal; i.e., as (n) approaches infinity (∞) the distribution becomes normal:

$Z = \frac{\bar{Y} - \mu_Y}{se(\bar{Y})} = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{n}} \sim N(0,1)$

The denominator is the standard error, which is simply the name for the standard deviation of the sampling distribution.
Describe, interpret, and apply the Law of Large Numbers and the Central
Limit Theorem.
In brief:
Law of large numbers: under general conditions, the sample mean ($\bar{Y}$) will be near the population mean.
Central limit theorem (CLT): as the sample size increases, regardless of the underlying distribution, the sampling distribution of the sample mean approximates (tends toward) the normal.
Each sample has a sample mean. There are many sample means. The sample means have
variation: a sampling distribution. The central limit theorem (CLT) says the sampling
distribution of sample means is asymptotically normal.
We assume a population with a known mean and finite variance, but not necessarily a
normal distribution.
The distribution of the sample mean computed from samples (where each sample equals
size n) will be approximately (asymptotically) normal.
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
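A simulation sketch of the CLT (an added illustration): means of dice rolls, a decidedly non-normal population, produce an approximately normal sampling distribution:

    import numpy as np

    np.random.seed(0)
    rolls = np.random.randint(1, 7, size=(10_000, 30))   # 10,000 samples of n=30
    sample_means = rolls.mean(axis=1)
    print(sample_means.mean())     # ~3.5, the population mean
    print(sample_means.var())      # ~2.92/30 ~ 0.097, the variance of the sample mean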
Stock, Chapter 3:
Review of Statistics
In this chapter
An estimate is calculated from the sample (a.k.a., a sample statistic). For example, sample
mean, sample variance, sample skew, sample kurtosis.
In addition to the estimate itself (e.g., sample mean), we estimate the sampling error or
sampling variation.
Next, we conduct hypothesis testing by either: (i) confidence interval, (ii) test of
significance, or (iii) p value.
Statistical inference is the process of inferring facts about a population (i.e., the entire group)
based on an examination of a sample (i.e., a small part of the population). The process of
obtaining samples, and therefore sample estimators or statistics, is called sampling.
Population parameters
A population is considered known or understood when we know the probability distribution
function. If X is normally distributed, we say that the population is normally distributed (or, that
we have a normal population). If X is binomially distributed, we say that the population is
binomially distributed (or, that we have a binomial population.)
The population is the entire group under study. The population is often unknowable. The population size is denoted by a capital N.

The population (of which there is typically one) has parameters; e.g., the population mean or the population variance. A parameter is a quantity in the f(x) distribution, such as the mean, the standard deviation, or (p) in the case of the binomial distribution, that helps describe the distribution. Quantities that appear in f(x), such as the mean (μ) and the standard deviation (σ), are called population parameters.

The sample is a subset of the population. For practical purposes, we draw a sample (from the population) in order to make inferences about the population. The sample size is denoted with a small n.

From the sample (of which there are many) we calculate estimates from estimators or statistics; e.g., the sample mean or the sample variance. Estimators (statistics) are the recipes for the best guesses about the true population parameters.

Estimators (statistics) versus parameters
In the context of linear regression, the parameters are the slope and intercept associated with the population regression function (PRF); i.e., the true slope and true intercept. The estimators are the formulas that produce the estimated slope and intercept coefficients associated with the sample regression function (SRF). In short, we estimate the slope and intercept (the estimates) in the sample regression function, hoping to infer the true, unobserved population slope and intercept (the parameters).
Sample mean                 $22.64
Sample standard deviation   $18.14
Sample size (n)             200
Standard error              1.28
H0: population mean =       $20.00
Test t statistic            2.06
p value                     4.09%
Please note:
The standard error of the sample mean is $1.28 because $18.14/SQRT(200) = $1.28
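A sketch replicating this test in Python (added here; assumes scipy):

    import math
    from scipy.stats import t

    mean, s, n, mu0 = 22.64, 18.14, 200, 20.00
    se = s / math.sqrt(n)                       # ~1.28
    t_stat = (mean - mu0) / se                  # ~2.06
    p = 2 * (1 - t.cdf(abs(t_stat), n - 1))     # ~4% two-tailed p value
    print(se, t_stat, p)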
The t-statistic compares the sample mean to the hypothesized population mean:

$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$
The critical t-value or lookup t-value is the t-value for which the test just rejects the null
hypothesis at a given significance level. For example:
The critical t-values bound a region within the students distribution that is a specific
percentage (90%? 95%? 99%?) of the total area under the students t distribution curve. The
students t distribution with (n-1) degrees of freedom (d.f.) has a confidence interval given by:
$\bar{Y} \pm t\frac{s_Y}{\sqrt{n}}; \quad \text{i.e.,} \quad \bar{Y} - t\frac{s_Y}{\sqrt{n}} \le \mu_Y \le \bar{Y} + t\frac{s_Y}{\sqrt{n}}$
d.f.   1-tail: 0.25   0.10    0.05    0.025    0.01     0.005    0.001
       2-tail: 0.50   0.20    0.10    0.05     0.02     0.01     0.002
1      1.000   3.078   6.314   12.706   31.821   63.657   318.309
2      0.816   1.886   2.920   4.303    6.965    9.925    22.327
3      0.765   1.638   2.353   3.182    4.541    5.841    10.215
4      0.741   1.533   2.132   2.776    3.747    4.604    7.173
5      0.727   1.476   2.015   2.571    3.365    4.032    5.893
6      0.718   1.440   1.943   2.447    3.143    3.707    5.208
7      0.711   1.415   1.895   2.365    2.998    3.499    4.785
8      0.706   1.397   1.860   2.306    2.896    3.355    4.501
9      0.703   1.383   1.833   2.262    2.821    3.250    4.297
10     0.700   1.372   1.812   2.228    2.764    3.169    4.144
11     0.697   1.363   1.796   2.201    2.718    3.106    4.025
12     0.695   1.356   1.782   2.179    2.681    3.055    3.930
13     0.694   1.350   1.771   2.160    2.650    3.012    3.852
14     0.692   1.345   1.761   2.145    2.624    2.977    3.787
15     0.691   1.341   1.753   2.131    2.602    2.947    3.733
16     0.690   1.337   1.746   2.120    2.583    2.921    3.686
17     0.689   1.333   1.740   2.110    2.567    2.898    3.646
18     0.688   1.330   1.734   2.101    2.552    2.878    3.610
19     0.688   1.328   1.729   2.093    2.539    2.861    3.579
20     0.687   1.325   1.725   2.086    2.528    2.845    3.552
21     0.686   1.323   1.721   2.080    2.518    2.831    3.527
22     0.686   1.321   1.717   2.074    2.508    2.819    3.505
23     0.685   1.319   1.714   2.069    2.500    2.807    3.485
24     0.685   1.318   1.711   2.064    2.492    2.797    3.467
25     0.684   1.316   1.708   2.060    2.485    2.787    3.450
26     0.684   1.315   1.706   2.056    2.479    2.779    3.435
27     0.684   1.314   1.703   2.052    2.473    2.771    3.421
28     0.683   1.313   1.701   2.048    2.467    2.763    3.408
29     0.683   1.311   1.699   2.045    2.462    2.756    3.396
30     0.683   1.310   1.697   2.042    2.457    2.750    3.385
In the lookup table, critical values less than three (< 3.0) are the sweet spot. For confidences less than 99% and d.f. > 13, the critical t is always less than 3.0. So, for
example, a computed t of 7 or 13 will generally be significant. Keep this in mind because in
many cases, you do not need to refer to the lookup table if the computed t is large; you can
simply reject the null.
Sample mean             $22.64
Sample std deviation    $18.14
Sample size (n)         200
Standard error          1.28
Confidence              95%
Critical t              1.972
Lower limit             $20.11
Upper limit             $25.17

95% CI for $\mu_Y$: $\bar{Y} \pm t_c \cdot SE(\bar{Y}) = \$22.64 \pm 1.972 \times \$1.28$ (with a large sample, the critical t of 1.972 is close to the normal value of 1.96).
Mean                  23.25
Variance              90.13
Std dev               9.49
Count                 28
d.f.                  27
Confidence (1-α)      95%
Significance (α)      5%
Critical t            2.052
Standard error        1.794
Lower limit           19.6    = 23.25 - (2.052)(1.794)
Upper limit           26.9    = 23.25 + (2.052)(1.794)
Hypothesis            18.5
t value               2.65
p value               1.3%
Reject null with      98.7%
The confidence coefficient is selected by the user; e.g., 95% (0.95) or 99% (0.99).
The significance = 1 − confidence coefficient.
Determine degrees of freedom (d.f.): d.f. = sample size − 1. In this case, 28 − 1 = 27 d.f.
We are constructing an interval, so we need the critical t value for 5% significance with two tails.
The critical t value is equal to 2.052. That's the value with 27 d.f. and either 2.5% one-tailed significance or 5% two-tailed significance (see how they are the same, provided the distribution is symmetrical?).
The standard error is equal to the sample standard deviation divided by the square root of the sample size (not d.f.!). In this case, 9.49/SQRT(28) ≈ 1.794.
The lower limit of the confidence interval is given by: the sample mean minus the critical t (2.052) multiplied by the standard error (9.49/SQRT[28]).
The upper limit of the confidence interval is given by: the sample mean plus the critical t (2.052) multiplied by the standard error (9.49/SQRT[28]).
$\bar{X} \pm t\frac{S_x}{\sqrt{n}}: \quad 23.25 - 2.052\frac{9.49}{\sqrt{28}} \;\le\; \mu \;\le\; 23.25 + 2.052\frac{9.49}{\sqrt{28}}$
This confidence interval is a random interval. Why? Because it will vary randomly with
each sample, whereas we assume the population mean is static.
We don't say the probability is 95% that the true population mean lies within this interval. That implies the true mean is variable. Instead, we say the probability is 95% that the random interval contains the true mean. See how the population mean is trusted to be static and the interval varies?
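A sketch of the interval construction in Python (an added illustration; assumes scipy):

    import math
    from scipy.stats import t

    mean, sd, n = 23.25, 9.49, 28
    se = sd / math.sqrt(n)                # ~1.794 (divide by sqrt(n), not d.f.!)
    t_crit = t.ppf(0.975, n - 1)          # ~2.052 at 27 d.f.
    print(mean - t_crit * se, mean + t_crit * se)    # ~19.6 and ~26.9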
An estimate is the numerical value of the estimator when it is actually computed using
data from a specific sample.
Linearity: estimator is a linear function of sample observations. For example, the sample
mean is a linear function of the observations.
Unbiasedness: the average or expected value of the estimator is equal to the true value
of the parameter.
Minimum variance: the variance of the estimator is smaller than any competing
estimator. Note: an estimator can have minimum variance yet be biased.
Efficiency: among the set of unbiased estimators, the estimator with the minimum variance is the efficient estimator (i.e., it has the smallest variance among unbiased estimators).
Best linear unbiased estimator (BLUE): the estimator that combines three properties: (i) linear, (ii) unbiased, and (iii) minimum variance.
$E(\bar{Y}) = \mu_Y$
Otherwise the estimator is biased.
If the expected value of the estimator is the population parameter, the estimator is
unbiased. If, in repeated applications of a method the mean value of the estimators
coincides with the true parameter value, that estimator is called an unbiased estimator.
Unbiasedness is a repeated sampling property: if we draw several samples of size (n) from a population and compute the unbiased sample statistic for each sample, the average of these statistics will tend to approach (converge on) the population parameter.
Desirable properties of estimators include: unbiased; efficient, meaning the smallest variance among unbiased estimators, i.e., $\text{variance}(\hat{Y}) \le \text{variance}(\tilde{Y})$ for a competing unbiased estimator $\tilde{Y}$; and consistent, meaning $\bar{Y} \xrightarrow{p} \mu_Y$ (the estimator converges in probability to the parameter as the sample grows).
Define and interpret the null hypothesis and the alternative hypothesis
Please note the null must contain the equal sign (=):

$H_0: E(Y) = \mu_{Y,0} \quad \text{versus} \quad H_1: E(Y) \ne \mu_{Y,0}$

For example:

$H_0: E(Y) = \$20 \quad \text{versus} \quad H_1: E(Y) \ne \$20$
$H_0: E(Y) \le \mu_{Y,0} \quad \text{versus} \quad H_1: E(Y) > \mu_{Y,0}$

Specifically, the one-sided null hypothesis is that the population average wage is less than or equal to $20.00:

$H_0: E(Y) \le \$20 \quad \text{versus} \quad H_1: E(Y) > \$20$
The null hypothesis always includes the equal sign (=), regardless! The null cannot include
only less than (<) or greater than (>).
The 95% confidence interval is $\bar{Y} \pm 1.96\,SE(\bar{Y})$; the 99% confidence interval is $\bar{Y} \pm 2.58\,SE(\bar{Y})$.
Reject $H_0$ at 90% confidence if $|t^{act}| > 1.64$; reject at 99% confidence if $|t^{act}| > 2.58$.
Our example was a two-tailed test, but recall we have three possible tests:
The parameter is greater than (>) the stated value (right-tailed test), or
The parameter is less than (<) the stated value (left-tailed test), or
The parameter is either greater than or less than (≠) the stated value (two-tailed test).
Small p-values provide evidence for rejecting the null hypothesis in favor of the alternative
hypothesis, and large p values provide evidence for not rejecting the null hypothesis in favor of
the alternative hypothesis.
Keep in mind a subtle point about the p-value and rejecting the null. It is a soft rejection.
Rather than accept the alternative, we fail to reject the null. Further, if we reject the null, we are
merely rejecting the null in favor of the alternative.
The analogy is to a jury verdict. The jury does not return a verdict of innocent; rather, they return a verdict of not guilty.
The sample variance is:

$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$

The standard error of the sample mean is:

$SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}$

95% confidence interval: $\bar{Y} \pm 1.96\,SE(\bar{Y})$; 99% confidence interval: $\bar{Y} \pm 2.58\,SE(\bar{Y})$
Perform and interpret hypothesis tests for the difference between two
means
Test statistic for comparing two means:
$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$
Define, describe, apply, and interpret the t-statistic when the sample size
is small.
If the sample size is small, the t-statistic has a Student's t distribution with (n − 1) degrees of freedom:

$t = \frac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2 / n}}$
Interpret scatterplots.
The scattergram is a plot of the dependent variable (on the Y axis) against the independent (explanatory) variable (on the X axis). In Stock and Watson, the explanatory variable is the student-teacher ratio (STR). The dependent variable is the test score:

[Scatterplot: test scores (600 to 720) against student-teacher ratio (10 to 30)]
$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$
Sample correlation is sample covariance divided by the product of sample standard deviations:
$r_{XY} = \frac{s_{XY}}{S_X S_Y}$
X           Y           (X - Xavg)(Y - Yavg)
3           5           0.0
2           4           1.0
4           6           1.0
Avg = 3     Avg = 5     Avg = cov(X,Y) = 0.67

StdDev(X) = SQRT(0.67); StdDev(Y) = SQRT(0.67); Correl. = 0.67 / (SQRT(0.67) × SQRT(0.67)) = 1.0
Please note:
Properties of covariance
$\sigma^2_{X+Y} = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY}$

$\sigma^2_{X-Y} = \sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}$
Note that a variables covariance with itself is its variance. Keeping this in mind, we
realize that the diagonal in a covariance matrix is populated with variances.
Correlation Coefficient
The correlation coefficient is the covariance (X,Y) divided by the product of each variable's standard deviation. The correlation coefficient translates covariance into a unitless metric that runs from -1.0 to +1.0:
$\rho_{XY} = \frac{\text{cov}(X,Y)}{\text{StandardDev}(X) \times \text{StandardDev}(Y)}$, such that $\sigma_{XY} = \rho_{XY}\,\sigma_X\,\sigma_Y$
Memorize this relationship between the covariance, the correlation coefficient, and the
standard deviations. It has high testability.
On the next page we illustrate the application of the variance theorems and the correlation
coefficient.
Please walk through this example so you understand the calculations.
The example refers to two products, Coke (X) and Pepsi (Y).
We (somehow) can generate growth projections for both products. For both Coke (X) and Pepsi (Y), we have three scenarios (bad, medium, and good), and probabilities are assigned to each growth scenario. Finally, we know these outcomes are not independent. We want to calculate the correlation coefficient.
Prob.   Coke (X)   Pepsi (Y)   pX     pY     XY     pXY    X²     Y²     pX²    pY²
20%     3          5           0.6    1.0    15     3.0    9      25     1.8    5.0
60%     9          7           5.4    4.2    63     37.8   81     49     48.6   29.4
20%     12         9           2.4    1.8    108    21.6   144    81     28.8   16.2
                               E(X)   E(Y)          E(XY)                E(X²)  E(Y²)
                               8.4    7.0           62.4                 79.2   50.6

COV(X,Y) = E(XY) − E(X)E(Y) = 62.4 − (8.4)(7.0) = 3.6
VAR(X) = E[X²] − [E(X)]² = 79.2 − 8.4² = 8.64, so STDEVP(X) = 2.939
VAR(Y) = E[Y²] − [E(Y)]² = 50.6 − 7.0² = 1.60, so STDEVP(Y) = 1.265
Correlation = COV/(STD × STD) = 3.6 / (2.939 × 1.265) = 0.9682

The calculation of expected values is required: E(X), E(Y), E(XY), E(X²) and E(Y²). Make sure you can replicate the two steps: the covariance is E(XY) − E(X)E(Y), and the correlation coefficient (ρ) is equal to the Cov(X,Y) divided by the product of the standard deviations: ρXY = 0.97 = 3.6 / (2.939 × 1.265).
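The two steps can be replicated with a short Python sketch (an added illustration):

    import numpy as np

    p = np.array([0.20, 0.60, 0.20])     # bad, medium, good scenario probabilities
    x = np.array([3.0, 9.0, 12.0])       # Coke growth
    y = np.array([5.0, 7.0, 9.0])        # Pepsi growth
    e_x, e_y, e_xy = p @ x, p @ y, p @ (x * y)   # 8.4, 7.0, 62.4
    cov = e_xy - e_x * e_y                       # 3.6
    sd_x = (p @ x**2 - e_x**2) ** 0.5            # ~2.939
    sd_y = (p @ y**2 - e_y**2) ** 0.5            # ~1.265
    print(cov / (sd_x * sd_y))                   # ~0.9682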
Zero covariance implies zero correlation, but zero correlation does not necessarily imply independence. For example, Y = X² is a nonlinear dependence that can exhibit zero (linear) correlation.

Correlation (or dependence) is not causation. For example, in a basket credit default swap, the correlation (dependence) between the obligors is a key input. But we do not assume there is mutual causation (e.g., that one default causes another). Rather, more likely, different obligors are similarly sensitive to economic conditions. So, economic deterioration may be the external cause that all obligors have in common. Consequently, their defaults exhibit dependence. But the causation is not internal. Further, note that (linear) correlation is a special case of dependence. Dependence is more general and includes non-linear relationships.
Sample mean

Sample mean is the sum of observations divided by the number of observations:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Variance

A population variance is given by:

$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$

The unbiased sample variance is given by:

$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$
Covariance

Covariance is the average cross-product:

$\sigma_{XY} = \frac{1}{n}\sum\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$

$\text{sample: } s_{XY} = \frac{1}{n-1}\sum\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$

Correlation coefficient

Correlation coefficient is given by:

$\rho_{XY} = \frac{\text{cov}(X,Y)}{\text{StdDev}(X) \times \text{StdDev}(Y)}$

$\text{sample: } r_{XY} = \frac{s_{XY}}{S_X S_Y}$
Skewness

Skewness is given by:

$\text{Skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$

$\text{Sample skewness} = \frac{\sum (X - \bar{X})^3}{(N-1)\,S^3}$

Kurtosis

Kurtosis is given by:

$\text{Kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4}$

$\text{Sample kurtosis} = \frac{\sum (X - \bar{X})^4}{(N-1)\,S^4}$
Stock, Chapter 4:
Linear Regression
with one regressor
In this chapter
What is Econometrics?
Econometrics is a social science that applies tools (economic theory, mathematics and statistical
inference) to the analysis of economic phenomena. Econometrics consists of the application of
mathematical statistics to economic data to lend empirical support to the models constructed
by mathematical economics.
Methodology of econometrics
Specify the (pure) mathematical model: a linear function with parameters (but without
an error term)
Estimate the parameters of the chosen econometric model: we are likely to use the ordinary least squares (OLS) approach to estimate the parameters
Create theory (hypothesis) → Specify mathematical model → Specify statistical (econometric) model → Collect data → Estimate parameters → Test model specification → Test hypothesis → Use model to predict or forecast
Note:
The difference between the mathematical and statistical model is the random error
term (u in the econometric equation below). The statistical (or empirical) econometric
model adds the random error term (u):
$Y_i = B_0 + B_1 X_i + u_i$
Pooled (combination of time series and cross-sectional) - returns over time for a
combination of assets; and
Panel data (a.k.a., longitudinal or micropanel) data is a special type of pooled data in
which the cross-sectional unit (e.g., family, company) is surveyed over time.
For example, we often characterize a portfolio with a matrix. In such a matrix, the assets are given in the rows and the period returns (e.g., days/months/years, such as 2006, 2007, 2008) are given in the columns. A single row (one asset, e.g., Asset #1, over time) is a time series; a single column (e.g., the average return across assets on a given day) is cross-sectional; the full matrix is pooled. An example of panel data is returns for the same business/family surveyed over time.
$TestScore = \beta_0 + \beta_{ClassSize} \times ClassSize + \text{other factors}$

More generally:

$Y_i = \beta_0 + \beta_1 X_i + u_i$

where $Y_i$ is the dependent variable (regressand), $X_i$ is the independent variable (regressor), and $u_i$ captures the other factors.
Define and interpret the stochastic error term (or noise component).
The error term contains all the other factors aside from (X) that determine the value of the
dependent variable (Y) for a specific observation.
$Y_i = \beta_0 + \beta_1 X_i + u_i$

The stochastic error term is a random variable. Its value cannot be a priori determined.

$Y_i = B_0 + B_1 X_i + u_i$ (population regression function)
$\hat{Y}_i = b_0 + b_1 X_i$ (sample regression line)
$Y_i = b_0 + b_1 X_i + e_i$ (sample regression function, with residual $e_i$)
Each sample produces its own scatterplot. Through this sample scatterplot, we can plot a sample
regression line (SRL). The sample regression function (SRF) characterizes this line; the SRF is
analogous to the PRF, but for each sample.
Note the correspondence between error term and the residual. As we specify the model,
we ex ante anticipate an error; after we analyze the observations, we ex post observe
residuals.
Unlike the PRF, which is presumed to be stable (but unobserved), the SRF varies with each sample. So, we expect to get a different SRF for each sample. There is no single correct SRF!

[Chart: sample regression lines fitted to different samples, e.g., Sample #2]

$E(Y) = B_0 + B_1^2 X_i$ is nonlinear in the parameters, whereas $E(Y) = B_0 + B_1 X_i^2$ is nonlinear in the variable but linear in the parameters; linear regression requires linearity in the parameters.
Define and interpret the explained sum of squares, the total sum of
squares, and the residual sum of squares
We can break the regression equation into three parts:
The explained sum of squares (ESS) is the squared distance between the predicted Y and the
mean of Y:
$ESS = \sum_{i=1}^{n}\left(\hat{Y}_i - \bar{Y}\right)^2$
The sum of squared residuals (SSR) is the summation of each squared deviation between
the observed (actual) Y and the predicted Y:
$SSR = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$

The sum of squared residuals (SSR) is the sum of the squared error terms. It is directly related to the standard error of the regression (SER):

$SSR = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n}\hat{u}_i^2 = SER^2 (n-2)$

Equivalently:

$SER = \sqrt{\frac{\sum e_i^2}{n-2}} = \sqrt{\frac{SSR}{n-2}}$
Note the use of (n−2) instead of (n) in the denominator. Division by this smaller number, in this case (n−2) instead of (n), produces an unbiased estimate.
(n-2) is used because the two-variable regression has (n-2) degrees of freedom (d.f.).
In order to compute the slope and intercept estimates, two independent observations are
consumed.
If k = the number of explanatory variables plus the intercept (e.g., 2 if one explanatory
variable; 3 if two explanatory variables), then SER = SQRT[SSR/(n-k)].
If k = the number of slope coefficients (excluding the intercept), then similarly, SER =
SQRT[SSR/(n-k -1)]
[Scatterplot: test scores (600 to 700) against student-teacher ratio (10 to 30), with fitted regression line]
The regression function, with standard errors in parentheses, is given by:

$\widehat{TestScore} = \underset{(9.47)}{698.9} - \underset{(0.48)}{2.28} \times STR$
                          B(1)      B(0)
Regression coefficients   -2.28     698.93
Standard errors, SE()     0.48      9.47
R^2 = 0.05; SER = 18.58
F = 22.58; d.f. = 418
ESS = 7,794; RSS = 144,315
Please note: the slope and intercept are both significant at 95%, at least. The test statistics are 73.8 for the intercept (698.9/9.47) and −4.75 for the slope (−2.28/0.48). For example, given the very high absolute test statistic for the slope, its p-value is approximately zero.
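A sketch of these significance checks in Python (added here; the coefficient and standard error values come from the table above):

    # t statistic and approximate 95% confidence interval for each coefficient.
    coefs = {"intercept": (698.93, 9.47), "slope": (-2.28, 0.48)}
    for name, (b, se) in coefs.items():
        t_stat = b / se                          # ~73.8 and ~-4.75
        print(name, t_stat, (b - 1.96 * se, b + 1.96 * se))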
Stock, Chapter 5:
Single Regression:
Hypothesis Tests
In this chapter
In the example from Stock and Watson, the lower limit for the intercept = 680.4 = 698.9 − 9.47 × 1.96
              Coefficient   SE      CI Lower   CI Upper
Intercept     698.9         9.47    680.4      717.5
Slope (B1)    -2.28         0.48    -3.2       -1.3

$\widehat{TestScore} = 698.9 - 2.28 \times STR$, with standard errors (9.47) and (0.48).

Since 1.96 is the critical value associated with two-tailed 95% confidence, the 95% confidence interval is given by:

$95\%\ \text{CI} = \hat{\beta}_1 \pm 1.96\,SE(\hat{\beta}_1)$
The test statistic divides the regression coefficient by its standard error:

$t = \frac{b_1}{se(b_1)} \sim t_{n-2}$

This has a Student's t distribution with n − 2 degrees of freedom because there are two coefficients (slope and intercept).
Using the same example:

$\widehat{TestScore} = 698.9 - 2.28 \times STR$, standard errors (9.47) and (0.48).

For STR, the t statistic is −2.28/0.48 ≈ −4.75, and the two-tailed p value is approximately 0%.
Stock: Chapter 6:
Linear Regression
with Multiple
Regressors
In this chapter
Define, interpret, and discuss methods for addressing omitted variable bias.
Distinguish between simple and multiple regression.
Define and interpret the slope coefficient in a multiple regression.
Describe homoskedasticity and heteroskedasticity in a multiple regression.
Describe and discuss the OLS estimator in a multiple regression.
Define, calculate, and interpret measures of fit in multiple regression.
Explain the assumptions of the multiple linear regression model.
Explain the concept of imperfect and perfect multicollinearity and their
implications.
$Y_i = \beta_0 + \beta_1 X_{1i} + u_i$ (single regressor)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i, \quad i = 1, \dots, n$ (multiple regressors)
$SER = \sqrt{\frac{SSR}{n - k - 1}}$

Where (k) is the number of slope coefficients; e.g., in the case of a two-variable regression, k = 1. For the standard error of the regression (SER), the denominator is n − [# of coefficients including the intercept].
$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
Adjusted R^2
The unadjusted R^2 will tend to increase as additional independent variables are added. However, this does not necessarily reflect a better-fitted model. The adjusted R^2 is a modified version of the R^2 that does not necessarily increase when a new independent variable is added. Adjusted R^2 is given by:

$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{TSS} = 1 - \frac{s_{\hat{u}}^2}{s_Y^2}$
However, imperfect multicollinearity does mean that one or more of the regression coefficients could be estimated imprecisely.
Stock, Chapter 7:
Hypothesis Tests
and Confidence
Intervals in
Multiple Regression
In this chapter
Construct, perform, and interpret hypothesis tests and confidence intervals for a
single coefficient in a multiple regression.
Construct, perform, and interpret hypothesis tests and confidence intervals for
multiple coefficients in a multiple regression.
Define and interpret the F-statistic.
Define, calculate, and interpret the homoskedasticity-only F-statistic.
Describe and interpret tests of single restrictions involving multiple coefficients.
Define and interpret confidence sets for multiple coefficients.
Define and discuss omitted variable bias in multiple regressions.
Interpret the R2 and adjusted-R2 in a multiple regression.
In the Stock & Watson multiple regression example, the intercept is 686 with a standard error of 7.41. For STR, the two-tailed p value is 0.40%; for PctEL, the two-tailed p value is approximately 0.0%. Lower and upper confidence limits can be constructed for each coefficient as the coefficient plus or minus the critical t multiplied by its standard error.
$F = \frac{(SSR_{restricted} - SSR_{unrestricted}) / q}{SSR_{unrestricted} / (n - k_{unrestricted} - 1)}$
[Chart: 95% confidence set (ellipse) for two coefficients; the y-axis is the coefficient on Expn (B2)]
1. An increase in the R^2 or adjusted R^2 does not necessarily mean an added variable is statistically significant
2. A high R^2 or adjusted R^2 does not mean the regressors are a true cause of the
dependent variable
3. A high R^2 or adjusted R^2 does not mean there is no omitted variable bias
4. A high R^2 or adjusted R^2 does not necessarily mean you have the most appropriate set
of regressors, nor does a low R^2 or adjusted R^2 necessarily mean you have an
inappropriate set of regressors
Fabozzi, Chapter 2:
Discrete
Probability
Distributions
In this chapter
Describe the key properties of the Bernoulli distribution, Binomial distribution, and
Poisson distribution, and identify common occurrences of each distribution.
Identify the distribution functions of Binomial and Poisson distributions for various
parameter values.
$X = \begin{cases} 1 & \text{if } C \text{ defaults in } I \\ 0 & \text{else} \end{cases}$
Binomial
A binomial distributed random variable is the sum of (n) independent and identically distributed
(i.i.d.) Bernoulli-distributed random variables. The probability of observing (k) successes is
given by:
$P(Y = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \binom{n}{k} = \frac{n!}{(n-k)!\,k!}$
Poisson
The Poisson distribution depends upon only one parameter, lambda (λ), and can be interpreted as an approximation to the binomial distribution. A Poisson-distributed random variable is usually used to describe the random number of events occurring over a certain time interval. The lambda parameter (λ) indicates the rate of occurrence of the random events; i.e., it tells us how many events occur on average per unit of time.
In the Poisson distribution, the random number of events that occur during an interval of time,
(e.g., losses/ year, failures/ day) is given by:
$P(N = k) = \frac{\lambda^k}{k!} e^{-\lambda}$
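A sketch comparing the two pmfs (added here; the n and p values are hypothetical, with lambda = n × p):

    from scipy.stats import binom, poisson

    n, p_default = 50, 0.04       # hypothetical basket of 50 credits, 4% default prob.
    lam = n * p_default           # lambda = 2 expected defaults
    for k in range(4):
        print(k, binom.pmf(k, n, p_default), poisson.pmf(k, lam))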
               Normal        Binomial         Poisson
Mean           $\mu$         $np$             $\lambda$
Variance       $\sigma^2$    $npq$            $\lambda$
Standard dev   $\sigma$      $\sqrt{npq}$     $\sqrt{\lambda}$
In Poisson, lambda is both the expected value (the mean) and the variance!
Bernoulli: default (0/1)
Binomial: basket of credits; basket CDS; BET (binomial expansion technique)
Poisson: operational loss frequency
[Chart: Poisson, binomial, and normal densities compared]
[Chart: binomial distributions for p = 20%, p = 50%, and p = 80%]
[Chart: Poisson distributions for lambda = 5, lambda = 10, and lambda = 20]
Fabozzi, Chapter 3:
Continuous
Probability
Distributions
In this chapter
Describe the key properties of the normal, exponential, Weibull, Gamma, Beta, chi-squared, Student's t, lognormal, logistic, and extreme value distributions.
Explain the summation stability of normal distributions.
Describe the hazard rate of an exponentially distributed random variable.
Explain the relationship between exponential and Poisson distributions.
Explain why the generalized Pareto distribution is commonly used to model
operational risk events.
Explain the concept of mixtures of distributions.
The middle of the distribution, mu (μ), is the mean (and median). This first moment is also called the location.
Standard deviation and variance are measures of dispersion (a.k.a., shape). Variance is the second moment; typically, variance is denoted by sigma-squared (σ²) such that standard deviation is sigma (σ).
The distribution is symmetric around μ. In other words, the normal has skew = 0.
Summation stability: if you take the sum of several independent random variables, which are all normally distributed with mean (μi) and standard deviation (σi), then the sum will be normally distributed again.
The normal distribution possesses a domain of attraction. The central limit theorem (CLT) states that, under certain technical conditions, the distribution of a large sum of random variables behaves necessarily like a normal distribution.
The normal distribution is not the only class of probability distributions having a domain
of attraction. Actually three classes of distributions have this property: they are called
stable distributions.
Exponential
The exponential distribution is popular in queuing theory. It is used to model the time we have
to wait until a certain event takes place. According to the text, examples include the time
until the next client enters the store, the time until a certain company defaults or the time until
some machine has a defect.
$f(x) = \lambda e^{-\lambda x}, \quad \lambda = \frac{1}{\beta}, \quad x \ge 0$

[Chart: exponential density]
Weibull
Weibull is a generalized exponential distribution; i.e., the exponential is a special case of the
Weibull where the alpha parameter equals 1.0.
$F(x) = 1 - e^{-(x/\beta)^\alpha}, \quad x \ge 0$

[Chart: Weibull densities for (alpha=0.5, beta=1), (alpha=2, beta=1), and (alpha=2, beta=2)]
The main difference between the exponential distribution and the Weibull is that, under the
Weibull, the default intensity depends upon the point in time t under consideration. This allows
us to model the aging effect or teething troubles:
For α > 1, also called the light-tailed case, the default intensity is monotonically increasing with increasing time, which is useful for modeling the aging effect as it happens for machines: the default intensity of a 20-year-old machine is higher than that of a 2-year-old machine.

For α < 1, the heavy-tailed case, the default intensity decreases with increasing time. That means we have the effect of teething troubles, a figurative explanation for the effect that after some trouble at the beginning, things work well, as is known from new cars. The credit spread on noninvestment-grade corporate bonds provides a good example: credit spreads usually decline with maturity. The credit spread reflects the default intensity and, thus, we have the effect of teething troubles. If the company survives the next two years, it will survive for a longer time as well, which explains the decreasing credit spread.

For α = 1, the Weibull distribution reduces to an exponential distribution with parameter λ = 1/β.
Gamma distribution
The family of Gamma distributions forms a two parameter probability distribution family with
the density function (pdf) given by:
$f(x) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \quad x \ge 0$

[Chart: Gamma densities for (alpha=1, beta=1), (alpha=2, beta=0.5), and (alpha=4, beta=0.25)]
The Gamma distribution is related to other distributions; for example, for alpha = k/2 and beta = 2, the Gamma distribution becomes the chi-square distribution with k degrees of freedom.
Beta distribution
The beta distribution has two parameters: alpha (center) and beta (shape). The beta
distribution is very flexible, and popular for modeling recovery rates.
[Chart: Beta distribution (popular for recovery rates), alpha = 2, beta = 4]
[Chart: Beta distributions fitted to senior versus junior recovery rates]
Lognormal

The lognormal is common in finance: if an asset return (r) is normally distributed, the continuously compounded future asset price level (or ratio of prices; i.e., the wealth ratio) is lognormal. Expressed in reverse, if a variable is lognormal, its natural log is normal.
[Chart: lognormal density, with non-zero, positive skew and a heavy right tail]
Logistic
A logistic distribution has heavy tails:
[Chart: logistic densities for (alpha=0, beta=1), (alpha=2, beta=1), and (alpha=0, beta=3), versus N(0,1)]
Extreme value theory (EVT) offers two broad approaches: block maxima, and peaks over threshold (POT), the modern approach that is often preferred.
Block maxima
The dataset is parsed into (m) identical, consecutive and non-overlapping periods called blocks.
The length of the block should be greater than the periodicity; e.g., if the returns are daily, blocks
should be weekly or more. Block maxima partitions the set into time-based intervals. It requires
that observations be identically and independently (i.i.d.) distributed.
$H_\xi(y) = \begin{cases} \exp\left(-(1+\xi y)^{-1/\xi}\right) & \xi \ne 0 \\ \exp\left(-e^{-y}\right) & \xi = 0 \end{cases}$
The (xi) parameter is the shape parameter, related to the tail index; it represents the fatness of the tails. A higher ξ corresponds to fatter tails (equivalently, a lower tail index α = 1/ξ corresponds to fatter tails).
Per the (unassigned) Jorion reading on EVT, the key thing to know here is that (1) among
the three classes of GEV distributions (Gumbel, Frechet, and Weibull), we only care
about the Frechet because it fits to fat-tailed distributions, and (2) the shape parameter
determines the fatness of the tails (higher shape, fatter tails).
The cumulative distribution function here refers to the probability that the excess loss (i.e., the loss, X, in excess of the threshold, u) is less than some value, y, conditional on the loss exceeding the threshold:

$F_u(y) = P(X - u \le y \mid X > u)$
$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \xi x / \beta\right)^{-1/\xi} & \xi \ne 0 \\ 1 - \exp(-x/\beta) & \xi = 0 \end{cases}$
Block maxima is: time-based (i.e., blocks of time), traditional, less sophisticated, more
restrictive in its assumptions (i.i.d.)
Peaks over threshold (POT) is: more modern, has at least three variations (semiparametric; unconditional parametric; and conditional parametric), is more flexible
EVT Highlights:
Both GEV and GPD are parametric distributions used to model heavy-tails.
GEV (Block Maxima)
Exponential density and CDF, in terms of beta: $f(x) = \frac{1}{\beta}\, e^{-x/\beta}$ and $F(x) = 1 - e^{-x/\beta}$, $x \ge 0$; equivalently, in terms of lambda = 1/beta: $f(x) = \lambda e^{-\lambda x}$ and $F(x) = 1 - e^{-\lambda x}$
For example, if the loss frequency is 6 losses per 24 hours, then lambda = 6/24 = 0.250 per hour; if the frequency is 4 losses per 24 hours, then lambda = 4/24 = 0.167 per hour.

Lambda (per hour)       0.250     0.167
Time interval (hours)   1         12
Lambda × t              0.25      2.00
exp(−λt)                77.9%     13.5%
1 − exp(−λt)            22.1%     86.5%

The exponential distribution gives the probability that the next loss will occur before (within) the next 1 hour (12 hours).
[Chart: a mixture of two normal distributions (Normal 1, Normal 2, Mixture)]
Monte Carlo
Methods
In this chapter
Describe how to simulate a price path using a geometric Brownian motion model.
Describe how to simulate various distributions using the inverse transform method.
Describe the bootstrap method.
Explain how simulations can be used for computing VaR and pricing options.
Describe the relationship between the number of Monte Carlo replications and the
standard error of the estimated values.
Describe and identify simulation acceleration techniques.
Explain how to simulate correlated random variables using Cholesky factorization.
Describe deterministic simulations.
Discuss the drawbacks and limitations of simulation procedures.
The Monte Carlo process: specify a random process (e.g., GBM for a stock); run trials (10 or 1 MM), each a function of a random variable; sort the outcomes, best to worst; the quantiles (e.g., the 1st percentile) are VaRs.
Geometric Brownian motion (GBM) is a continuous process in which the randomly varying quantity (in our example, the asset value) fluctuates over a variable parameter (in our example, the stochastic variable is time). The random movement is the shock, and the deterministic progress in the asset's value is the drift, so GBM can be represented as drift + shock:

$\frac{\Delta S}{S} = \mu\,\Delta t + \sigma\,\varepsilon\,\sqrt{\Delta t}$

The asset drifts upward with the expected return μ over the time interval Δt. But the drift is also impacted by shocks from the random variable ε (the random shock), scaled by the standard deviation σ. As the variance is adjusted with time Δt, volatility is adjusted with the square root of time, √Δt.
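A minimal simulation sketch of the discretized GBM above (added here; the drift, volatility, and starting price are hypothetical):

    import numpy as np

    np.random.seed(1)
    mu, sigma, dt, steps = 0.10, 0.20, 1 / 252, 252   # hypothetical annual drift/vol
    s = np.empty(steps + 1)
    s[0] = 10.0
    for i in range(steps):
        eps = np.random.standard_normal()             # the random shock
        s[i + 1] = s[i] * (1 + mu * dt + sigma * eps * np.sqrt(dt))
    print(s[-1])   # one terminal price; repeat many times to build a distribution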
[Chart: a simulated stock price path over days 1 through 10]
Expected drift is the deterministic component, but the shock is the random component in this stock price process simulation. The simulated outcomes form an empirical distribution rather than a parametric distribution; this Monte Carlo method allows us to produce an empirical distribution of future values, which can be used directly to calculate the VaR.
GBM assumes constant volatility (generally a weakness) unlike GARCH(1,1) which models
time-varying volatility.
Random (CDF)   NORMSINV()   NORMDIST() pdf
0.10           -1.282       0.18
0.15           -1.036       0.23
0.20           -0.842       0.28
0.25           -0.674       0.32
0.30           -0.524       0.35
0.35           -0.385       0.37
0.40           -0.253       0.39
0.45           -0.126       0.40
0.50           0.000        0.40
A random variable is generated, between 0 and 1. In Excel, the function is =RAND(). This corresponds to the standard normal CDF; e.g., a uniform draw of 0.45 corresponds to −0.126 because NORMSINV(0.45) = −0.126.
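The Python analog of NORMSINV(RAND()) is the inverse-CDF (inverse transform) call below (an added sketch; assumes scipy):

    import numpy as np
    from scipy.stats import norm

    np.random.seed(7)
    u = np.random.uniform(size=5)    # like =RAND(): uniform draws on (0, 1)
    z = norm.ppf(u)                  # like =NORMSINV(): standard normal draws
    print(norm.ppf(0.45))            # -0.126, matching the table above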
Whereas historical simulation runs the current portfolio through the single historical sample, the bootstrap randomizes the historical sample and therefore can generate many historically informed samples.
The advantages of the bootstrap include: can model fat-tails (like HS); by generating
repeated samples, we can ascertain estimate precision. Limitations, according to Jorion,
include: for small sample sizes, the bootstrapped distribution may be a poor
approximation of the actual one.
The bootstrap randomizes the historical date but applies the same indexed returns within each date, which preserves cross-sectional correlations.
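A sketch of one bootstrap draw (added here; the two-asset return history is hypothetical):

    import numpy as np

    np.random.seed(3)
    hist = np.random.standard_normal((250, 2)) * 0.01      # hypothetical 2-asset returns
    idx = np.random.randint(0, len(hist), size=len(hist))  # resample dates w/ replacement
    sample = hist[idx]    # whole rows keep both assets' same-date returns together,
    print(sample.shape)   # preserving cross-sectional correlation; repeat many times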
Bootstrapping (versus Monte Carlo): randomizes returns; correlation is built in; no distributional assumption (does not assume normality); simple to implement. Monte Carlo is able to account for a range of risks (e.g., price risk, volatility risk, and nonlinear exposures) but introduces model risk.
Explain how simulations can be used for computing VaR and pricing
options.
Value at Risk (VaR)
Once a price path has been generated, we can build a portfolio distribution at the end of the
selected horizon:
1. Choose a stochastic process and parameters
2. Generate a pseudo-sequence of variables from which prices are computed
3. Calculate the value of the asset (or portfolio) under this particular sequence of prices at
the target horizon
4. Repeat steps 2 and 3 as many times as needed
This process creates a distribution of values. We can sort the observations and tabulate the expected value and the quantile; e.g., with 10,000 replications, the quantile is the (c × 10,000)-th lowest observation. Value at risk (VaR) relative to the mean is then computed from that quantile.
$f_t = E^*\left[e^{-r\tau}\, F(S_T)\right]$
This formula means that each simulated terminal price, F(S_T), is discounted at the risk-free rate; i.e., to solve for the present value. Then the average of those values is the expected value, or
value of the option. The Monte Carlo method has several advantages. It can be applied in
many situations, including options with so-called price-dependent paths (i.e., where the value
depends on the particular path) and options with atypical payoff patterns. Also, it is powerful
and flexible enough to handle varieties of options. With one notable exception: it cannot
accurately price options where the holder can exercise early (e.g., American-style options).
The standard error of the estimate declines with the square root of the number of replications:

$SE(\hat{\sigma}) = \sigma\sqrt{\frac{1}{2T}}$

Therefore, to increase VaR precision by a factor of 10 requires a multiple of about 10² = 100 times the number of replications; i.e., reducing $SE(\hat{\sigma})$ to one-tenth requires 100x the replications.
Antithetic variable technique: changes the sign of the random samples. Appropriate
when the original distribution is symmetric. Creates twice as many replications at little
additional cost.
Importance sampling technique (Jorion calls this the most effective acceleration
technique): attempts to sample along more important paths
Stratified sampling technique: partitions the simulation region into two zones.
$\eta_1 = \epsilon_1$
$\eta_2 = \rho\,\epsilon_1 + (1-\rho^2)^{1/2}\,\epsilon_2$

$\epsilon_1, \epsilon_2$: independent random variables; $\rho$: correlation coefficient; $\eta_1, \eta_2$: correlated random variables
[Chart: two correlated simulated price series]

Correlation = 0.75. Independent N(0,1) draws (2.06, 0.52, 1.51, −1.44) are transformed into correlated N(0,1) draws (1.26, −0.73, 0.99, 0.48). With mean 1% and volatility 10% for both series, Series #1 evolves $10.00 → $10.62 → $12.34 → $10.68 and Series #2 evolves $10.00 → $9.37 → $10.39 → $11.00.
If the variables are uncorrelated, randomization can be performed independently for each variable. Generally, however, variables are correlated. To account for this correlation, we start with a set of independent random variables (η), which are then transformed into correlated variables (ε). In a two-variable setting, we construct the following:

$$\epsilon_1 = \eta_1$$
$$\epsilon_2 = \rho\,\eta_1 + (1-\rho^2)^{1/2}\,\eta_2$$

where ρ is the correlation coefficient between the variables ε1 and ε2. This is a transformation of two independent random variables into correlated random variables. Prior to the transformation, η1 and η2 are independent; after it, ε1 and ε2 have the necessary correlation. The first random variable is retained (ε1 = η1) and the second is transformed (recast) into a random variable that is correlated with the first.
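A minimal sketch of this two-variable transformation (the target correlation of 0.75 echoes the illustration above):

```python
import numpy as np

# Transform two independent standard normals into a correlated pair
# (the two-variable special case of the Cholesky factorization).
def correlate(eta1: np.ndarray, eta2: np.ndarray, rho: float):
    eps1 = eta1
    eps2 = rho * eta1 + np.sqrt(1.0 - rho**2) * eta2
    return eps1, eps2

rng = np.random.default_rng(0)
eta1, eta2 = rng.standard_normal(100_000), rng.standard_normal(100_000)
eps1, eps2 = correlate(eta1, eta2, rho=0.75)

# The sample correlation should be close to the target 0.75
print(f"Sample correlation: {np.corrcoef(eps1, eps2)[0, 1]:.3f}")
```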
Monte Carlo simulation methods generate independent, pseudorandom points that attempt to fill an N-dimensional space, where N is the number of variables driving the price of securities. Researchers now realize that the sequence of points does not have to be chosen randomly. In a deterministic scheme (so-called quasi-Monte Carlo, based on low-discrepancy sequences), the draws (or trials) are not entirely random. Instead of random trials, this scheme fills the space left by previous numbers in the series.
Scenario Simulation
The first step consists of using principal-component analysis to reduce the dimensionality of the
problem; i.e., to use the handful of factors, among many, that are most important.
The second step consists of building scenarios for each of these factors, approximating a normal
distribution by a binomial distribution with a small number of states.
Estimating
Volatilities and
Correlations
In this chapter
Discuss how historical data and various weighting schemes can be used in
estimating volatility.
Describe the exponentially weighted moving average (EWMA) model for estimating
volatility and its properties.
Estimate volatility using the EWMA model.
Describe the generalized auto regressive conditional heteroscedasticity
[GARCH(p,q)] model for estimating volatility and its properties.
Estimate volatility using the GARCH(p,q) model.
Explain mean reversion and how it is captured in the GARCH(1,1) model.
Discuss how the parameters of the GARCH(1,1) and the EWMA models are estimated
using maximum likelihood methods.
Explain how GARCH models perform in volatility forecasting.
Discuss how correlations and covariances are calculated, and explain the
consistency condition for covariances.
Discuss how historical data and various weighting schemes can be used in
estimating volatility.
Take two steps to compute historical (not implied) volatility:
1. Compute the series of periodic (e.g., daily) returns,
2. Choose a weighting scheme (to translate a series into a single metric)
The continuously compounded (log) return is given by:

$$u_i = \ln\frac{S_i}{S_{i-1}}$$

The simple percentage change is given by:

$$u_i = \frac{S_i - S_{i-1}}{S_{i-1}}$$
John Hull uses the simple percentage change, but Linda Allen uses the log return (continuously compounded) because log returns are time consistent. Hull's method is not incorrect; rather, it is an acceptable approximation for short (daily) periods.
The unbiased estimate of the variance rate per day on day n, using the most recent m observations, is:

$$\sigma_n^2 = \frac{1}{m-1}\sum_{i=1}^{m}(u_{n-i} - \bar{u})^2$$

where $\bar{u}$ is the mean of the returns $u_i$.
Hull, for practical purposes, makes the following two simplifying assumptions. The mean return is assumed to be zero ($\bar{u} = 0$), and the variance is computed as:

$$\sigma_n^2 = \frac{1}{m}\sum_{i=1}^{m}u_{n-i}^2$$
This simplified version replaces (m-1) with (m) in the denominator. (m-1) produces an
unbiased estimator and (m) produces a maximum likelihood estimator.
How can there be two different formulas for sample variance? Recall (from Gujarati) these
are estimators: recipes intended to produce estimates of the true population variance.
There can be different recipes; although many will have undesirable properties.
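A minimal sketch contrasting the two recipes (the price series is an illustrative assumption):

```python
import numpy as np

# Two recipes for the daily variance estimate from m log returns:
# (m - 1) denominator -> unbiased estimator; m denominator with zero mean -> MLE
# (Hull's practical shortcut).
prices = np.array([100.0, 101.2, 100.5, 102.0, 101.1, 103.4])  # illustrative prices
u = np.log(prices[1:] / prices[:-1])        # log returns, u_i = ln(S_i / S_{i-1})
m = len(u)

var_unbiased = np.sum((u - u.mean())**2) / (m - 1)
var_mle      = np.sum(u**2) / m             # zero-mean, m-denominator shortcut

print(f"Unbiased daily vol: {np.sqrt(var_unbiased):.4%}")
print(f"MLE (Hull) daily vol: {np.sqrt(var_mle):.4%}")
```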
More generally, we can assign a weight $\alpha_i$ to each squared return:

$$\sigma_n^2 = \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$

The alpha (α) parameters are simply weights; the sum of the alpha (α) parameters must equal one because they are weights.
We can now add another factor to the model: the long-run average variance rate, $V_L$. The idea is that the variance is mean regressing: think of the variance as having a gravitational pull toward its long-run average. We add another term to the equation above in order to capture the long-run average variance. The added term is the weighted long-run variance:

$$\sigma_n^2 = \gamma V_L + \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$

The added term is gamma (γ, the weighting) multiplied by the long-run variance ($V_L$), because the long-run variance is also a weighted factor.
This is known as an ARCH(m) model. Often omega (ω) replaces the first term, with $\omega = \gamma V_L$. So here is a reformatted ARCH(m) model:

$$\sigma_n^2 = \omega + \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$
Equally weighted scheme:

$$\sigma_n^2 = \frac{1}{m}\sum_{i=1}^{m}u_{n-i}^2$$

Weighted scheme:

$$\sigma_n^2 = \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$

In the exponentially weighted moving average (EWMA), the weights decline at the constant rate λ:

$$\sigma_n^2 = (1-\lambda)\lambda^0 u_{n-1}^2 + (1-\lambda)\lambda^1 u_{n-2}^2 + (1-\lambda)\lambda^2 u_{n-3}^2 + \cdots$$

Recursive version of EWMA:

$$\sigma_n^2 = \lambda\sigma_{n-1}^2 + (1-\lambda)u_{n-1}^2$$
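A minimal sketch of the recursive update (the prior variance and return are illustrative; λ = 0.94 is the RiskMetrics daily decay factor discussed below):

```python
def ewma_update(prev_var: float, prev_return: float, lam: float = 0.94) -> float:
    """Recursive EWMA: sigma_n^2 = lambda * sigma_{n-1}^2 + (1 - lambda) * u_{n-1}^2."""
    return lam * prev_var + (1.0 - lam) * prev_return ** 2

# Illustrative: prior daily variance of 0.0001 (1% vol) and a 2% return yesterday
var_n = ewma_update(prev_var=0.0001, prev_return=0.02)
print(f"Updated EWMA volatility: {var_n ** 0.5:.4%}")  # about 1.09%
```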
EWMA is a special case of GARCH (1,1). Here is how we get from GARCH (1,1) to EWMA:

$$\sigma_t^2 = \lambda\sigma_{t-1}^2 + (1-\lambda)r_{t-1,t}^2$$
In EWMA, the lambda parameter now determines the decay: a lambda that is close to one
(high lambda) exhibits slow decay.
RiskMetricsTM Approach
RiskMetrics is a branded form of the exponentially weighted moving average (EWMA) approach:
$$h_t = \lambda h_{t-1} + (1-\lambda)r_{t-1}^2$$

The optimal (theoretical) lambda varies by asset class, but the overall optimal parameter used by RiskMetrics has been 0.94. In practice, RiskMetrics only uses one decay factor per horizon for all series: 0.94 for daily data and 0.97 for monthly data. Technically, the daily and monthly models are inconsistent. However, they are both easy to use, they approximate the behavior of actual data quite well, and they are robust to misspecification.
GARCH (1,1), EWMA, and RiskMetrics are each parametric and recursive.
GARCH (1,1) is the weighted sum of the long-run variance (weight = gamma), the most recent squared return (weight = alpha), and the most recent variance (weight = beta):

$$\sigma_n^2 = \gamma V_L + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$$

The mean reversion term is the product of a weight (gamma) and the long-run (unconditional) variance. If gamma = 0, GARCH(1,1) reduces to EWMA.
In the volatility practice bag (learning spreadsheet 2.b.6), we illustrate and compare the
calculation of EWMA to GARCH(1,1):
[Learning spreadsheet excerpt comparing the two updates: the EWMA cell computes λ × lagged variance + (1 − λ) × lagged return^2 (e.g., a variance of 0.000074, i.e., 0.86% volatility); the GARCH cell forecasts L.R. variance + (α + β)^t × (current variance − L.R. variance) (e.g., 0.00005463 at t = 10, i.e., 0.739% volatility).]
$$\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$$

We can solve for the long-run average variance as a function of omega and the weights (alpha, beta):

$$V_L = \frac{\omega}{1-\alpha-\beta}$$
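A minimal sketch of the one-step update and the long-run variance (the parameter values are illustrative assumptions):

```python
def garch_update(omega: float, alpha: float, beta: float,
                 prev_return: float, prev_var: float) -> float:
    """GARCH(1,1): sigma_n^2 = omega + alpha * u_{n-1}^2 + beta * sigma_{n-1}^2."""
    return omega + alpha * prev_return**2 + beta * prev_var

def long_run_variance(omega: float, alpha: float, beta: float) -> float:
    """V_L = omega / (1 - alpha - beta); requires alpha + beta < 1 for stability."""
    return omega / (1.0 - alpha - beta)

# Illustrative parameters: omega = 0.00008, alpha = 0.10, beta = 0.70
print(long_run_variance(0.00008, 0.10, 0.70))                            # 0.0004
print(garch_update(0.00008, 0.10, 0.70, prev_return=0.02, prev_var=0.0001))
```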
Discuss how the parameters of the GARCH(1,1) and the EWMA models are
estimated using maximum likelihood methods.
In maximum likelihood methods we choose parameters that maximize the likelihood of the
observations occurring.
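A minimal sketch of the idea, assuming normally distributed returns and using Hull's objective of maximizing Σ[−ln(vᵢ) − uᵢ²/vᵢ]; the simulated return series, starting values, and bounds are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, u):
    """Negative of Hull's objective sum(-ln(v_i) - u_i^2 / v_i) for GARCH(1,1)."""
    omega, alpha, beta = params
    v = np.empty_like(u)
    v[0] = u.var()                    # seed the conditional variance series
    for i in range(1, len(u)):
        v[i] = omega + alpha * u[i-1]**2 + beta * v[i-1]
    return np.sum(np.log(v) + u**2 / v)

# Illustrative return series (in practice, use actual daily returns)
rng = np.random.default_rng(1)
u = 0.01 * rng.standard_normal(500)

res = minimize(neg_log_likelihood, x0=[1e-6, 0.1, 0.8], args=(u,),
               bounds=[(1e-9, None), (0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
omega, alpha, beta = res.x
print(f"omega={omega:.2e}, alpha={alpha:.3f}, beta={beta:.3f}, "
      f"persistence={alpha + beta:.3f}")
```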
[Learning spreadsheet excerpt: the solver selects the GARCH(1,1) parameters (ω, α, β) that maximize the log-likelihood of the observed returns; the output includes the parameter estimates, the persistence (≈ 0.822), and the maximized log-likelihood value (≈ 110.94).]
The expected future variance rate, (t) periods forward, is given by:

$$E[\sigma_{n+t}^2] = V_L + (\alpha+\beta)^t\,(\sigma_n^2 - V_L)$$
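A minimal sketch of the forecast (the values are illustrative):

```python
def forecast_variance(current_var: float, v_long: float,
                      alpha: float, beta: float, t: int) -> float:
    """E[sigma_{n+t}^2] = V_L + (alpha + beta)^t * (sigma_n^2 - V_L)."""
    return v_long + (alpha + beta) ** t * (current_var - v_long)

# Illustrative: V_L = 0.0004, current variance 0.0001, alpha = 0.10, beta = 0.70
for t in (1, 10, 100):
    print(t, forecast_variance(0.0001, 0.0004, 0.10, 0.70, t))
# As t grows, (alpha + beta)^t shrinks and the forecast decays toward V_L.
```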
For example, assume that a current volatility estimate (period n) is given by the following GARCH(1,1) equation:

$$\sigma_n^2 = 0.00008 + 0.10\,u_{n-1}^2 + 0.70\,\sigma_{n-1}^2$$

First, solve for the long-run variance:

$$V_L = \frac{\omega}{1-\alpha-\beta} = \frac{0.00008}{1-0.10-0.70} = 0.0004$$

Second, we need the current variance (period n); that is almost given to us above.
Discuss how correlations and covariances are calculated, and explain the
consistency condition for covariances.
Correlations play a key role in the calculation of value at risk (VaR). We can use methods similar to EWMA for volatility. In this case, an updated covariance estimate is a weighted sum of the prior covariance estimate and the most recent cross-product of returns:

$$\text{cov}_n = \lambda\,\text{cov}_{n-1} + (1-\lambda)\,x_{n-1}\,y_{n-1}$$
Saunders, Chapter 2:
Quantifying Volatility in
VaR Models
In this chapter
Discuss how asset return distributions tend to deviate from the normal distribution.
Explain potential reasons for the existence of fat tails in a return distribution and
discuss the implications fat tails have on analysis of return distributions.
Distinguish between conditional and unconditional distributions.
Discuss the implications regime switching has on quantifying volatility.
Explain the various approaches for estimating VaR.
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: Historic simulation
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: Historical standard deviation
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: Exponential smoothing
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: GARCH approach
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: Multivariate density estimation
Compare, contrast and calculate parametric and non-parametric approaches for
estimating conditional volatility, including: Hybrid methods
Explain the process of return aggregation in the context of volatility forecasting
methods.
Explain how implied volatility can be used to predict future volatility and discuss its
advantages and disadvantages.
Explain the implications of mean reversion in returns and return volatility for
forecasting VaR over long time horizons.
Discuss the effects non-synchronous data has on estimating correlation and describe
approaches that mitigate the impact of non-synchronous data on risk estimates.
Discuss the use of backtesting for comparing VaR results using different volatility
estimation approaches and the desirable attributes of VaR estimates.
Key terms
Risk varies over time. Models often assume a normal (Gaussian) distribution (normality) with constant volatility from period to period. But actual returns are non-normal, and volatility varies over time (volatility is time-varying or non-constant). Therefore, it is hard to apply parametric approaches to random returns; in technical terms, it is hard to find robust distributional assumptions for stochastic asset returns.
Persistence: in EWMA, the lambda parameter (λ); in GARCH(1,1), the sum of the alpha (α) and beta (β) parameters. High persistence implies slow decay toward the long-run average variance.
Leptokurtosis: a fat-tailed distribution where relatively more observations are near the middle and in the fat tails (kurtosis > 3).
The continuously compounded (log) return and the simple percentage change are, respectively:

$$u_i = \ln\frac{S_i}{S_{i-1}} \qquad u_i = \frac{S_i - S_{i-1}}{S_{i-1}}$$
Linda Allen contrasts three periodic returns (i.e., continuously compounded, simple percentage change, and absolute level change). She argues the continuously compounded return must be used when computing VaR because it is time consistent (except for interest rate-related variables, which use the absolute level change).
For practical purposes, the above equation is often simplified with the following assumptions: the mean return is zero ($\bar{u} = 0$) and the variance is:

$$\sigma_n^2 = \frac{1}{m}\sum_{i=1}^{m}u_{n-i}^2$$
This simplified version replaces (m-1) with (m) in the denominator. (m-1) produces an
unbiased estimator and (m) produces a maximum likelihood estimator.
More generally, we can assign a weight $\alpha_i$ to each squared return:

$$\sigma_n^2 = \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$

The alpha (α) parameters are simply weights; the sum of the alpha (α) parameters must equal one because they are weights. We can now add another factor to the model: the long-run average variance rate. The idea here is that the variance is mean regressing: think of the variance as having a gravitational pull toward its long-run average. We add another term to the equation above, in order to capture the long-run average variance. The added term is the weighted long-run variance:

$$\sigma_n^2 = \gamma V_L + \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$

The added term is gamma (γ, the weighting) multiplied by the long-run variance ($V_L$). With $\omega = \gamma V_L$, here is the reformatted ARCH(m) model:

$$\sigma_n^2 = \omega + \sum_{i=1}^{m}\alpha_i u_{n-i}^2$$
[Learning spreadsheet excerpt: 10-day 95% VaR for a $100 two-asset portfolio (weights 50%/50%; volatilities 20.0% and 25.0%; correlation 0.30), with and without return autocorrelation of 0.25 (if returns were independent, the autocorrelation would be 0; mean-reverting returns imply negative autocorrelation). With zero autocorrelation, the 10-day standard deviation is 2.48% and the 95% VaR (normal deviate = 1.64) is $4.08 relative to the mean, or $3.35 including the mean return (i.e., loss from zero). Incorporating the positive autocorrelation raises the 10-day standard deviation to 3.12%, and the corresponding VaRs rise to $5.12 and $4.39. Note the VaR is higher!]
Discuss how asset return distributions tend to deviate from the normal
distribution.
Compared to a normal (bell-shaped) distribution, actual asset returns tend to be skewed, fat-tailed, and unstable: the parameters (e.g., mean, volatility) vary over time due to variability in market conditions.

Normal returns              Actual returns
Symmetrical                 Skewed
Normal tails                Fat-tailed (leptokurtosis)
Stable                      Unstable (time-varying)
[Figure: normal density annotated with its moments: 1st moment = mean (location); 2nd moment = variance (scale); 3rd moment = skew; 4th moment = kurtosis. Actual returns are (1) skewed, (2) fat-tailed (kurtosis > 3), and (3) unstable.]
What can explain the fat tails? Either the conditional parameters are time-varying, or the shape of the true distribution is the issue:

The conditional mean is time-varying; but this is unlikely given the assumption that markets are efficient.
The conditional volatility is time-varying; Allen says this is the more likely explanation!
The true distribution is stationary; in that case, the fat tails reflect the true distribution, but the normal distribution is not appropriate.
The true distribution changes over time (it is time-varying); in this case, outliers can in reality reflect a time-varying volatility.
$$\text{VaR}_{\$} = W_{\$}\,z_{\alpha}\,\sigma \qquad \text{VaR}_{\%} = z_{\alpha}\,\sigma$$

Dollar VaR is the portfolio value ($W_\$$) multiplied by the normal deviate and the volatility; percentage VaR omits the portfolio value.
Linda Allen's Historical-based approaches
The common attribute of all the approaches within this class is their use of historical time series data to determine the shape of the conditional distribution.
Implied volatility, obtained from the Black-Scholes option pricing model as a predictor of future volatility, is the most prominent representative of this class of models.
Risk measurement taxonomy:

Local valuation
    Linear models: full covariance matrix; factor models; diagonal models
    Nonlinear models: gamma; convexity
Full valuation
    Historical simulation
    Monte Carlo simulation
Approaches to estimating VaR:

Delta normal (parametric)
Non-parametric: historical simulation; bootstrap; Monte Carlo
Hybrid (semi-parametric): HS + EWMA; EVT, via peaks over threshold (POT/GPD) or block maxima (GEV)
Volatility estimation approaches:

Implied volatility
Equally weighted (un-weighted) historical returns (STDEV)
More weight to recent returns: GARCH(1,1); EWMA
Historical approaches
A historical-based approach can be non-parametric, parametric, or hybrid (both). Non-parametric approaches directly use a historical dataset (historical simulation, HS, is the most common). Parametric approaches impose a specific distributional assumption (this includes historical standard deviation and exponential smoothing).
$$\sigma_t^2 = \frac{1}{M}\sum_{i=1}^{M}r_{t-i}^2$$
Each day, the forecast is updated by adding the most recent day and dropping the furthest day.
In a simple moving average, all weights on past returns are equal and set to (1/M). Note raw
returns are used instead of returns around the mean (i.e., the expected mean is assumed zero).
This is common in short time intervals, where it makes little difference on the volatility estimate.
For example, assume the previous four daily returns for a stock are 6% (n-1), 5% (n-2), 4% (n-3) and 3% (n-4). What is the current volatility estimate, applying the moving average, given that our short trailing window is only four days (m = 4)? If we square each return, the series is 0.0036, 0.0025, 0.0016 and 0.0009. If we sum this series of squared returns, we get 0.0086. Divide by 4 (since m = 4) and we get 0.00215. That's the moving average variance, such that the moving average volatility is about 4.64%.
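A minimal sketch reproducing the arithmetic above:

```python
import numpy as np

# Moving-average variance with an m-day window: equal weights of 1/m on squared
# raw returns (mean assumed zero), per the four-day example above.
returns = np.array([0.06, 0.05, 0.04, 0.03])        # n-1, n-2, n-3, n-4
ma_variance = np.mean(returns**2)                    # (0.0086) / 4 = 0.00215
print(f"MA variance: {ma_variance:.5f}")             # 0.00215
print(f"MA volatility: {np.sqrt(ma_variance):.2%}")  # ~4.64%
```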
The above example illustrates a key weakness of the moving average (MA): since all returns are weighted equally, the trend does not matter. In the example above, notice that volatility is trending down, but the MA does not reflect this trend in any way. We could reverse the order of the historical series and the MA estimation would produce the same result.
The moving average (MA) series is simple but has two drawbacks:
The MA series ignores the order of the observations. Older observations may no
longer be relevant, but they receive the same weight.
The MA series has a so-called ghosting feature: data points are dropped arbitrarily due
to length of the window.
Heteroskedastic (H): variances are not constant; they fluctuate over time.
GARCH regresses on lagged or historical terms. The lagged terms are either variances or squared returns. The generic GARCH(p, q) model regresses on (p) squared returns and (q) variances. Therefore, GARCH(1,1) lags or regresses on the last period's squared return (i.e., just 1 return) and the last period's variance (i.e., just 1 variance).
GARCH(1,1) is given by the following equation:

$$\sigma_t^2 = a + b\,r_{t-1,t}^2 + c\,\sigma_{t-1}^2$$
The same GARCH(1,1) formula can be given with Greek parameters. Hull writes the same GARCH equation as:

$$\sigma_n^2 = \gamma V_L + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$$

$V_L$ is the long-run average variance. Therefore, ($\gamma V_L$) is a product: it is the weighted long-run average variance. The GARCH(1,1) model solves for the conditional variance as a function of three variables (previous variance, previous squared return, and long-run variance):

$$h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}$$

The notations correspond as follows: $h_t$ or $\sigma_t^2$ is the conditional variance; $a$ or $\omega$ is the constant term; $h_{t-1}$ or $\sigma_{t-1}^2$ is the previous variance; and $r_{t-1}^2$ or $r_{t-1,t}^2$ is the previous squared return.
Note that omega is 0.2, but don't mistake omega (0.2) for the long-run variance! Omega is the product of gamma and the long-run variance. So, if alpha + beta = 0.9, then gamma must be 0.1. Given that omega is 0.2, we know that the long-run variance must be 2.0 (0.2 ÷ 0.1 = 2.0).
EWMA
EWMA is a special case of GARCH(1,1), and GARCH(1,1) is a generalized case of EWMA. The salient difference is that GARCH includes the additional term for mean reversion, which EWMA lacks. Here is how we get from GARCH(1,1) to EWMA:

$$\sigma_t^2 = \lambda\sigma_{t-1}^2 + (1-\lambda)r_{t-1,t}^2$$
In EWMA, the lambda parameter now determines the decay: a lambda that is close to one
(high lambda) exhibits slow decay.
$$h_t = \lambda h_{t-1} + (1-\lambda)r_{t-1}^2$$

The optimal (theoretical) lambda varies by asset class, but the overall optimal parameter used by RiskMetrics has been 0.94. In practice, RiskMetrics only uses one decay factor per horizon for all series: 0.94 for daily data and 0.97 for monthly data. Technically, the daily and monthly models are inconsistent. However, they are both easy to use, they approximate the behavior of actual data quite well, and they are robust to misspecification.
Note: GARCH (1, 1), EWMA and RiskMetrics are each parametric and recursive.
Recursive EWMA
EWMA is (technically) an infinite series, but the infinite series elegantly reduces to a recursive form:

$$\sigma_n^2 = (1-\lambda)\lambda^0 u_{n-1}^2 + (1-\lambda)\lambda^1 u_{n-2}^2 + (1-\lambda)\lambda^2 u_{n-3}^2 + \cdots$$

$$\sigma_n^2 = \lambda\sigma_{n-1}^2 + (1-\lambda)u_{n-1}^2$$

With the RiskMetrics daily decay factor of 0.94:

$$\sigma_n^2 = 0.94\,\sigma_{n-1}^2 + 0.06\,u_{n-1}^2$$
GARCH also avoids the ghosting feature of the moving average: because its weights decline smoothly, observations are never abruptly dropped. Except Linda Allen warns: GARCH(1,1) needs more parameters and may pose greater MODEL RISK (chases a moving target) when forecasting out-of-sample.
[Figure: graphical summary of the parametric methods that assign more weight to recent returns (GARCH & EWMA).]
Summary Tips:
GARCH(1,1) is generalized RiskMetrics; and, conversely, RiskMetrics is a restricted case of GARCH(1,1) where a = 0 and (b + c) = 1. GARCH(1,1) is given by:

$$\sigma_n^2 = \gamma V_L + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$$

The three parameters are weights and therefore must sum to one:

$$\gamma + \alpha + \beta = 1$$

Be careful about the first term in the GARCH(1,1) equation: omega (ω) = gamma (γ) × (average long-run variance). If you are asked for the variance, you may need to divide out the weight in order to compute the average variance.
Determine when and whether a GARCH or EWMA model should be used in volatility
estimation
In practice, variance rates tend to be mean reverting; therefore, the GARCH(1,1) model is theoretically superior to (more appealing than) the EWMA model. Remember, that's the big difference: GARCH adds the parameter that weights the long-run average variance, and therefore it incorporates mean reversion.
GARCH(1,1) is preferred unless the first parameter is negative (which is implied if alpha + beta > 1). In this case, GARCH(1,1) is unstable and EWMA is preferred.
Explain how the GARCH estimations can provide forecasts that are more accurate.
The moving average computes variance based on a trailing window of observations; e.g., the
previous ten days, the previous 100 days.
There are two problems with the moving average (MA):

Ghosting feature: volatility shocks (sudden increases) are abruptly incorporated into the MA metric and then, when the trailing window passes, they are abruptly dropped from the calculation. Due to this, the MA metric will shift in relation to the chosen window length.
Equal weights: all observations are weighted equally, so the order (trend) of the observations is ignored.

GARCH improves on this because more recent observations are assigned greater weights. This overcomes ghosting: a volatility shock will immediately impact the estimate, but its influence will fade gradually as time passes.
$$h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}$$

$$\text{Persistence} = \alpha + \beta$$
GARCH (1, 1) is unstable if the persistence > 1. A persistence of 1.0 indicates no mean reversion.
A low persistence (e.g., 0.6) indicates rapid decay and high reversion to the mean.
GARCH (1, 1) has three weights assigned to three factors. Persistence is the sum of the
weights assigned to both the lagged variance and lagged squared return. The other
weight is assigned to the long-run variance.
If P = persistence and G = weight assigned to long-run variance, then P+G = 1.
Therefore, if P (persistence) is high, then G (mean reversion) is low: the persistent series
is not strongly mean reverting; it exhibits slow decay toward the mean.
If P is low, then G must be high: the impersistent series does strongly mean revert; it
exhibits rapid decay toward the mean.
The average, unconditional variance in the GARCH (1, 1) model is given by:
$$V_L = \frac{\omega}{1-\alpha-\beta}$$
Historic Simulation (HS): sort the returns and look up the worst. If n = 100, for the 95th percentile look between the bottom 5th and 6th worst returns.

Multivariate density estimation (MDE): like ARCH(m), but the weights are based on a function of the current versus historical state. If the state fifty periods ago, state(n-50), resembles state(today), heavy weight is assigned to that squared return.
Advantages of the approaches:

Historical Simulation: easiest to implement (simple, convenient).
Multivariate density estimation (MDE): very flexible; the weights are a function of the state (e.g., economic context such as interest rates), not constant.
Hybrid approach (HS & EWMA): unlike the HS approach, better incorporates more recent information.
The MDE estimate weights each squared return by a kernel function of the state variables:

$$\sigma_t^2 = \sum_{i=1}^{K}\omega(x_{t-i})\,u_{t-i}^2$$

where $\omega(\cdot)$ is the kernel function.
Where EWMA assigns the weight as an exponentially declining function of time (i.e., the
nearer to today, the greater the weight), MDE assigns the weight based on the nature of
the historical period (i.e., the more similar to the historical state, the greater the weight)
For example, sort the ten most recent returns from worst to best and compare the hybrid weights (λ = 0.9) to the equal 10% HS weights:

Sorted     Periods   Hybrid    Cum'l hybrid   Cum'l HS
return     ago       weight    weight         weight
-31.8%     7         8.16%     8.16%          10%
-28.8%     9         6.61%     14.77%         20%
-25.5%     6         9.07%     23.83%         30%
-22.3%     10        5.95%     29.78%         40%
5.7%       1         15.35%    45.14%         50%
6.1%       2         13.82%    58.95%         60%
6.5%       3         12.44%    71.39%         70%
6.9%       4         11.19%    82.58%         80%
12.1%      5         10.07%    92.66%         90%
60.6%      8         7.34%     100.00%        100%
However, under the hybrid approach, the EWMA weighting scheme is instead applied. Since the
worst return happened seven (7) periods ago, the weight applied is given by the following,
assuming a lambda of 0.9 (90%):
Weight (7 periods prior) = 90%^(7-1)*(1-90%)/(1-90%^10) = 8.16%
Note that because the return happened further in the past, the weight is below the 10% that is
assigned under simple HS.
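A minimal sketch of this weight calculation:

```python
def hybrid_weight(periods_ago: int, lam: float = 0.9, window: int = 10) -> float:
    """EWMA weight for a return observed i periods ago, normalized so the
    window's weights sum to one: lam^(i-1) * (1 - lam) / (1 - lam^window)."""
    return lam ** (periods_ago - 1) * (1 - lam) / (1 - lam ** window)

# The worst return happened 7 periods ago (lambda = 0.9, 10-period window):
print(f"{hybrid_weight(7):.2%}")   # 8.16%, matching the table above
```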
[Chart: cumulative hybrid weights versus cumulative HS weights across the ten sorted returns.]
A second example, with n = 100 observations (so each HS weight is 1.0%):

Cum'l HS    Hybrid     Cum'l hybrid
weight      weight     weight
1.0%        0.2%       0.2%
2.0%        0.1%       0.3%
3.0%        0.1%       0.4%
4.0%        0.1%       0.5%
5.0%        0.2%       0.7%
6.0%        0.6%       1.3%
7.0%        0.2%       1.4%
8.0%        3.7%       5.1%
9.0%        0.1%       5.2%
10.0%       0.4%       5.7%
In this case:
We are solving for the 95th percentile (95%) value at risk (VaR)
The HS 95% VaR = ~ 4.25% because it is the fifth-worst return (actually, the quantile can
be determined in more than one way)
However, the hybrid approach returns a 95% VaR of 3.08% because the worst returns
that inform the dataset tend to be further in the past (i.e., days ago = 76, 94, 86, 90).
Due to this, the individual weights are generally less than 1%.
The third approach is to combine these two approaches: aggregate the simulated returns and
then apply a parametric (normal) distributional assumption to the aggregated portfolio.
The first approach (variance-covariance) requires the dubious assumption of normality for the positions inside the portfolio. The text says the third approach is gaining in popularity and is
justified by the law of large numbers: even if the components (positions) in the portfolio are not
normally distributed, the aggregated portfolio will converge toward normality.
Model-dependent (implied volatility):

$$c_{\text{market}} = f(ISD)$$

where the implied standard deviation (ISD) is the volatility input into an option pricing model (OPM); i.e., the ISD is recovered by inverting the model given the option's market price. Similarly, implied correlations can also be recovered (reverse-engineered) from options on multiple assets. According to Jorion, ISD is a superior approach to volatility estimation. He says, "Whenever possible, VAR should use implied parameters" [i.e., ISD or market-implied volatility].
Mean reversion in the asset dynamics: the price/return tends toward a long-run level; e.g., an interest rate reverts to 5%, an equity log return reverts to +8%.
Mean reversion in variance: variance reverts toward a long-run level; e.g., volatility reverts to a long-run average of 20%. We can also refer to this as negative autocorrelation, but it's a little trickier. Negative autocorrelation refers to the fact that a high variance is likely to be followed in time by a low variance. The reason it's tricky is due to short/long timeframes: the current volatility may be high relative to the long-run mean, but it may be "sticky" or cluster in the short term (positive autocorrelation); yet, in the longer term, it may revert to the long-run mean. So, there can be a mix of (short-term) positive and negative autocorrelation on the way to being pulled toward the long-run mean.
The square-root rule: under the two assumptions below, VaR scales with the square root of time. Extend one-period VaR to J-period VaR by multiplying by the square root of J:

$$\sigma(r_{t,t+J}) = \sigma(r_{t,t+1})\,\sqrt{J}$$

For example, if the 1-period VaR is $10, then the 2-period VaR is $14.14 ($10 × √2) and the 5-period VaR is $22.36 ($10 × √5). The square-root rule (i.e., variance is linear with time) only applies under restrictive i.i.d. assumptions.
The square-root rule for extending the time horizon requires two key assumptions:

Random walk (acceptable)
Constant volatility over the horizon (more problematic, since volatility is time-varying)