03cvar PDF
Conditional Heteroscedasticity
Junhui Qian
1 Introduction
ARMA(p,q) models dictate that the conditional mean of a time series depends on past
observations of the time series and the past innovations. Let µ_t = E(X_t|F_{t-1}); for an
ARMA(p,q) process we have

µ_t = φ_1 X_{t-1} + · · · + φ_p X_{t-p} + θ_1 ε_{t-1} + · · · + θ_q ε_{t-q}.

If we assume ε_t ∼ i.i.d. with zero mean and finite variance, then the conditional variance
of X_t is a constant, var(X_t|F_{t-1}) = var(ε_t).
In this chapter we relax this constraint and consider time-varying conditional variance.
We write

X_t = µ_t + ω_t,

where µ_t is the conditional mean as above and ω_t is a white noise with time-varying conditional
variance:

ω_t = σ_t ε_t, (1)

where ε_t is a strong white noise with zero mean and unit variance, ie, ε_t ∼ iid(0, 1), and
σ_t^2 is the conditional variance of X_t, ie, σ_t^2 = var(X_t|X_{t-1}, X_{t-2}, ...).
For ARCH and GARCH models, σ_t^2 evolves over time in a deterministic manner. For
the ARCH(1) model of Engle (1982), the conditional variance is specified as

σ_t^2 = c + a ω_{t-1}^2, (2)

where c > 0 and a ≥ 0. The positiveness of a implies that the probability of getting a large
shock in ω_t is high when there is a big shock in ω_{t-1}. The ARCH(1) model thus describes the
clustering of volatility. More generally, the ARCH(p) model specifies

σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2, (3)

where c > 0 and a_i ≥ 0 for all i.
The ARCH component ω_t has the following properties.

(a) Let η_t = ω_t^2 − σ_t^2. (η_t) is a martingale difference sequence, ie, E(η_t|X_{t-1}, X_{t-2}, ...) = 0.
It follows that ω_t^2 follows an AR(p) process, ω_t^2 = c + ∑_{i=1}^p a_i ω_{t-i}^2 + η_t.

(b) The conditional variance admits the recursive representation

σ_t^2 = c + ∑_{i=1}^p a_i ε_{t-i}^2 σ_{t-i}^2. (4)
For the ARCH(1) model in (2) in particular, var(ω_t) = c/(1 − a). Since the variance has to be
positive, we must have 0 < a < 1. And if we assume ε_t ∼ iid N(0, 1), we can calculate the
fourth moment of ω_t,

E(ω_t^4) = 3c^2 (1 + a) / ((1 − a)(1 − 3a^2)),

so that

Kurtosis(ω_t) = E(ω_t^4)/[var(ω_t)]^2 = 3 (1 − a^2)/(1 − 3a^2) > 3. (5)

Since the kurtosis of ω_t is greater than 3, the kurtosis of the normal distribution, the tail of
the distribution of ω_t is heavier or longer than the normal, which is to say that large shocks are
more probable for ω_t than for a normal series. Of course, to ensure that the kurtosis in (5) is
positive, we must have 1 − 3a^2 > 0, hence a is restricted to [0, √3/3).
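As a quick numerical check of the heavy-tail claim, the following sketch (assuming NumPy is available; the parameter values c = 1 and a = 0.3 are illustrative choices of ours) simulates an ARCH(1) series and compares its sample kurtosis with the theoretical value 3(1 − a^2)/(1 − 3a^2) ≈ 3.74:

```python
import numpy as np

rng = np.random.default_rng(42)
c, a = 1.0, 0.3                        # illustrative values; a < sqrt(3)/3
n = 200_000

eps = rng.standard_normal(n)           # strong white noise eps_t ~ iid N(0,1)
w = np.empty(n)
w[0] = np.sqrt(c / (1 - a)) * eps[0]   # start at the unconditional scale
for t in range(1, n):
    # sigma_t^2 = c + a * w_{t-1}^2, then w_t = sigma_t * eps_t
    w[t] = np.sqrt(c + a * w[t - 1] ** 2) * eps[t]

sample_kurtosis = np.mean(w ** 4) / np.mean(w ** 2) ** 2
theory = 3 * (1 - a ** 2) / (1 - 3 * a ** 2)   # about 3.74
```

With a long sample the sample kurtosis settles well above 3, in line with (5).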
One weakness of ARCH(p) models is that they may need many lags, ie, a big p, to fully
absorb the correlation in ω_t^2. In the same spirit as the extension from AR to ARMA models,
Bollerslev (1986) proposes the GARCH model, which specifies the conditional variance σ_t^2 as
follows,

σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2 + b_1 σ_{t-1}^2 + · · · + b_q σ_{t-q}^2, (6)

where c > 0, a_i ≥ 0, b_i ≥ 0 for all i, and ∑_{i=1}^{max(p,q)} (a_i + b_i) < 1.
When b_i = 0 for all i, GARCH(p, q) reduces to ARCH(p). The GARCH component ω_t has the following
properties.
(a) Let η_t = ω_t^2 − σ_t^2. (η_t) is a martingale difference sequence, ie, E(η_t|X_{t-1}, X_{t-2}, ...) = 0.
(b) Let r = max(p, q). Then ω_t^2 follows an ARMA-type representation,

ω_t^2 = c + ∑_{i=1}^r (a_i + b_i) ω_{t-i}^2 + η_t − ∑_{i=1}^q b_i η_{t-i}, (7)

where (a_i) or (b_i) are padded with zeros to have a length of r if necessary.
(c) The conditional variance admits the recursive representation

σ_t^2 = c + ∑_{i=1}^r (a_i ε_{t-i}^2 + b_i) σ_{t-i}^2. (8)
(d) ω_t is a white noise, with an (unconditional) variance of var(ω_t) = c/(1 − ∑_{i=1}^p a_i − ∑_{i=1}^q b_i).
GARCH(1,1) is perhaps the most popular model in practice. The conditional variance
is specified as follows,

σ_t^2 = c + a ω_{t-1}^2 + b σ_{t-1}^2, (9)

where c > 0, a, b ≥ 0, and a + b < 1. If ε_t ∼ iid N(0, 1) and 1 − (a + b)^2 − 2a^2 > 0, then

Kurtosis(ω_t) = E(ω_t^4)/[var(ω_t)]^2 = 3 (1 − (a + b)^2)/(1 − (a + b)^2 − 2a^2) > 3.
Since the ARCH model is a special case of GARCH, we will focus on GARCH hereafter.
3.1 Identification
For all GARCH models, the square of the GARCH component, ω_t^2, is serially correlated.
This gives us a test of whether a given process is GARCH – we may simply apply the Ljung-Box
test to (ω_t^2).

We may also use Engle's (1982) Lagrange multiplier test. This test is equivalent to the F-test
of H_0: a_1 = · · · = a_m = 0 in the regression

ω_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_m ω_{t-m}^2 + η_t,

where m is a prespecified positive integer.
To determine the order of ARCH(p), we may examine the PACF of ωt2 . If we believe
the model is GARCH(p = 0, q), then we may use the ACF of ωt2 to determine q.
Finally, we may use information criteria such as AIC to determine the order of GARCH(p,
q).
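For illustration, Engle's LM test can be carried out with ordinary least squares: regress ω_t^2 on its first m lags and compare n·R^2 with a χ^2_m distribution. Below is a minimal sketch, assuming NumPy and SciPy are available; the function name and the choice m = 5 are ours:

```python
import numpy as np
from scipy import stats

def engle_arch_lm_test(w, m=5):
    """Engle's (1982) LM test: regress w_t^2 on its first m lags;
    under H0 (no ARCH effects) n * R^2 is asymptotically chi-squared(m)."""
    w2 = np.asarray(w, dtype=float) ** 2
    y = w2[m:]
    X = np.column_stack([np.ones_like(y)] + [w2[m - i:-i] for i in range(1, m + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()      # OLS R^2 (intercept included)
    lm = len(y) * r2
    return lm, stats.chi2.sf(lm, df=m)

# simulate ARCH(1) data with c = 0.5, a = 0.5 and test it
rng = np.random.default_rng(0)
eps = rng.standard_normal(2000)
w = np.empty(2000)
w[0] = eps[0]
for t in range(1, 2000):
    w[t] = np.sqrt(0.5 + 0.5 * w[t - 1] ** 2) * eps[t]

lm_arch, p_arch = engle_arch_lm_test(w)                        # should reject H0
lm_iid, p_iid = engle_arch_lm_test(rng.standard_normal(2000))  # iid benchmark
```

On the simulated ARCH(1) series the p-value is essentially zero, while for the iid series the statistic is typically insignificant.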
3.2 Estimation
Assume ε_t ∼ iid N(0, 1). The log likelihood of the sample (ω_1, ..., ω_T) is

l(θ|ω_1, ..., ω_T) = log [f(ω_T|F_{T-1}) f(ω_{T-1}|F_{T-2}) · · · f(ω_{p+1}|F_p) f(ω_1, ..., ω_p; θ)]
= log f(ω_1, ..., ω_p; θ) + ∑_{t=p+1}^T log [ (2πσ_t^2)^{-1/2} exp(−ω_t^2/(2σ_t^2)) ]
= log f(ω_1, ..., ω_p; θ) − (1/2) ∑_{t=p+1}^T [ log(2π) + log(σ_t^2) + ω_t^2/σ_t^2 ],

where θ is the set of parameters to be estimated, f(ω_s|F_{s-1}) is the density of ω_s conditional
on the information contained in (ω_t) up to time s − 1, and f(ω_1, ..., ω_p; θ) is the joint
density of ω_1, ..., ω_p.

Since the form of f(ω_1, ..., ω_p; θ) is rather complicated, the usual practice is to ignore
this term and to maximize the conditional log likelihood,

l(θ|ω_1, ..., ω_T) = −(1/2) ∑_{t=p+1}^T [ log(2π) + log(σ_t^2) + ω_t^2/σ_t^2 ]. (10)
Note that the σ_t^2 in the above log likelihood function is not observable and has to be
estimated recursively,

σ_t^2 = c + a_1 ω_{t-1}^2 + · · · + a_p ω_{t-p}^2 + b_1 σ_{t-1}^2 + · · · + b_q σ_{t-q}^2.

The initial values of σ_t^2 are usually assigned to be the unconditional variance of ω_t, which
is c/(1 − ∑_i a_i − ∑_i b_i).
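The recursion and the conditional log likelihood in (10) can be sketched for the GARCH(1,1) case as follows (assuming NumPy and SciPy are available; the function name, the simulated parameters, and the Nelder-Mead starting values are our illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def garch11_nll(params, w):
    """Negative conditional Gaussian log likelihood, as in (10), with
    sigma_t^2 computed recursively and initialized at the unconditional
    variance c / (1 - a - b)."""
    c, a, b = params
    if c <= 0 or a < 0 or b < 0 or a + b >= 1:
        return np.inf                   # outside the admissible region
    sig2 = np.empty_like(w)
    sig2[0] = c / (1.0 - a - b)         # unconditional variance as initial value
    for t in range(1, len(w)):
        sig2[t] = c + a * w[t - 1] ** 2 + b * sig2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + w ** 2 / sig2)

# simulate a GARCH(1,1) path with c = 0.1, a = 0.1, b = 0.8
rng = np.random.default_rng(1)
n, c0, a0, b0 = 3000, 0.1, 0.1, 0.8
eps = rng.standard_normal(n)
w = np.empty(n)
s2 = c0 / (1 - a0 - b0)
for t in range(n):
    w[t] = np.sqrt(s2) * eps[t]
    s2 = c0 + a0 * w[t] ** 2 + b0 * s2

x0 = np.array([0.05, 0.2, 0.6])
res = minimize(garch11_nll, x0, args=(w,), method="Nelder-Mead")
```

Minimizing this negative log likelihood is exactly the (quasi-)maximum likelihood estimation described in the text.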
After estimation, we may check the adequacy of the model by examining the standardized
residuals,

ε̂_t = ω_t / σ̂_t.

If the model is adequate and appropriately estimated, (ε̂_t) should be iid normal. We
may apply the Ljung-Box test to (ε̂_t) to see if the conditional mean µ_t is correctly
specified. We may apply the Ljung-Box test to (ε̂_t^2) to see if the model of ω_t is adequate.
Finally, we may use the Jarque-Bera test and the QQ-plot to check whether ε_t is normal.
We may, of course, use other distributions for the specification of ε_t. For example, one
popular choice is the Student-t, which has heavier tails than the normal distribution. For
the purpose of consistently estimating GARCH parameters such as (a_i) and (b_i), the choice
of distribution does not matter much. It can be shown that maximizing the log likelihood
in (10) yields a consistent estimator even when the distribution of ε_t is not normal. This is
known as quasi-maximum likelihood estimation (QMLE).
3.3 Forecasting
Forecasting volatility is perhaps the most interesting aspect of the GARCH model in practice.
From (6), the one-step-ahead volatility forecast at time T is

σ̂_{T+1}^2 = c + a_1 ω_T^2 + · · · + a_p ω_{T-p+1}^2 + b_1 σ_T^2 + · · · + b_q σ_{T-q+1}^2,

where (ω_T^2, ..., ω_{T-p+1}^2) and (σ_T^2, ..., σ_{T-q+1}^2) are known at time T. Note that the one-step-ahead
forecast σ̂_{T+1}^2 is thus also known at time T. The two-step-ahead forecast is

σ̂_{T+2}^2 = c + a_1 E(ω_{T+1}^2|F_T) + a_2 ω_T^2 + · · · + a_p ω_{T-p+2}^2 + b_1 σ̂_{T+1}^2 + b_2 σ_T^2 + · · · + b_q σ_{T-q+2}^2,

where E(ω_{T+1}^2|F_T) = σ̂_{T+1}^2. The n-step-ahead forecast can be constructed similarly. For the GARCH(1,1) model in (9), the
n-step-ahead forecast is

σ̂_{T+n}^2 = c(1 − (a + b)^{n-1})/(1 − a − b) + (a + b)^{n-1} σ̂_{T+1}^2 → c/(1 − a − b),

as n goes to infinity. c/(1 − a − b) is exactly the unconditional variance of ω_t.
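The forecast recursion for GARCH(1,1) can be sketched as follows (NumPy assumed; the function name and parameter values are illustrative), and it agrees with the closed form above:

```python
import numpy as np

def garch11_forecast(c, a, b, sig2_one_step, n):
    """sigma_hat^2_{T+1}, ..., sigma_hat^2_{T+n} for GARCH(1,1),
    given the one-step-ahead forecast sigma_hat^2_{T+1}."""
    out = np.empty(n)
    out[0] = sig2_one_step
    for k in range(1, n):
        # E(w_{T+k}^2 | F_T) = sigma_hat^2_{T+k}, so the recursion collapses to
        out[k] = c + (a + b) * out[k - 1]
    return out

c, a, b = 0.1, 0.1, 0.8
f = garch11_forecast(c, a, b, sig2_one_step=2.0, n=50)
# closed form for n = 5: c(1-(a+b)^4)/(1-a-b) + (a+b)^4 * sigma_hat^2_{T+1}
closed = c * (1 - (a + b) ** 4) / (1 - a - b) + (a + b) ** 4 * 2.0
```

As n grows, the forecast decays geometrically toward the unconditional variance c/(1 − a − b) = 1.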
4 Extensions
There are many extensions to the GARCH model. In this section we discuss four of them:
IGARCH, GARCH-M, APGARCH, and EGARCH.
4.1 IGARCH
When the ARMA representation in (7) of a GARCH model has a unit root in its AR
polynomial, the GARCH model is integrated in ωt2 . The model is then called Integrated
GARCH, or IGARCH.
The key feature of IGARCH lies in the implication that any shock in volatility is persistent.
This is similar to the ARIMA model, in which any shock in the mean is persistent. Take
the IGARCH(1,1) model for example,

ω_t = σ_t ε_t,   σ_t^2 = c + b σ_{t-1}^2 + (1 − b) ω_{t-1}^2.

Its ARMA representation in ω_t^2 has a unit root:

ω_t^2 = c + ω_{t-1}^2 + η_t − b η_{t-1}.
Then we have the volatility forecasts

σ̂_{T+2}^2 = c + σ̂_{T+1}^2,
· · ·
σ̂_{T+n}^2 = (n − 1)c + σ̂_{T+1}^2.

The case when c = 0 is especially interesting. In this case, the volatility forecasts are flat:
σ̂_{T+n}^2 = σ̂_{T+1}^2 for all n. This approach is indeed adopted by RiskMetrics for the calculation
of value at risk.
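This c = 0 recursion is the familiar exponentially weighted moving average of squared shocks. A minimal sketch (NumPy assumed; λ = 0.94 is the RiskMetrics convention for daily data, the other names and values are ours):

```python
import numpy as np

def ewma_variance(w, lam=0.94, sig2_0=1.0):
    """IGARCH(1,1) recursion with c = 0 and b = lam:
    sig2_t = lam * sig2_{t-1} + (1 - lam) * w_{t-1}^2."""
    sig2 = [sig2_0]
    for wt in w:
        sig2.append(lam * sig2[-1] + (1 - lam) * wt ** 2)
    return np.array(sig2)

# sig2[-1] is sigma_hat^2_{T+1}; with c = 0 all further forecasts equal it
s2 = ewma_variance(np.array([1.0, 2.0]), lam=0.5, sig2_0=1.0)
```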
4.2 GARCH-M
To model the premium for holding risky assets, we may let the conditional mean depend on
the conditional variance. This is the idea of GARCH in mean, or GARCH-M. A typical
GARCH-M model can be written as

X_t = µ_t + ω_t,   µ_t = α′ z_t + β σ_t^2,   ω_t = σ_t ε_t, (11)
where zt is a vector of explanatory variables and the specification for σt2 is the same as in
GARCH models.
4.3 APGARCH
To model leverage effects, which make volatility more sensitive to negative shocks, we may
consider the Asymmetric Power GARCH of Ding, Granger, and Engle (1993). A typical
APGARCH model specifies

σ_t^δ = c + ∑_{i=1}^p a_i (|ε_{t-i}| + γ_i ε_{t-i})^δ + ∑_{i=1}^q b_i σ_{t-i}^δ, (12)

where δ > 0 and γ_i ≤ 0 for all i. The impact of ε_{t-i} on σ_t^δ is obviously asymmetric. Consider the term
g(ε_{t-i}, γ_i) = (|ε_{t-i}| + γ_i ε_{t-i})^δ. For γ_i < 0, it equals ((1 + γ_i)|ε_{t-i}|)^δ when ε_{t-i} > 0
and ((1 − γ_i)|ε_{t-i}|)^δ when ε_{t-i} < 0, so a negative shock of the same magnitude has a larger
impact on volatility. We thus expect γ_i < 0.
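A quick numerical illustration of this asymmetry (plain Python; the values γ = −0.3 and δ = 2 are arbitrary):

```python
def g(eps, gamma, delta=2.0):
    # news-impact term (|eps| + gamma * eps)^delta, as in (12)
    return (abs(eps) + gamma * eps) ** delta

gamma = -0.3
neg = g(-1.0, gamma)   # (1 - gamma)^2 = 1.69
pos = g(+1.0, gamma)   # (1 + gamma)^2 = 0.49
```

A negative unit shock raises σ^δ by more than three times as much as a positive unit shock here.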
4.4 EGARCH
To model leverage effects, we may also consider the Exponential GARCH, or EGARCH,
proposed by Nelson (1991). A typical EGARCH model specifies the log conditional variance
h_t = log σ_t^2 as

h_t = c + ∑_{i=1}^p a_i (|ε_{t-i}| + γ_i ε_{t-i}) + ∑_{i=1}^q b_i h_{t-i}. (13)

As in APGARCH, we expect γ_i < 0. When ε_{t-i} > 0 (there is good news), the impact of
ε_{t-i} on h_t is (1 + γ_i) a_i |ε_{t-i}|; when ε_{t-i} < 0 (there is bad news), the impact is
(1 − γ_i) a_i |ε_{t-i}|, which is larger when γ_i < 0.
5 Stochastic Volatility Models

In all ARCH and GARCH models, the evolution of the conditional variance σ_t^2 is deterministic
given past information. SV (Stochastic Volatility) models relax this constraint and posit that the volatility itself
follows a stochastic process. The SV model can be estimated using quasi-likelihood methods via Kalman filtering or
MCMC (Markov Chain Monte Carlo). Some applications show that SV models provide better
performance in terms of model fitting, but their performance in out-of-sample volatility
forecasting is mixed.
Appendix
The Ljung-Box test is a test of whether any of a group of autocorrelations of a time series
are different from zero. It is a joint test based on a number of lags and is therefore a
portmanteau test. The test statistic is

Q = n(n + 2) ∑_{k=1}^h ρ̂_k^2/(n − k),

where n is the sample size, ρ̂_k is the sample autocorrelation at lag k, and h is the number
of lags being tested. Under the null hypothesis of no autocorrelation, Q asymptotically follows a χ^2 distribution with
h degrees of freedom. The Ljung-Box test is commonly used in model diagnostics after model fitting.
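The Q statistic is straightforward to compute directly (a sketch assuming NumPy and SciPy; the function name is ours):

```python
import numpy as np
from scipy import stats

def ljung_box(x, h):
    """Ljung-Box Q over lags 1..h, with its chi-squared(h) p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    return q, stats.chi2.sf(q, df=h)

rng = np.random.default_rng(0)
q_wn, p_wn = ljung_box(rng.standard_normal(1000), h=10)             # white noise
q_rw, p_rw = ljung_box(np.cumsum(rng.standard_normal(1000)), h=10)  # random walk
```

The white-noise series typically yields an insignificant Q, while the strongly autocorrelated random walk is rejected overwhelmingly.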
The Jarque-Bera test can be used to test the null hypothesis that the data are from a
normal distribution, based on the sample kurtosis and skewness. The test statistic JB is
defined as

JB = (n/6) [ S^2 + (K − 3)^2/4 ],

where n is the number of observations, S the sample skewness, and K the sample kurtosis.
JB is asymptotically distributed as χ^2 with 2 degrees of freedom. The null hypothesis is a joint hypothesis of both the skewness
and the excess kurtosis being 0, since samples from a normal distribution have an expected
skewness of 0 and an expected excess kurtosis of 0.
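The JB statistic is equally simple to compute (NumPy and SciPy assumed; the function name and the t(3) comparison sample are our illustrative choices):

```python
import numpy as np
from scipy import stats

def jarque_bera(x):
    """JB = (n/6) * (S^2 + (K - 3)^2 / 4), asymptotically chi-squared(2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = (x - x.mean()) / x.std()
    S = np.mean(z ** 3)            # sample skewness
    K = np.mean(z ** 4)            # sample kurtosis
    jb = n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return jb, stats.chi2.sf(jb, df=2)

rng = np.random.default_rng(0)
jb_norm, p_norm = jarque_bera(rng.standard_normal(5000))
jb_t, p_t = jarque_bera(rng.standard_t(df=3, size=5000))   # heavy-tailed sample
```

The heavy-tailed Student-t sample is rejected decisively, which is exactly the behavior we exploit when checking GARCH residuals for normality.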