Asset Returns
1.1 Returns
Let P_t denote the price of an asset at time t. First we introduce various definitions for the returns on the asset.

The one-period simple return is defined as

R_t = (P_t − P_{t−1})/P_{t−1}.   (1.1)

It is the profit rate of holding the asset from time t − 1 to t. Often we quote 100R_t %, as 100R_t is the percentage gain with respect to the initial capital P_{t−1}. This is particularly useful when the time unit is small (such as a day or an hour); in such cases R_t typically takes very small values. The returns for less risky assets such as bonds can be even smaller over a short period and are often quoted in basis points, which is 10,000R_t.
The one-period gross return is defined as P_t/P_{t−1} = R_t + 1. It is the ratio of the new market value at the end of the holding period over the initial market value. Similarly, the k-period simple return is R_t(k) = (P_t − P_{t−k})/P_{t−k}, and the k-period gross return is P_t/P_{t−k} = R_t(k) + 1. It is easy to see that the multiperiod returns may be expressed in terms of one-period returns as follows:

P_t/P_{t−k} = (P_t/P_{t−1}) (P_{t−1}/P_{t−2}) · · · (P_{t−k+1}/P_{t−k}),   (1.2)

R_t(k) = P_t/P_{t−k} − 1 = (R_t + 1)(R_{t−1} + 1) · · · (R_{t−k+1} + 1) − 1.   (1.3)

If all the one-period returns R_t, · · · , R_{t−k+1} are small, (1.3) implies the approximation

R_t(k) ≈ R_t + R_{t−1} + · · · + R_{t−k+1}.   (1.4)

This is a useful approximation when the time unit is small (such as a day, an hour or a minute).
The log return is defined as

r_t = log(P_t/P_{t−1}) = log P_t − log P_{t−1},   (1.5)

i.e. a log return is the logarithm (with the natural base) of a gross return, and log P_t is called the log price. One immediate convenience in using log returns is the additivity of multiperiod log returns: the k-period log return r_t(k) ≡ log(P_t/P_{t−k}) is the sum of the k one-period log returns,

r_t(k) = r_t + r_{t−1} + · · · + r_{t−k+1} = k r̄,   (1.6)

where r̄ = (r_t + r_{t−1} + · · · + r_{t−k+1})/k is the average of the one-period log returns. In this book returns refer to log returns unless specified otherwise.
Note that the identity (1.6) is in contrast with the approximation (1.4), which is only valid when the time unit is small. Indeed, when the values are small, the two returns are approximately the same:

r_t = log(1 + R_t) ≈ R_t.

However, r_t ≤ R_t, with equality only when R_t = 0. Figure 1.1 plots the log returns against the simple returns for the Apple Inc share prices in the period January 1985 – February 2011. The returns are calculated based on the daily close prices for three holding periods: a day, a week and a month. The figure shows that the two definitions result in almost the same daily returns, especially for values between −0.2 and 0.2. However, when the holding period increases to a week or a month, the discrepancy between the two definitions becomes more apparent, with a simple return always greater than the corresponding log return.
Figure 1.1 Plots of log returns against simple returns of the Apple Inc share prices in
January 1985 – February 2011. The blue straight lines mark the positions where the two
returns are identical.
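The multiplicative identity (1.3) for gross returns, the additive identity for log returns, and the inequality r_t ≤ R_t can all be checked numerically. A minimal sketch with a short hypothetical price series:

```python
import math

prices = [100.0, 101.5, 99.8, 102.3, 103.0]  # hypothetical daily close prices

# one-period simple and log returns
simple = [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]
logret = [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# gross returns multiply across periods (cf. (1.3))
k_gross = prices[-1] / prices[0]
assert abs(math.prod(1 + R for R in simple) - k_gross) < 1e-12

# log returns add across periods
assert abs(sum(logret) - math.log(k_gross)) < 1e-12

# a log return never exceeds the corresponding simple return
assert all(r <= R for r, R in zip(logret, simple))
```

The assertions pass for any positive price series, which is exactly the content of the two identities.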
The log return r_t is also called the continuously compounded return due to its close link with the concept of compound rates or interest rates. For a bank deposit account, the quoted interest rate is often so-called 'simple interest'. For example,
an interest rate of 5% payable every six months will be quoted as a simple interest
of 10% per annum in the market. However, if an account with initial capital $1 is held for 12 months and the interest rate remains unchanged, it follows from (1.2) that the gross return for the two periods is

1 × (1 + 0.05)^2 = 1.1025,

i.e. the annual simple return is 1.1025 − 1 = 10.25%, which is called the compound return and is greater than the quoted annual rate of 10%. This is due to the earning from 'interest-on-interest' in the second six-month period.
Now suppose that the quoted simple interest rate per annum is r and is un-
changed, and the earnings are paid more frequently, say, m times per annum (at
the rate r/m each time of course). For example, the account holder is paid every
quarter when m = 4, every month when m = 12, and every day when m = 365.
Suppose m continues to increase, so that eventually the earnings are paid continuously. Then the gross return at the end of one year is

lim_{m→∞} (1 + r/m)^m = e^r.
More generally, if the initial capital is C, invested in a bond that compounds con-
tinuously the interest at annual rate r, then the value of the investment at time t
is
C exp(rt).
Hence the log return per annum is r, which is the logarithm of the gross return.
This indicates that the simple annual interest rate r quoted in the market is in fact
the annual log return if the interest is compounded continuously. Note that if the
interest is only paid once at the end of the year, the simple return will be r, and the
log return will be log(1 + r) which is always smaller than r.
In summary, a simple annual interest rate quoted in the market has two interpre-
tations: it is the simple annual return if the interest is only paid once at the end of
the year, and it is the annual log return if the interest is compounded continuously.
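The compounding arithmetic above is easy to reproduce; this sketch reruns the 1.1025 semiannual example from the text and shows (1 + r/m)^m approaching the continuous limit e^r:

```python
import math

r = 0.10  # quoted simple annual rate

# semiannual compounding: the 1.1025 gross return computed in the text
semiannual = (1 + r / 2) ** 2

# more frequent compounding approaches the continuous limit e^r from below
for m in (1, 2, 4, 12, 365):
    print(m, (1 + r / m) ** m)
print("e^r =", math.exp(r))
```

With m = 2 the printed gross return is 1.1025, matching the text, and the values increase toward e^0.10 ≈ 1.10517 as m grows.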
The above definitions are based on the assumption that all dividends are cashed out
and are not re-invested in the asset.
Consider a zero-coupon bond with duration D years that pays $1 at maturity. If its yield at time t is r_t, its price B_t satisfies

B_t exp(D r_t) = $1,

i.e. the price is B_t = exp(−D r_t) dollars. Thus, the log-return of the bond over one period is

log(B_{t+1}/B_t) = −D (r_{t+1} − r_t),   (1.7)

i.e. the negative of the change in yield multiplied by the duration. Here, we ignore the fact that B_{t+1} has one unit of time shorter maturity than B_t.
Suppose that we have two baskets of high-yield bonds and investment-grade
bonds (i.e. the bonds with relatively low risk of default) with an average duration of
4.4 years each. Their yield spreads (i.e. the differences) over Treasury bonds with similar maturities are quoted and plotted in Figure 1.2. The daily returns of bonds can
then be deduced from (1.7), which is the change of yields multiplied by the duration.
The daily changes of treasury bonds are typically much smaller. Hence, the changes
of yield spreads can directly be used as proxies of the changes of yields. As expected,
the high-yield bonds have higher yields than the investment grade bonds, but have
higher volatility too (about 3 times). The yield spreads widened significantly in the period after the financial crisis following Lehman Brothers' filing for bankruptcy protection on September 15, 2008, reflecting higher default risks in corporate bonds.
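The mapping (1.7) from yield changes to bond returns is a one-liner. A sketch using the 4.4-year duration from the text and made-up daily yields:

```python
D = 4.4  # average duration in years, as quoted in the text
yields = [0.0620, 0.0625, 0.0618]  # hypothetical daily yields of a bond basket

# (1.7): daily log return = -duration x daily change in yield
returns = [-D * (yields[t] - yields[t - 1]) for t in range(1, len(yields))]
print(returns)
```

A 5-basis-point rise in yield thus translates into roughly a −0.22% daily return, illustrating why longer-duration bonds are more sensitive to yield moves.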
Figure 1.2 Time series of the yield spreads (the top panel) of high-yield bonds (blue curve)
and investment-grade bonds (red curve), and their associated daily returns (the 2nd and
3rd panels) in November 29, 2004 – December 10, 2014.
Figure 1.3 Time series plots of the daily indices, the daily log returns, the weekly log
returns, and the monthly log returns of S&P 500 index in January 1985 – February 2011.
1.2 Behavior of financial return data
Figure 1.4 Time series plots of the daily prices, the daily log returns, the weekly log
returns, and the monthly log returns of the Apple stock in January 1985 – February 2011.
Figure 1.5 Histograms (the top panels) and Q-Q plots (the bottom panels) of the daily,
weekly, and monthly log returns of S&P 500 in January 1985 – February 2011. The normal
density with the same mean and variance are superimposed on the histogram plots.
The figures indicate that the returns over the different holding periods are not normally distributed. Especially, the tails of the return distributions are heavier than those of the normal distribution, which is highlighted explicitly in the Q-Q plots: the left tail (red circles) lies below the blue line (more negative), and the right tail (red circles) lies above the blue line (larger). We also notice that when the holding period increases from a day to a week to a month, the tails of the distributions become lighter. In particular, the upper tail of the distribution for the monthly returns is about as heavy as that of a normal distribution (red circles and blue line roughly coincide). All the distributions are skewed to the left due to a few large negative returns. The histograms also show that the distribution for the monthly returns is closer to a normal distribution than those for the weekly and daily returns. Similar patterns are observed in the Apple return data; see Figure 1.6.
Figure 1.6 Histograms (the top panels) and Q-Q plots (the bottom panels) of the daily,
weekly, and monthly log returns of the Apple stock in January 1985 – February 2011. The
normal density with the same mean and variance are superimposed on the histogram plots.
Figures 1.7 and 1.8 plot the sample autocorrelation function (ACF) ρ̂_k against the time lag k for the log returns, the squared log returns and the absolute log returns. Given a return series r_1, · · · , r_T, the sample autocorrelation function is defined as ρ̂_k = γ̂_k/γ̂_0, where

γ̂_k = (1/T) Σ_{t=1}^{T−k} (r_t − r̄)(r_{t+k} − r̄),   r̄ = (1/T) Σ_{t=1}^{T} r_t.   (1.8)
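Definition (1.8) translates directly into code. A minimal sketch (note the divisor T rather than T − k, which is what guarantees the positive semi-definiteness used in Exercise 1.2):

```python
def sample_acf(r, max_lag):
    # sample autocovariances gamma_k per (1.8), with divisor T (not T - k)
    T = len(r)
    rbar = sum(r) / T
    gamma = [sum((r[t] - rbar) * (r[t + k] - rbar) for t in range(T - k)) / T
             for k in range(max_lag + 1)]
    # sample autocorrelations rho_k = gamma_k / gamma_0
    return [g / gamma[0] for g in gamma]

print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # [1.0, 0.4, -0.1]
```

Applied to a return series, `sample_acf` of the returns, squared returns, and absolute returns reproduces the quantities plotted in Figures 1.7 and 1.8.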
Figure 1.7 Autocorrelations of the daily, weekly, and monthly log returns, the squared
daily, weekly, and monthly log returns, and the absolute daily, weekly, and monthly log
returns of S&P 500 in January 1985 – February 2011.
Figure 1.8 Autocorrelations of the daily, weekly, and monthly log returns, the squared
daily, weekly, and monthly log returns, and the absolute daily, weekly, and monthly log
returns of the Apple stock in January 1985 – February 2011.
Furthermore, the autocorrelations are more pronounced and more persistent in the daily data than in the weekly and monthly data. Since the correlation coefficient is a measure of linear dependence, the above empirical evidence indicates that the returns of a financial asset are practically uncorrelated with each other, although there exist nonlinear dependencies among the returns at different lags. Especially, the daily absolute returns exhibit significant and persistent autocorrelations, a characteristic of so-called long memory processes.
The above findings from the two real data sets are in line with the so-called
stylized features in financial returns series, which are observed across different kinds
of assets including stocks, portfolios, bonds and currencies. See, e.g. Rydberg (2000).
We summarize these features below.
(i) Stationarity. The prices of an asset recorded over time are often not stationary due to, for example, the steady expansion of the economy, the increase of productivity resulting from technology innovation, and economic recessions or financial crises. However, their returns, denoted by r_t for t ≥ 1, typically fluctuate around a constant level, suggesting a constant mean over time. See Figures 1.3 and 1.4. In fact most return sequences can be modeled as stochastic processes with at least time-invariant first two moments (i.e. weak stationarity; see Section 2.1). A simple (and perhaps over-simplistic) approach is to assume that all the finite dimensional distributions of a return sequence are time-invariant.
(ii) Heavy tails. The probability distribution of the return r_t often exhibits heavier tails than those of a normal distribution. Figures 1.5 and 1.6 provide quantile-quantile plots, or Q-Q plots, for graphically checking normality. See Section 1.5 for details. A frequently used statistic for checking normality (including tail-heaviness) is the Jarque-Bera test presented in Section 1.5. Nevertheless, r_t is typically assumed to have at least two finite moments (i.e. E(r_t²) < ∞), although it is debatable how many moments actually exist for a given asset.
The density of the t-distribution with ν degrees of freedom is given by

f_ν(x) = d_ν^{−1} (1 + x²/ν)^{−(ν+1)/2},   (1.9)

where d_ν = B(0.5, 0.5ν) √ν is the normalization constant and B(·, ·) is the beta function. This distribution is often denoted by t(ν) or t_ν. Its tails are of polynomial order, f_ν(x) ≍ |x|^{−(ν+1)} as |x| → ∞, which is heavier than the normal density. Note that for any random variable X ∼ t(ν), E{|X|^ν} = ∞ and E{|X|^{ν−δ}} < ∞ for any δ ∈ (0, ν].
When ν is large, t(ν) is close to a normal distribution. In fact, based on a sample of size 2500 (approximately 10 years of daily data), one cannot differentiate t(10) from a normal distribution based on, for example, the Kolmogorov-Smirnov test (the function ks.test in R). However, their tail behaviors are very different: a 5-standard-deviation (SD) event occurs once every 14000 years under a normal distribution, once every 15 years under t(10), and once every 1.5 years under t(4.5). The calculation goes as follows. The probability of a −5 SD daily shock or worse under the normal distribution is 2.8665 × 10^{−7} (which is P(Z < −5) for Z ∼ N(0, 1)), or 1 in 3488575 days. Dividing this by approximately 252 trading days per year yields 13844 years. A similar calculation can be done with different t-distributions. If the tails of stock returns behave like t(4.5), which matches the left tail of typical daily S&P 500 returns, then −5 SD events occur far more often than we would conceive under normality.
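The "1 in how many years" arithmetic can be reproduced from density (1.9) alone. The sketch below integrates the t tail numerically (Simpson's rule) and uses the exact normal tail via erfc; following the text's convention, the t-distributions are not rescaled to unit variance, so the threshold is ±5 in all cases:

```python
import math

def t_density(x, nu):
    # Student's t density (1.9): f(x) = d^{-1} (1 + x^2/nu)^{-(nu+1)/2}
    d = math.gamma(0.5) * math.gamma(0.5 * nu) / math.gamma(0.5 * (nu + 1)) * math.sqrt(nu)
    return (1.0 + x * x / nu) ** (-(nu + 1) / 2) / d

def t_tail(c, nu, upper=400.0, n=200000):
    # P(X < -c) = P(X > c) by symmetry; Simpson's rule on [c, upper]
    h = (upper - c) / n
    s = t_density(c, nu) + t_density(upper, nu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_density(c + i * h, nu)
    return s * h / 3

def years_per_event(p, days=252):
    # expected waiting time, in 252-trading-day years, for a daily event of probability p
    return 1.0 / (p * days)

p_norm = 0.5 * math.erfc(5 / math.sqrt(2))    # P(Z < -5), Z ~ N(0,1)
print(years_per_event(p_norm))                # about 14000 years
print(years_per_event(t_tail(5, 10)))         # about 15 years
print(years_per_event(t_tail(5, 4.5)))        # about 1.5 years
```

The three printed waiting times reproduce the text's 14000 / 15 / 1.5 year comparison.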
Figure 1.9 plots the quantiles of the S&P 500 returns in the period January 1985 – February 2011 against the quantiles of t(ν) distributions with ν = 2, 3, · · · , 7. It is clear that the tails of the S&P 500 returns are heavier than the tails of t(5), and are thinner than the tails of t(2) (and perhaps also t(3)). Hence it is reasonable to assume that the second moment of the return of the S&P 500 is finite while the 5th moment is infinite.

Figure 1.9 Q-Q plots of the S&P 500 daily returns in January 1985 – February 2011 against the t(ν) distributions with ν = 2, 3, · · · , 7.
Figure 1.10 Time series plot of VIX (red) and the S&P 500 index (blue) in Nov. 29, 2004
– Dec. 14, 2011 (the left panel), and the plot of the daily S&P 500 returns (in percent)
against the changes of VIX (the right panel).
1.3 Efficient markets hypothesis and statistical models for returns

The efficient markets hypothesis (EMH) in finance assumes that asset prices are fair, information is accessible to everybody and is assimilated rapidly to adjust prices, and people (including traders) are rational. Therefore the price P_t incorporates all relevant information up to time t, and individuals have no comparative advantage in the acquisition of information. The price change P_t − P_{t−1} is only due to the arrival of "news" between t − 1 and t. Hence individuals have no opportunity to make an investment with return greater than a fair payment for undertaking the riskiness of the asset. A shorthand for the EMH: the price is right, and there exist no arbitrage opportunities.
The above describes the strong form of the EMH: security prices of traded assets
reflect instantly all available information, public or private. A semi-strong form
states that security prices reflect efficiently all public information, leaving room for the value of private information. The weak form merely assumes that security prices reflect all past publicly available information.
Under the EMH, an asset return process may be expressed as

r_t = μ + ε_t,   (1.11)

where μ is the expected return and {ε_t} is a sequence of unpredictable innovations. Three types of innovations are commonly considered.

(i) White noise innovations: {ε_t} are uncorrelated with mean 0 and finite variance σ², denoted as ε_t ∼ WN(0, σ²).

(ii) Martingale difference innovations: {ε_t} satisfy

E(ε_t | ε_{t−1}, ε_{t−2}, · · ·) = 0.   (1.12)

One of the most frequently used forms for martingale difference innovations is

ε_t = σ_t η_t,   (1.13)

where σ_t depends only on the information available up to time t − 1, and {η_t} are IID with mean 0 and variance 1. Note that ARCH and GARCH processes are special cases of (1.13).
(iii) IID innovations : εt are independent and identically distributed, denoted as
εt ∼ IID(0, σ 2 ).
The assumption of IID innovations is the strongest. It implies that the innovations are martingale differences. On the other hand, if {ε_t} satisfies (1.12), it holds for any s < t that

E(ε_t ε_s) = E{ε_s E(ε_t | ε_{t−1}, ε_{t−2}, · · ·)} = 0.

Hence, {ε_t} is a white noise series. Therefore the relationship among the three types of innovations is as follows:

IID innovations ⊂ martingale differences ⊂ white noise.
Figure 1.11 Relationship among different processes: Stationary processes are the largest
set, followed by white noise, martingale difference (MD), and i.i.d. processes. There are
many useful processes between stationary processes and white noise processes, to be detailed
in Chapters 2 and 3.
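The gap between white noise and IID can be seen in a small simulation. The sketch below generates an ARCH(1) sequence with hypothetical parameters (a martingale difference by construction): its lag-1 autocorrelation is near zero, while that of its squares is clearly positive.

```python
import math
import random

random.seed(1)
a0, a1 = 0.2, 0.5          # hypothetical ARCH(1) parameters
eps = [0.0]
for _ in range(5000):
    sigma2 = a0 + a1 * eps[-1] ** 2        # conditional variance given the past
    eps.append(math.sqrt(sigma2) * random.gauss(0.0, 1.0))
eps = eps[1:]

def acf1(x):
    # lag-1 sample autocorrelation
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

print(acf1(eps))                   # near 0: uncorrelated, hence white noise
print(acf1([e * e for e in eps]))  # clearly positive: squares are dependent
```

So the series passes a linear (correlation-based) check while failing independence, exactly the middle ground occupied by martingale differences in Figure 1.11.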
The white noise assumption is widely observed in financial return data. It is consistent with the stylized features presented in Figures 1.7 and 1.8. It is also implied by the EMH, as the existence of nonzero correlation between ε_{t+1} and its lagged values would lead to an improvement on the prediction for r_{t+1} over the rational expectation μ. This would violate the hypothesis that ε_{t+1} is unpredictable at time t. To illustrate this point, suppose Corr(ε_{t+1}, ε_s) = ρ ≠ 0 for some s ≤ t. Then r̃_{t+1} = μ + ρ(r_s − μ) is a legitimate predictor for r_{t+1} at time t. Under the EMH, the fair predictor for r_{t+1} at time t is r̂_{t+1} = μ. Then it is easy to see that

E{(r_{t+1} − r̂_{t+1})²} = E(ε_{t+1}²) = σ²,

whereas

E{(r_{t+1} − r̃_{t+1})²} = E{(ε_{t+1} − ρ ε_s)²} = σ²(1 − ρ²),

i.e. the mean squared predictive error of r̃_{t+1} is smaller. Hence the white noise assumption is appropriate and arguably necessary under the EMH. It states merely
that the asset returns cannot be predicted by any linear rules. However, it says nothing beyond the first two moments and remains silent on whether the traded asset returns can be predicted by nonlinear rules or by other more complicated strategies.
On the other hand, the empirical evidence reported in Section 1.2 indicates that the IID assumption is too strong and too restrictive to be true in general. For example, the squared and absolute returns of both the S&P 500 index and the Apple stock exhibit significant serial correlations, indicating that r_1, r_2, · · · , and therefore also ε_1, ε_2, · · · , are not independent of each other; see Figures 1.7 & 1.8.
Note that r_t = log(P_t/P_{t−1}). It follows from (1.11) that

log P_t = μ + log P_{t−1} + ε_t.   (1.14)

Hence, under the assumption that the innovations ε_t are IID, the log prices log P_t, t = 1, 2, · · · , form a random walk, and the prices P_t, t = 0, 1, 2, · · · , form a geometric random walk. Since the future is then independent of the present and the past, the EMH holds in the strictest sense and nothing in the future can be predicted based on the information available up to the present. If we further assume the ε_t to be normal, P_t follows a log-normal distribution, and the price process P_t, t = 0, 1, 2, · · · , is a log-normal geometric random walk. As the length of the time unit shrinks to zero, the number of periods goes to infinity, the appropriately normalized random walk log P_t converges to a Brownian motion, and the geometric random walk P_t converges to a geometric Brownian motion, under which the celebrated Black-Scholes formula is derived. The concept that stock market prices evolve according to a random walk can be traced back at least to the French mathematician Louis Bachelier in his PhD dissertation in 1900.
A weaker form of the random walk relaxes ε_t to be, for example, martingale differences. The martingale difference assumption offers a middle ground between white noise and IID. While retaining the white noise (i.e. linear independence) property, it does not rule out some nonlinear dependence, i.e. {r_t} are uncorrelated but {r_t²} or {|r_t|} may be dependent. Under this assumption, model (1.11) may accommodate conditional heteroscedasticity as in (1.13). In fact, many volatility models including ARCH, GARCH and stochastic volatility models are special cases of (1.11) and (1.13) with ε_t being martingale differences.
The martingale difference assumption retains the hypothesis that the innovation ε_{t+1} is unpredictable at time t, at least as far as point prediction is concerned. (Later we will learn that interval predictions for ε_{t+1}, or more precisely the risks of ε_{t+1}, may be better predicted by incorporating the information from its lagged values.) The best point predictor for r_{t+1} based on r_t, r_{t−1}, · · · is the conditional expectation

r̂_{t+1} = E(r_{t+1} | r_t, r_{t−1}, · · ·) = μ + E(ε_{t+1} | r_t, r_{t−1}, · · ·) = μ,

which is the fair expectation of r_{t+1} under the EMH. The last equality in the above expression is guaranteed by (1.12). We call r̂_{t+1} the best predictor in the sense that it minimizes the mean squared predictive error among all point predictors based on r_t, r_{t−1}, · · · . See Section 2.9.1 for additional details.
In summary, the martingale hypothesis, which postulates model (1.11) with a martingale difference sequence {ε_t}, asserts that the returns of assets cannot be predicted by any rules, but allows volatility to be predictable. It is the most appropriate mathematical form of the efficient market hypothesis.
1.4 Tests related to efficient markets hypothesis

From the discussion in the previous section, we have learned that if returns are unpredictable, they should be at least white noise. On the other hand, the assumption that returns are IID is obviously too strong. The autocorrelations in the squared and absolute returns shown in Figures 1.7 & 1.8 clearly indicate that returns
at different times are not independent of each other. Despite a large body of statistical tests for IID (see, e.g., the rank-based test of Hallin and Puri (1988), and also section 2.2 of Campbell et al. (1997)), we focus on testing the white noise hypothesis, i.e. that returns are linearly independent but may depend on each other in some nonlinear manner. The test for white noise is one of the oldest and most important tests in statistics, as many testing problems in linear modelling may be transformed into a white noise test. There exist quite a few testing methods; see section 7.4 of Fan and Yao (2003) and the references within. We introduce below a simple and frequently used omnibus test, the Ljung-Box portmanteau test.
The linear dependence between r_t and r_{t−k} is comprehensively depicted by the correlation between r_t and r_{t−k}:

ρ_k ≡ Corr(r_t, r_{t−k}) = cov(r_t, r_{t−k}) / √(var(r_t) var(r_{t−k})).
Given m ≥ 1, the Ljung-Box test statistic is

Q_m = T(T + 2) Σ_{j=1}^{m} ρ̂(j)²/(T − j),

where ρ̂(j) is the sample autocorrelation of {r_t} at lag j. The Ljung-Box portmanteau test: reject the hypothesis that {r_t} is white noise at the significance level α if Q_m > χ²_{α,m}, or if its P-value, computed as P(Q > Q_m) with Q ∼ χ²_m, is smaller than α.
Table 1.1 P-values based on the Ljung-Box test for the S&P 500 data

m                          1       6       12      24
returns          Q_m       2.101   5.149   8.958   14.080
                 P-value   0.147   0.525   0.707   0.945
squared returns  Q_m       5.517   12.292  16.964  23.474
                 P-value   0.019   0.056   0.151   0.492
absolute returns Q_m       8.687   39.283  49.721  76.446
                 P-value   0.003   0.000   0.000   0.000
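A hedged sketch of the computation behind Table 1.1: the Q_m statistic with ρ̂(j) computed per (1.8), and a χ²_m P-value. The closed-form survival function below is valid only for even degrees of freedom, which covers m = 6, 12 and 24 in the table.

```python
import math
import random

def ljung_box(r, m):
    # Q_m = T (T + 2) * sum_{j=1}^m rho_j^2 / (T - j)
    T = len(r)
    rbar = sum(r) / T
    g0 = sum((x - rbar) ** 2 for x in r) / T
    q = 0.0
    for j in range(1, m + 1):
        gj = sum((r[t] - rbar) * (r[t + j] - rbar) for t in range(T - j)) / T
        q += (gj / g0) ** 2 / (T - j)
    return T * (T + 2) * q

def chi2_sf(x, df):
    # P(chi2_df > x), closed form for EVEN df only
    assert df % 2 == 0
    s = term = 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        s += term
    return math.exp(-x / 2) * s

# a Gaussian white noise series should rarely be rejected
random.seed(7)
wn = [random.gauss(0.0, 1.0) for _ in range(500)]
q6 = ljung_box(wn, 6)
print(q6, chi2_sf(q6, 6))
```

Replacing `wn` by squared or absolute returns reproduces the kind of comparison reported in the table.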
(T Σ_{j=1}^{m} ρ̂(j)² − m) / √(2m).
The asymptotic normality of this statistic under the condition that m → ∞ and m/T → 0 has been established for IID data by Hong (1996), for martingale differences by Hong and Lee (2003) (see also Durlauf (1991) and Deo (2000)), and for other non-IID white noise processes by Shao (2011) and Xiao and Wu (2011). However, those convergences are typically slow or very slow, resulting in size distortion of the tests based on the asymptotic normality. In addition, as pointed out above, when j is large, ρ̂_j tends to be small. Therefore, including those terms ρ̂_j² adds noise to the test statistic without increasing the signal. How to choose an appropriate m adds a further complication in using this approach. Horowitz et al. (2006) proposed a double blockwise bootstrap method to perform the tests with the statistic Q*_m for non-IID white noise.
Now, it is easy to see that Q_m is approximately the same as

Q*_m = T Σ_{j=1}^{m} ρ̂(j)²

when T is large, since (T + 2)/(T − j) ≈ 1. Hence, Q*_m also follows the χ²_m-distribution under the null hypothesis. In fact Q*_m is the test statistic proposed by Box and Pierce (1970). However, Ljung and Box (1978) subsequently discovered that the χ²-approximation to the distribution of Q*_m is not always adequate even for T as large as 100. They suggest using the statistic Q_m instead, as its distribution is closer to χ²_m. See also Davies, Triggs and Newbold (1977).
A more fundamental problem in applying the Ljung-Box test is that the statistic itself is designed to detect departures from white noise, but the asymptotic χ²-distribution can only be justified under the IID assumption. Therefore, as formulated above, it should not be used to test the hypothesis that the returns are white noise but not IID, as then the asymptotic null distributions of ρ̂(k) depend on the higher moments of the underlying distribution of r_t. These asymptotic null distributions are typically too complicated to be directly useful, in the sense that the asymptotic null distributions of Q_m or Q*_m may then not be of known form for fixed m; see, e.g., Romano and Thombs (1996). Unfortunately this problem also applies to most (if not all) other omnibus white noise tests.
One alternative is to impose an explicit assumption on the structure of the white noise process (such as a GARCH structure); then some resampling method may be employed to simulate the null distribution of Q_m. Furthermore, if one is also willing to impose some assumptions on the parametric form of a possible departure from white noise, a likelihood ratio test can be employed, which is often more powerful than an omnibus nonparametric test, as the latter tries to detect departures (from white noise) in all different directions. The analogy is that an all-purpose tool is typically less powerful for a particular task than a customized tool. An example of a customized tool is the Dickey-Fuller test in the next section. However, it is itself a challenge to find relevant assumptions. This is why omnibus tests such as the Ljung-Box test are often used in practice, in spite of their potential problems in misspecifying significance levels.
Another way to test the EMH is to look at the random walk model (1.14) for the log prices X_t ≡ log P_t. In general we may impose an autoregressive model for the log prices in one of three forms: (i) the model with drift

X_t = μ + α X_{t−1} + ε_t,   (1.16)

(ii) the model without drift

X_t = α X_{t−1} + ε_t,   (1.17)

and (iii) the model with both drift and a linear trend

X_t = μ + βt + α X_{t−1} + ε_t.   (1.18)

The random walk hypothesis corresponds to the null hypothesis H_0 : α = 1.
Consider first model (1.16). The Dickey-Fuller test is based on the test statistic

W = (α̂ − 1)/SE(α̂),   (1.19)

where α̂ is the least squares estimator

α̂ = Σ_{t=2}^{T} (X_{t−1} − X̄_{T−1})(X_t − X̄_T) / Σ_{t=2}^{T} (X_{t−1} − X̄_{T−1})²,

with

X̄_T = (1/(T − 1)) Σ_{t=2}^{T} X_t,   X̄_{T−1} = (1/(T − 1)) Σ_{t=2}^{T} X_{t−1}.

Furthermore, let μ̂ = X̄_T − α̂ X̄_{T−1} be the least squares estimator for μ in (1.16). Then

SE(α̂) = { (1/(T − 3)) Σ_{t=2}^{T} (X_t − μ̂ − α̂ X_{t−1})² / Σ_{t=2}^{T} (X_{t−1} − X̄_{T−1})² }^{1/2}.
There also exists the Dickey-Fuller coefficient test, which is based on the test statistic T(α̂ − 1). The asymptotic null distributions are complicated, but can be tabulated. At significance level 0.05, the critical values are −8.347 and −13.96, respectively, for testing model (1.17) (without drift) and model (1.16) (with drift).
Although the Dickey-Fuller statistic takes the form of a t-statistic (see (1.19)), t-distributions cannot be used for this test, as all three models under H_0 are nonstationary (see section 2.1 below). In fact the Dickey-Fuller test statistic admits certain non-standard asymptotic null distributions, and those distributions under models (1.16) – (1.18) differ from each other. Fortunately the quantiles or
critical values of those distributions have been tabulated in many places; see, e.g.
Fuller (1996). Table 1.2 lists the most frequently used critical values, evaluated by
simulation with the sample size T = 100. Larger sample sizes will result in critical
values that are slightly smaller in absolute value and smaller sample sizes will result
in somewhat larger critical values.
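The statistic (1.19) for model (1.16) requires nothing beyond ordinary least squares. A minimal sketch (the critical values must still come from the Dickey-Fuller tables, not from this code), contrasting a simulated random walk with a stationary AR(1) series:

```python
import random

def dickey_fuller_w(x):
    # W = (alpha_hat - 1) / SE(alpha_hat) from OLS of X_t on (1, X_{t-1}), model (1.16)
    y, z = x[1:], x[:-1]          # X_t and X_{t-1}, t = 2, ..., T
    n = len(y)                    # n = T - 1 observations
    ybar, zbar = sum(y) / n, sum(z) / n
    sxx = sum((zi - zbar) ** 2 for zi in z)
    alpha = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / sxx
    mu = ybar - alpha * zbar
    rss = sum((yi - mu - alpha * zi) ** 2 for zi, yi in zip(z, y))
    se = (rss / (n - 2) / sxx) ** 0.5   # n - 2 = T - 3, matching SE(alpha_hat)
    return (alpha - 1) / se

random.seed(2)
rw = [0.0]   # a random walk: alpha = 1
ar = [0.0]   # a stationary AR(1): alpha = 0.5
for _ in range(500):
    rw.append(rw[-1] + random.gauss(0.0, 1.0))
    ar.append(0.5 * ar[-1] + random.gauss(0.0, 1.0))

print(dickey_fuller_w(rw))  # moderate value: unit root not rejected
print(dickey_fuller_w(ar))  # strongly negative: unit root rejected
```

The stationary series produces a W far below any tabulated critical value, while the random walk does not, which is the qualitative behavior the test exploits.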
The R-code “aDF.test.r” defines a function aDF.test which implements the (aug-
mented) Dickey-Fuller test: aDF.test(x, kind=i, k=0), where x is a data vector,
and i should be set at 2 for model (1.16), 1 for model (1.17), and 3 for model (1.18).
We now apply the Dickey-Fuller test to the log daily, weekly, and monthly prices displayed in Figure 1.3. Since the returns (i.e. the differenced log prices) fluctuate around 0 and show no linear trend, we carry out the test based on either model (1.16) or model (1.17). But for illustration purposes, we also report the tests based on model (1.18). The P-values of the tests with the three models for the daily, weekly, and monthly prices are listed below.

Since none of those tests are statistically significant, we cannot reject the hypothesis that the log prices of the S&P 500 follow a random walk. This applies to the daily, weekly, and monthly data. We also repeat the above exercise for the daily, weekly and monthly returns (i.e. the differenced log prices), obtaining P-values smaller than 0.01 in all cases. This shows that the returns are not random walks at any of these frequencies.
The Dickey-Fuller test was originally proposed in Dickey and Fuller (1979). It has been further adapted to handle situations where there are additional autoregressive terms in models (1.16) – (1.18); see section 2.8.2 below.
Figure 1.12 In terms of returns, the null hypotheses of both Ljung-Box and Dickey-Fuller
tests are the same. However, the alternative of Ljung-Box is larger.
is the sample kurtosis. Therefore, the JB-statistic really only examines the skewness and kurtosis implied by normal distributions.

Under the null hypothesis that the data are drawn independently from a normal distribution, the JB-statistic asymptotically follows the χ²₂-distribution. Therefore, the P-value can easily be computed using the χ²₂-distribution.
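A sketch of the JB computation, using the standard form JB = T/6 · (S² + (K − 3)²/4) with the sample skewness S and kurtosis K as above; the χ²₂ survival function is exactly exp(−x/2):

```python
import math
import random

def jarque_bera(x):
    # JB = T/6 * (S^2 + (K - 3)^2 / 4); approximately chi2_2 under normality
    T = len(x)
    m = sum(x) / T
    s2 = sum((v - m) ** 2 for v in x) / T
    S = sum((v - m) ** 3 for v in x) / T / s2 ** 1.5   # sample skewness
    K = sum((v - m) ** 4 for v in x) / T / s2 ** 2     # sample kurtosis
    jb = T / 6.0 * (S ** 2 + (K - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)                       # statistic and its P-value

random.seed(3)
normal = [random.gauss(0.0, 1.0) for _ in range(2000)]
heavy = [v ** 3 for v in normal]   # a heavy-tailed transform of the same draws

print(jarque_bera(normal))  # small statistic, large P-value
print(jarque_bera(heavy))   # huge statistic, P-value near 0
```

The heavy-tailed sample is rejected overwhelmingly through its excess kurtosis, which is how the test flags the non-normality seen in Figures 1.5 and 1.6.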
It is our hope that readers will be stimulated to use the methods described in
this book for their own applications and research. Our aim is to provide information
in sufficient detail so that readers can produce their own implementations. This will
be a valuable exercise for students and readers who are new to the area. To assist
this endeavor, we have placed all of the data sets and codes used in this book on the
following web site:
http://orfe.princeton.edu/~jqfan/fan/FinEcon.html
1.7 Exercises
1.1 Download the daily, weekly and monthly prices for the Nasdaq index and the IBM
stock from Yahoo!Finance. Reproduce Figures 1.3 – 1.8 using the Nasdaq index and
the IBM stock data instead.
1.2 Consider a path-dependent payoff Y_t = a_1 r_{t+1} + · · · + a_k r_{t+k}, where {a_i}_{i=1}^{k} are given weights. Suppose the return series is weakly stationary in the sense that cov(r_t, r_{t+j}) = γ(j). Show that

var(Y_t) = Σ_{i=1}^{k} Σ_{j=1}^{k} a_i a_j γ(i − j).

Let

var̂(Y_t) = Σ_{i=1}^{k} Σ_{j=1}^{k} a_i a_j γ̂(i − j),

where γ̂(i − j) is defined by (1.8). Show that var̂(Y_t) ≥ 0.
1.3 Consider the following quote from Eugene Fama, who was Myron Scholes' thesis adviser: If the population of price changes is strictly normal, on the average for any stock · · · an observation more than five standard deviations from the mean should be observed about once every 7000 years. In fact such observations seem to occur about once every three or four years.
(See Lowenstein, 2001, page 71.) For X ∼ N (μ, σ 2 ), P (|X − μ| > 5σ) = 5.733 × 10−7 ,
deduce how many observations per year Fama was implicitly assuming to be made. If
a year is defined as 252 trading days and daily returns are normal, how many years is
it expected to take to get a 5 standard deviation event? How does the answer to the
last question change when the daily returns follow the t-distribution with 4 degrees of
freedom?
1.4 Is the (marginal) distribution of log-returns over a long time horizon (e.g. monthly or
quarterly) close to normal? Explain briefly.
1.5 Generate a random sample of size 1000 from the t-distribution with ν degrees of freedom
and another random sample of size 1000 from the standard normal distribution. Apply
the Kolmogorov-Smirnov test to check if they come from the same distribution. Report
the results for ν = 5, 10, 15 and 20.
1.6 Report the P-values for applying the Jarque-Bera test to the data given in Exercise 1.1.
What can you conclude based on these P-values?
1.7 Generate a random sample of size 100 from the t-distribution with ν degrees of freedom
for ν = 5, 15 and ∞ (i.e. normal distribution). Apply the Jarque-Bera test to check
the normality and report the P-values.
1.8 According to the efficient market hypothesis, is the return of a portfolio predictable?
Is the volatility of a portfolio predictable? State the most appropriate mathematical
form of the efficient market hypothesis.
1.9 If the Ljung-Box test is employed to test the efficient market hypothesis, what null
hypothesis is to be tested? If the autocorrelation for the first 4 lags of the monthly
log-returns of the S&P 500 is
Linear Time Series Models

Data obtained from observations collected sequentially over time are common in
this information age. For example, we have collections of daily stock prices, weekly
interest rates, monthly sales figures, quarterly consumer price indices (CPI) and
annual gross domestic product (GDP) figures. Those data collected over time are
called time series. The purpose of analyzing time series data is in general two-fold:
to understand the stochastic mechanism that generates the data, and to predict
or forecast the future values of a time series. This chapter introduces a class of
linear time series models, or more precisely, a class of models which depict the
linear features (including linear dependence) of time series. Those linear models
and associated inference techniques provide the basic framework for the study of the
linear dynamic structure of financial time series and for forecasting future values
based on linear dependence structures.
2.1 Stationarity
One of the important aspects of time series analysis is to use the data collected
in the past to forecast the future. How can historical data be useful for forecasting
a future event? This is achieved through the assumption of stationarity, which refers to some
time-invariance properties of the underlying process. For example, we may assume
that the correlation between the returns of tomorrow and today is the same as
that between the returns on any two successive days in the past. This enables us to aggregate the
information from the data in the past to learn about the correlation. This correlation
invariance over time is a typical characteristic of the so-called weak stationarity or
covariance stationarity. It facilitates linear prediction, which is essentially based
on the correlation between a predicted variable and its predictor (such as in linear
regression). A stronger time-invariant assumption is that the joint distribution of the
returns in a week in the future is the same as that in any week in the past. In other
words, prediction is always based on some invariance properties over time, although
the invariance may refer to some characteristics of the probability distribution of
the process, or to the law governing the change of the distribution. We introduce
the concept of stationarity more formally below.
A time series {Xt } is said to be weakly stationary (or second order stationary or
covariance stationary ) if E(Xt2 ) < ∞ and both EXt and cov(Xt , Xt+k ), for any
integer k, do not depend on t.
For weakly stationary time series {Xt }, let μ = EXt denote its common mean.
We define the autocovariance function (ACVF) as

γ(k) = cov(Xt , Xt+k ), (2.1)

and the autocorrelation function (ACF) as

ρ(k) = γ(k)/γ(0), (2.2)

for k = 0, ±1, ±2, · · · . Note that γ(0) = var(Xt ) is independent of t. For simplicity,
we drop the adverb “weakly” and call {Xt } stationary if it is weakly stationary, i.e.
{Xt } has finite and time-invariant first two moments. It is easy to see that ρ(0) = 1
and ρ(k) = ρ(−k) for any stationary process, and that the variance-covariance
matrix of the vector (Xt , · · · , Xt+k−1 ) is

var(Xt , · · · , Xt+k−1 ) =
⎛ γ(0)       γ(1)       γ(2)      · · ·  γ(k − 1) ⎞
⎜ γ(1)       γ(0)       γ(1)      · · ·  γ(k − 2) ⎟
⎜   ..         ..         ..      . . .     ..    ⎟
⎜ γ(k − 2)   γ(k − 3)   γ(k − 4)  · · ·  γ(1)     ⎟
⎝ γ(k − 1)   γ(k − 2)   γ(k − 3)  · · ·  γ(0)     ⎠ .
Therefore, for any linear combination a1 Xt+1 + · · · + ak Xt+k ,

var( Σ_{i=1}^{k} ai Xt+i ) = Σ_{i=1}^{k} Σ_{j=1}^{k} ai aj cov(Xt+i , Xt+j )
                           = Σ_{i=1}^{k} Σ_{j=1}^{k} ai aj γ(i − j) ≥ 0. (2.3)
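The non-negativity in (2.3) can be checked numerically. The sketch below evaluates the quadratic form for many random coefficient vectors, using γ(k) = 0.5^k (the ACVF of an AR(1) process with unit variance) purely as a concrete example of a valid autocovariance function:

```python
import random

def quad_form(gamma, a):
    # sum_i sum_j a_i a_j gamma(|i - j|), i.e. var(a_1 X_{t+1} + ... + a_k X_{t+k})
    k = len(a)
    return sum(a[i] * a[j] * gamma(abs(i - j))
               for i in range(k) for j in range(k))

# gamma(k) = 0.5**k: the ACVF of a unit-variance AR(1) process, used as an example
gamma = lambda k: 0.5 ** k

rng = random.Random(0)
worst = min(quad_form(gamma, [rng.uniform(-1, 1) for _ in range(6)])
            for _ in range(1000))
print(worst)   # never (beyond rounding error) below zero
```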
Given observations X1 , · · · , XT , the ACVF is estimated by its sample counterpart

γ̂(k) = T^{−1} Σ_{t=1}^{T−k} (Xt − X̄)(Xt+k − X̄), k = 0, 1, · · · ,

where X̄ = T^{−1} Σ_{t=1}^{T} Xt . In the estimator γ̂(k), we use the divisor T instead
of T − k. This is a common practice adopted by almost all statistical packages.
It ensures that the function γ̂(·) is positive semi-definite (Exercise 2.2), a property
given by (2.3). See Fan and Yao (2003), p. 42, for further discussion on this choice.
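As a minimal illustration of the divisor-T convention, here is a direct implementation on toy data (most statistical packages provide this computation as a built-in):

```python
def sample_acvf(x, k):
    # gamma_hat(k) with divisor T rather than T - k, as discussed in the text
    T = len(x)
    xbar = sum(x) / T
    return sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(T - k)) / T

def sample_acf(x, k):
    # rho_hat(k) = gamma_hat(k) / gamma_hat(0)
    return sample_acvf(x, k) / sample_acvf(x, 0)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy data
print(sample_acvf(x, 0))        # 2.0 (sample variance with divisor T)
print(sample_acvf(x, 1))        # 0.8
print(sample_acf(x, 1))         # 0.4
```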
Weak stationarity is indeed a very weak notion of stationarity. For example, if
{Xt } is weakly stationary, it does not follow that {Xt²} is weakly stationary. Yet the
latter time series has very strong connections with the volatility of financial returns.
Therefore, we need a stronger version of stationarity: {Xt } is said to be strictly
stationary if, for any k ≥ 1, any t1 < · · · < tk and any integer h, the joint distribution
of (Xt1 , · · · , Xtk ) is the same as that of (Xt1 +h , · · · , Xtk +h ).
2.2 Stationary ARMA models

One of the most frequently used classes of time series models is the stationary autoregressive
moving average (ARMA) model. It is widely used in modeling the dynamics
of the returns of financial assets and other time series.
Perhaps the simplest stationary time series are moving average (MA) processes.
Their autocovariance functions are also easy to compute. A simple
example is the k-period log return in (1.6).
Let εt ∼ WN(0, σ²). For a fixed integer q ≥ 1, we write Xt ∼ MA(q) if Xt is
defined as a moving average of q + 1 successive values of εt as follows:

Xt = μ + εt + a1 εt−1 + · · · + aq εt−q .

Consider first the MA(1) process

Xt = μ + εt + a εt−1 .

Then γ(0) = var(Xt ) = (1 + a²)σ². Similarly,

γ(1) = cov(Xt , Xt−1 ) = a σ²,

since there is only one common term, εt−1 , in both Xt and Xt−1 . Now for the ACVF
of lag two, we have

γ(2) = cov(Xt , Xt−2 ) = 0,

since Xt and Xt−2 share no common ε terms.
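The MA(1) autocovariances γ(0) = (1 + a²)σ², γ(1) = aσ² and γ(k) = 0 for k ≥ 2 can be checked by simulation. This is a sketch: the seed, the sample size and the choices a = 0.8, σ = 1 are arbitrary.

```python
import random

# simulate an MA(1) process X_t = mu + eps_t + a * eps_{t-1}
rng = random.Random(7)
mu, a, sigma = 0.0, 0.8, 1.0
T = 100_000
eps = [rng.gauss(0.0, sigma) for _ in range(T + 1)]
x = [mu + eps[t + 1] + a * eps[t] for t in range(T)]

def acvf(x, k):
    # sample ACVF with divisor T
    T = len(x)
    xbar = sum(x) / T
    return sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(T - k)) / T

g0, g1, g2 = acvf(x, 0), acvf(x, 1), acvf(x, 2)
print(g0, g1, g2)   # roughly (1 + a^2) sigma^2 = 1.64, a sigma^2 = 0.8, and 0
```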