
Bootstrap Event Study Tests

Peter Westfall
ISQS Dept.
Joint work with Scott Hein, Finance
An Example of an Event

[Figure: Daily DJIA returns (% change), 29-Apr-01 through 26-Sep-01; vertical axis runs from -8 to 4. The single extreme negative daily return stands out as the event.]
Event (Outlier) Detection

Main idea: y_0 is an outlier if it is unusual with respect to typical circumstances.

Definitions:
- Critical value: the threshold c that y_0 must exceed to be called an outlier.
- α level: the probability that Y_0 exceeds c under typical circumstances.
- p-value: the probability that Y_0 exceeds the particular observed value y_0 under typical circumstances.
Case 1: Normal distribution, known mean (μ), known variance (σ²).

Let

$$Z=\frac{Y_0-\mu}{\sigma}.$$

Y_0 is associated with an event if Z is large in magnitude. Critical and p-values are from the Z (standard normal) distribution.

Example: y_0 = -7.13, μ = -0.15, σ = 1.0, so Z = -6.98.
α = .05 critical value: Z_{α/2} = 1.96.
p-value = 2P(Z < -6.98) = 3E-12
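These Case 1 numbers can be reproduced in a few lines; the following is a minimal Python sketch (illustrative only, not part of the original slides; SciPy assumed):

# Illustrative check of the Case 1 numbers (known mean and variance)
from scipy import stats

y0, mu, sigma = -7.13, -0.15, 1.0
z = (y0 - mu) / sigma                                  # about -6.98
crit = stats.norm.ppf(1 - 0.05 / 2)                    # two-sided alpha = .05 critical value, 1.96
pval = 2 * min(stats.norm.cdf(z), stats.norm.sf(z))    # about 3E-12
print(z, crit, pval)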
Case 2: Normal distribution, unknown μ, known σ².

Let Y_1, …, Y_n denote an i.i.d. sample under typical circumstances (excluding Y_0). Then

$$\mathrm{Var}(Y_0-\bar Y)=\sigma^2+\frac{\sigma^2}{n}=\sigma^2\left(1+\frac{1}{n}\right),\qquad\text{so}\qquad Z=\frac{Y_0-\bar Y}{\sigma\sqrt{1+1/n}}.$$
Case 3: Normal distribution, unknown μ, unknown σ².

Let Y_1, …, Y_n denote an i.i.d. sample under typical circumstances (excluding Y_0). Then

$$T=\frac{Y_0-\bar Y}{s\sqrt{1+1/n}},\qquad\text{where}\qquad s^2=\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2.$$

Critical and p-values are from the t_{n-1} distribution.

Example: n = 87, y_0 = -7.13, ȳ = -0.14, s = 1.013, so T = -6.86.
α = .05 critical value: t_{86, α/2} = 1.99.
p-value = 2P(T_{86} < -6.86) = 1E-9
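A matching minimal Python sketch of the Case 3 calculation (illustrative only; NumPy/SciPy assumed):

# Illustrative check of the Case 3 numbers (mean and variance both estimated)
import numpy as np
from scipy import stats

n, y0, ybar, s = 87, -7.13, -0.14, 1.013
T = (y0 - ybar) / (s * np.sqrt(1 + 1 / n))                        # about -6.86
crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)                        # about 1.99
pval = 2 * min(stats.t.cdf(T, df=n - 1), stats.t.sf(T, df=n - 1)) # about 1E-9
print(T, crit, pval)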
Regression Method for Event (outlier) detection

Model:

$$Y_i=\beta_0+\beta_1 X_i+\varepsilon_i,\qquad i=0,1,2,\ldots,n,\qquad\text{where } X_i=\begin{cases}1,&\text{for } i=0\\0,&\text{for } i\neq 0.\end{cases}$$

Matrix form of the model: Y = Xβ + ε, where

$$Y=\begin{bmatrix}Y_0\\Y_1\\\vdots\\Y_n\end{bmatrix},\qquad X=\begin{bmatrix}1&1\\1&0\\\vdots&\vdots\\1&0\end{bmatrix},\qquad \beta=\begin{bmatrix}\beta_0\\\beta_1\end{bmatrix},\qquad \varepsilon=\begin{bmatrix}\varepsilon_0\\\varepsilon_1\\\vdots\\\varepsilon_n\end{bmatrix}.$$

Least Squares Estimates: $\hat\beta=(X'X)^{-1}X'Y$. Here

$$X'X=\begin{bmatrix}n+1&1\\1&1\end{bmatrix},\qquad X'Y=\begin{bmatrix}\sum_{i=0}^{n}Y_i\\Y_0\end{bmatrix},$$

so the estimates reduce to

$$\hat\beta=\begin{bmatrix}\hat\beta_0\\\hat\beta_1\end{bmatrix}=\begin{bmatrix}\bar Y\\Y_0-\bar Y\end{bmatrix},\qquad\text{where } \bar Y=\frac{1}{n}\sum_{i=1}^{n}Y_i.$$

Regression-Based Test for Event (outlier)

Goal: Test H_0: β_1 = 0.

Assuming ε_0, ε_1, …, ε_n are i.i.d. N(0, σ²), then

$$\mathrm{Cov}(\hat\beta)=\sigma^2(X'X)^{-1}=\frac{\sigma^2}{n}\begin{bmatrix}1&-1\\-1&n+1\end{bmatrix},\qquad\text{implying that}\qquad \mathrm{Var}(\hat\beta_1)=\sigma^2\left(1+\frac{1}{n}\right).$$

Further,

$$\mathrm{MSE}=\frac{1}{(n+1)-2}\sum_{i=0}^{n}(Y_i-\hat Y_i)^2 =\frac{1}{(n+1)-2}\left\{\{Y_0-(\hat\beta_0+\hat\beta_1)\}^2+\sum_{i=1}^{n}(Y_i-\bar Y)^2\right\} =\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2=s^2,$$

since $\hat\beta_0+\hat\beta_1=Y_0$, so the event-day residual is exactly zero.

Thus, the regression t-statistic is identical, and the degrees of freedom are identical (n-1), to the case of testing with normal distribution, unknown mean, and unknown variance (Case 3).
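This equivalence is easy to verify numerically; the following is a minimal Python sketch (illustrative only, with simulated "typical" data and the event value from the earlier example):

# Sketch: the dummy-variable regression t-statistic equals the Case 3 T statistic
import numpy as np

rng = np.random.default_rng(0)
n = 87
y_typ = rng.normal(size=n)                  # Y_1, ..., Y_n (typical days, simulated here)
y0 = -7.13                                  # Y_0 (event day)
y = np.concatenate([[y0], y_typ])
x = np.column_stack([np.ones(n + 1), np.r_[1.0, np.zeros(n)]])   # intercept + event dummy

beta, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
resid = y - x @ beta
mse = resid @ resid / (n + 1 - 2)           # (n+1) observations, 2 parameters
t_reg = beta[1] / np.sqrt(mse * (1 + 1 / n))

ybar, s = y_typ.mean(), y_typ.std(ddof=1)
t_case3 = (y0 - ybar) / (s * np.sqrt(1 + 1 / n))
print(t_reg, t_case3)                       # identical up to floating-point rounding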
Notes

The method is essentially asking: how far into the tail of the typical distribution is y_0?

(Estimation of the mean just gives a minor correction: the factor (1 + 1/n) in the variance formula. Estimation of the variance gives another minor correction: t_{n-1} instead of Z critical and p-values.)

The central limit theorem does not apply, since we are concerned with the distribution of Y_0, not the distribution of Ȳ.
The Distribution of (Y_0 - μ)/σ

[Two panels plot the standardized density over the range -3 to 3.]

Uniform distribution: Lower(.05/2) = -1.645, Upper(.05/2) = 1.645
Exponential distribution: Lower(.05/2) = -0.97, Upper(.05/2) = 2.67
Case 1A: Known Distribution

Exact critical values for Z are
c_L = {α/2 quantile of the distribution of Z}
c_U = {1 - α/2 quantile of the distribution of Z}

Exact p-value:
p-value = 2 min{ P(Z ≤ z), P(Z ≥ z) }

A Simulation-Based Approach

Simulate many (1,000s of) Z's at random from the pdf.

Critical values:
c_L is the 100(α/2) percentile of the simulated data.
c_U is the 100(1 - α/2) percentile of the simulated data.

P-value:
p_L = proportion of simulated Z's that are smaller than z.
p_U = proportion of simulated Z's that are larger than z.
P-value = 2 min(p_L, p_U).
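A minimal Python sketch of this simulation-based recipe, using a standardized exponential distribution as the "known" pdf purely for illustration (the distribution choice, the observed value, and variable names are mine):

# Illustrative sketch: simulation-based critical values and p-value for a known pdf
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05

# Example "known" pdf: a standardized (mean 0, sd 1) exponential distribution
zs = rng.exponential(scale=1.0, size=100_000) - 1.0

c_L, c_U = np.quantile(zs, [alpha / 2, 1 - alpha / 2])   # simulated critical values

z_obs = 2.4                          # an observed standardized value, for illustration
p_L = np.mean(zs <= z_obs)           # proportion of simulated Z's at or below z_obs
p_U = np.mean(zs > z_obs)            # proportion of simulated Z's above z_obs
p_value = 2 * min(p_L, p_U)
print(c_L, c_U, p_value)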
Case 1B: Unknown Distribution

Let Y_1, …, Y_n denote an i.i.d. sample under typical circumstances (excluding Y_0). Then the empirical pdf approximates the true pdf if n is large (Glivenko-Cantelli Theorem).

Thus, approximate critical and p-values can be obtained by using the empirical distribution.

This is the essential nature of the bootstrap.
Case 1B.i: Simulation-Based Approach with Known μ, σ

Simulate 1000s of values of Z = (Y_0 - μ)/σ as follows:

1. Select a value Y_{01} at random from the observed data Y_1, …, Y_n; let Z_1 = (Y_{01} - μ)/σ.
2. Select a value Y_{02} at random from the observed data Y_1, …, Y_n; let Z_2 = (Y_{02} - μ)/σ.
…
B. Select a value Y_{0B} at random from the observed data Y_1, …, Y_n; let Z_B = (Y_{0B} - μ)/σ.

Use the simulated data Z_1, …, Z_B to determine critical and p-values (see the sketch below).
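A minimal Python sketch of Case 1B.i (illustrative only; `y_typical` is assumed to hold the observed typical-circumstance values Y_1, …, Y_n):

# Illustrative sketch: Case 1B.i bootstrap with known mu and sigma
import numpy as np

def bootstrap_z_pvalue(y0, y_typical, mu, sigma, B=10_000, seed=0):
    rng = np.random.default_rng(seed)
    y0_star = rng.choice(y_typical, size=B, replace=True)   # resampled "event-day" values
    z_star = (y0_star - mu) / sigma                          # simulated Z_1, ..., Z_B
    z_obs = (y0 - mu) / sigma
    p_L = np.mean(z_star <= z_obs)
    p_U = np.mean(z_star > z_obs)
    return 2 * min(p_L, p_U)

# Example call (hypothetical inputs):
# p = bootstrap_z_pvalue(y0=-7.13, y_typical=returns, mu=-0.15, sigma=1.0)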
Case 1B.ii: Unknown μ, σ

Use the statistic

$$T=\frac{Y_0-\bar Y}{s\sqrt{1+1/n}},\qquad\text{where}\qquad s^2=\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar Y)^2.$$

The distribution of the statistic depends on the randomness inherent in Y_0, Ȳ, and s.
Case 1B.ii: Simulation-Based Approach

Simulate 1000s of values of T = (Y_0 - Ȳ)/(s√(1+1/n)) as follows:

1. Select a sample Y_{11}, …, Y_{n1}, Y_{01} at random (with replacement) from the observed data Y_1, …, Y_n; let T_1 = (Y_{01} - Ȳ_1)/(s_1√(1+1/n)), where Ȳ_1 and s_1 are computed from Y_{11}, …, Y_{n1}.
2. Select a sample Y_{12}, …, Y_{n2}, Y_{02} at random from the observed data Y_1, …, Y_n; let T_2 = (Y_{02} - Ȳ_2)/(s_2√(1+1/n)), where Ȳ_2 and s_2 are computed from Y_{12}, …, Y_{n2}.
…
B. Select a sample Y_{1B}, …, Y_{nB}, Y_{0B} at random from the observed data Y_1, …, Y_n; let T_B = (Y_{0B} - Ȳ_B)/(s_B√(1+1/n)), where Ȳ_B and s_B are computed from Y_{1B}, …, Y_{nB}.

Use the simulated data T_1, …, T_B to determine critical and p-values (see the sketch below).
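A minimal Python sketch of this bootstrap-t recipe (illustrative only; `y_typical` again holds Y_1, …, Y_n):

# Illustrative sketch: Case 1B.ii bootstrap of the T statistic
import numpy as np

def bootstrap_t_pvalue(y0, y_typical, B=10_000, seed=0):
    rng = np.random.default_rng(seed)
    y_typical = np.asarray(y_typical, dtype=float)
    n = len(y_typical)
    ybar, s = y_typical.mean(), y_typical.std(ddof=1)
    t_obs = (y0 - ybar) / (s * np.sqrt(1 + 1 / n))

    t_star = np.empty(B)
    for b in range(B):
        # n "typical" values plus one "event" value, all resampled from the typical data
        samp = rng.choice(y_typical, size=n + 1, replace=True)
        y0_b, y_b = samp[0], samp[1:]
        t_star[b] = (y0_b - y_b.mean()) / (y_b.std(ddof=1) * np.sqrt(1 + 1 / n))

    p_L = np.mean(t_star <= t_obs)
    p_U = np.mean(t_star > t_obs)
    return 2 * min(p_L, p_U)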
Regression Method for Event (outlier) detection

Let e_0, e_1, …, e_n be the sample residuals from the regression model

$$Y_i=\beta_0+\beta_1 X_i+\varepsilon_i,\qquad i=0,1,2,\ldots,n,\qquad\text{where } X_i=\begin{cases}1,&\text{for } i=0\\0,&\text{for } i\neq 0\end{cases}$$

(i.e., e_i = Y_i - (β̂_0 + β̂_1 X_i)).

1. Select Y_{11}, …, Y_{n1}, Y_{01} at random from the observed residuals e_1, …, e_n; let T_1 be the regression test statistic for H_0: β_1 = 0 using these resampled data.
2. Select Y_{12}, …, Y_{n2}, Y_{02} at random from the observed residuals e_1, …, e_n; let T_2 be the regression test statistic for H_0: β_1 = 0 using these resampled data.
…
B. Select Y_{1B}, …, Y_{nB}, Y_{0B} at random from the observed residuals e_1, …, e_n; let T_B be the regression test statistic for H_0: β_1 = 0 using these resampled data.

Use the simulated data T_1, …, T_B to determine critical and p-values (a sketch follows below).
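A minimal Python sketch of this residual-resampling recipe (illustrative only, using plain NumPy least squares rather than the SAS macro described later; `y` is assumed to hold Y_0, Y_1, …, Y_n in that order):

# Illustrative sketch: residual-resampling bootstrap of the dummy-variable regression test
import numpy as np

def reg_t(y):
    """t-statistic for the event dummy in Y_i = b0 + b1*X_i + e_i (X_0 = 1, X_i = 0 otherwise)."""
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    x = np.column_stack([np.ones(n + 1), np.r_[1.0, np.zeros(n)]])
    beta, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    mse = resid @ resid / (n - 1)                 # residual df = (n+1) - 2 = n - 1
    return beta[1] / np.sqrt(mse * (1 + 1 / n))   # Var(b1_hat) is proportional to (1 + 1/n)

def residual_bootstrap_pvalue(y, B=10_000, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y) - 1
    t_obs = reg_t(y)
    e = y[1:] - y[1:].mean()                      # residuals e_1, ..., e_n (e_0 is identically 0)
    t_star = np.array([reg_t(rng.choice(e, size=n + 1, replace=True)) for _ in range(B)])
    p_L = np.mean(t_star <= t_obs)
    p_U = np.mean(t_star > t_obs)
    return 2 * min(p_L, p_U)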
Extension: Market Model

Let e_0, e_1, …, e_n be the sample residuals from the regression model

$$Y_i=\beta_0+\beta_1 X_i+\beta_2 M_i+\varepsilon_i,\qquad i=0,1,2,\ldots,n,\qquad\text{where } X_i=\begin{cases}1,&\text{for } i=0\\0,&\text{for } i\neq 0,\end{cases}$$

and where M_i is a market measure at time i.

1. Select Y_{11}, …, Y_{n1}, Y_{01} at random from the observed residuals e_1, …, e_n; let T_1 be the regression test statistic for H_0: β_1 = 0 using these resampled data.
2. Select Y_{12}, …, Y_{n2}, Y_{02} at random from the observed residuals e_1, …, e_n; let T_2 be the regression test statistic for H_0: β_1 = 0 using these resampled data.
…
B. Select Y_{1B}, …, Y_{nB}, Y_{0B} at random from the observed residuals e_1, …, e_n; let T_B be the regression test statistic for H_0: β_1 = 0 using these resampled data.

Use the simulated data T_1, …, T_B to determine critical and p-values.
Extension: Multivariate Market Model

The MVRM models may be expressed as

R_i = Xβ_i + Dγ_i + ε_i, for i = 1, …, g (firms or portfolios),

where D is the event dummy and γ_i is the event effect for firm i.

Observations within a row of ε = [ε_1 | … | ε_g] are correlated; this is called cross-sectional correlation.

Observations on ε = [ε_1 | … | ε_g] between rows 1, …, n are assumed to be independent in the classical MVRM model.

Null hypothesis: H_0: [γ_1 | … | γ_g] = [0 | … | 0]

This multivariate test is computed easily and automatically using standard statistical software packages, using exact (under normality) F-tests. The test is based on Wilks' Lambda likelihood ratio criterion.
Hein, Westfall, Zhang Bootstrap Method

1. Fit the MVRM model. Obtain the F-statistic for testing H_0 using the traditional method (assuming normality). Obtain also the ((n+1) × g) sample residual matrix e = [e_1 | … | e_g].
2. Exclude the row corresponding to the event from e, leaving the (n × g) matrix e^-.
3. Sample (n+1) row vectors, one at a time and with replacement, from e^-. This gives a ((n+1) × g) matrix [R_1* | … | R_g*].
4. Fit the model R_i* = Xβ_i + Dγ_i + ε_i, i = 1, …, g, and obtain the test statistic F* using the same technique used to obtain the F-statistic from the original sample.
5. Repeat steps 3 and 4 NBOOT times. The bootstrap p-value of the test is the proportion of the NBOOT samples yielding an F*-statistic that is greater than or equal to the original F-statistic from step 1 (see the sketch below).
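A schematic Python sketch of steps 1-5 (illustrative only, and a simplified stand-in for the SAS/IML macro described later: it assumes a single event day and uses the Hotelling T² statistic for the event-dummy coefficients, which for a single-row hypothesis is a monotone function of the Wilks-lambda F, so comparing T²* with T² gives the same bootstrap p-value; `R`, `X`, `d`, and `event_row` are assumed inputs):

# Schematic sketch of the HWZ bootstrap (single event day; Hotelling T^2 on the
# event-dummy coefficients, a monotone function of the Wilks-lambda F in this case)
import numpy as np

def event_t2(R, X, d):
    """Hotelling T^2 for H0: all event-dummy coefficients are zero."""
    Z = np.column_stack([X, d])            # market-model regressors plus event dummy (last column)
    n_obs, k = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    B = ZtZ_inv @ Z.T @ R                  # coefficient matrix, one column per firm/portfolio
    E = R - Z @ B                          # residual matrix
    nu = n_obs - k                         # residual degrees of freedom
    S = E.T @ E / nu                       # cross-sectional residual covariance
    gamma = B[-1, :]                       # event-dummy coefficients
    c = ZtZ_inv[-1, -1]
    return gamma @ np.linalg.solve(c * S, gamma)

def hwz_bootstrap_pvalue(R, X, d, event_row, nboot=1000, seed=0):
    rng = np.random.default_rng(seed)
    t2_obs = event_t2(R, X, d)                               # step 1
    Z = np.column_stack([X, d])
    E = R - Z @ np.linalg.lstsq(Z, R, rcond=None)[0]         # (n+1) x g residual matrix
    E_minus = np.delete(E, event_row, axis=0)                # step 2: drop the event row
    n1 = R.shape[0]
    count = 0
    for _ in range(nboot):                                   # step 5
        idx = rng.integers(0, E_minus.shape[0], size=n1)     # step 3: resample rows w/ replacement
        R_star = E_minus[idx, :]
        count += event_t2(R_star, X, d) >= t2_obs            # step 4
    return count / nboot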




Simulation Study: True Type I error rates

[Figure 1: True type I error rates for bootstrap and traditional tests for events when T=200.]

Simulation Study: True Type I error rates

[Figure 2: True type I error rates for bootstrap and traditional tests for events when T=50. Six panels (Traditional tests at α = .10, .05, .01 and Bootstrap tests at α = .01, .05, .10), each plotting Type I error against the number of firms (or portfolios) (1, 2, 4, 8) for series labeled T1, T2, T4, T8, and Z.]
Alternative Method (Kramer, 2001)

Test statistic is Z = Σ t_i / (g^{1/2} s_t), where t_i is the t-statistic from the univariate dummy-variable-based regression model for firm i, and s_t is the sample standard deviation of the g t-statistics.

Algorithm:
(i) Create a pseudo-population of t-statistics t_i* = t_i - t̄, where t̄ is the mean of the g t-statistics, reflecting the null hypothesis case where the true mean of the t-statistics is zero.
(ii) Sample g values with replacement from the pseudo-population and compute Z* from these pseudo-values.
(iii) Repeat (ii) NBOOT times, obtaining Z_1*, …, Z_NBOOT*. The p-value for the test is then 2·min(p_U, p_L), where p_L is the proportion of the NBOOT bootstrap samples yielding Z_i* ≤ Z, and where p_U is the proportion of the NBOOT samples yielding Z_i* > Z.

Assumption: the t-statistics are cross-sectionally independent. (A sketch of the algorithm follows below.)
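A minimal Python sketch of Kramer's algorithm (illustrative only; `t_stats` is assumed to hold the g firm-level t-statistics):

# Illustrative sketch of Kramer's (2001) nonparametric bootstrap
import numpy as np

def kramer_pvalue(t_stats, nboot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    t_stats = np.asarray(t_stats, dtype=float)
    g = len(t_stats)
    z_obs = t_stats.sum() / (np.sqrt(g) * t_stats.std(ddof=1))
    t_null = t_stats - t_stats.mean()            # (i) pseudo-population with mean zero
    z_star = np.empty(nboot)
    for b in range(nboot):                       # (iii) repeat NBOOT times
        samp = rng.choice(t_null, size=g, replace=True)   # (ii) resample g pseudo-values
        # note: a degenerate resample with zero spread would need special handling
        z_star[b] = samp.sum() / (np.sqrt(g) * samp.std(ddof=1))
    p_L = np.mean(z_star <= z_obs)
    p_U = np.mean(z_star > z_obs)
    return 2 * min(p_L, p_U)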
Modified Kramer Method

Model-based bootstrap Kramer: bootstrap Kramer's Z = Σ t_i / (g^{1/2} s_t), but by resampling MVRM residual vectors as in HWZ.

Model-based sum t: bootstrap S_t = Σ t_i by resampling MVRM residual vectors as in HWZ.
Table 1. Simulated Type I error rates as a function of cross-sectional correlation ρ.

Panel B: g = 30
      ρ=0    ρ=0.1  ρ=0.2  ρ=0.3  ρ=0.4  ρ=0.5  ρ=0.6  ρ=0.7  ρ=0.8  ρ=0.9
HWZ   0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.002  0.002
BT    0.070  0.057  0.056  0.059  0.062  0.064  0.051  0.058  0.056  0.049
BK    0.073  0.057  0.051  0.055  0.056  0.053  0.055  0.058  0.053  0.050
K     0.056  0.366  0.516  0.590  0.652  0.702  0.718  0.813  0.851  0.885

Panel A: g = 5
      ρ=0    ρ=0.1  ρ=0.2  ρ=0.3  ρ=0.4  ρ=0.5  ρ=0.6  ρ=0.7  ρ=0.8  ρ=0.9
HWZ   0.059  0.059  0.059  0.059  0.059  0.059  0.059  0.059  0.059  0.059
BT    0.055  0.057  0.055  0.057  0.056  0.058  0.056  0.057  0.060  0.056
BK    0.053  0.056  0.049  0.053  0.050  0.042  0.041  0.045  0.050  0.049
K     0.057  0.078  0.113  0.168  0.220  0.275  0.335  0.418  0.500  0.624

Table 2. Simulated power as a function of the event effect size.

Panel A: g = 5
Effect size:  0     0.2   0.5   0.75  1.0   1.5   2.0   3.0   4.0
HWZ           0.05  0.04  0.09  0.19  0.28  0.74  0.95  1     1
BT            0.02  0.09  0.2   0.42  0.6   0.92  0.99  1     1
BK            0.05  0.07  0.17  0.34  0.58  0.78  0.92  0.97  1
K             0.06  0.08  0.16  0.27  0.42  0.65  0.75  0.88  0.93

Panel B: g = 30
Effect size:  0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
HWZ           0.002  0.002  0.003  0.007  0.011  0.013  0.042  0.072  0.130  0.197  0.291
BT            0.070  0.092  0.206  0.363  0.573  0.751  0.871  0.949  0.988  0.999  1.000
BK            0.073  0.090  0.193  0.335  0.539  0.697  0.844  0.930  0.975  0.992  1.000
K             0.056  0.087  0.185  0.345  0.535  0.703  0.853  0.933  0.977  0.996  0.999

Table 3. Simulated Type I error rates as a function of serial correlation.

Panel A: Zero cross-sectional correlation
Serial corr.:  0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
HWZ            0.052  0.054  0.056  0.057  0.057  0.059  0.072  0.085  0.113  0.176
BT             0.062  0.062  0.067  0.066  0.068  0.07   0.067  0.071  0.079  0.093
BK             0.062  0.06   0.064  0.065  0.067  0.064  0.062  0.067  0.067  0.061
K              0.053  0.052  0.047  0.048  0.045  0.048  0.046  0.046  0.052  0.045

Panel B: Cross-sectional correlation = 0.5
Serial corr.:  0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
HWZ            0.059  0.054  0.056  0.057  0.057  0.059  0.072  0.085  0.113  0.176
BT             0.058  0.045  0.047  0.048  0.050  0.054  0.054  0.056  0.064  0.075
BK             0.042  0.048  0.052  0.049  0.045  0.047  0.049  0.048  0.051  0.045
K              0.275  0.246  0.245  0.245  0.251  0.251  0.254  0.246  0.251  0.238

/*--------------------------------------------------------------*/
/* Name: bootevnt */
/* Title: Macro to calculate bootstrap p-values for event */
/* studies */
/* Author: Peter H. Westfall, [email protected] */
/* Release: SAS Version 6.12 or higher, requires SAS/IML */
/*--------------------------------------------------------------*/
/* Inputs: */
/* */
/* DATASET = Data set to be analyzed (required) */
/* */
/* YVARS = List of y variables used in the multivariate */
/* regression model, separated by blanks (required) */
/* */
/* XVARS = List of x variables used in the multivariate */
/* regression model, separated by blanks (required) */
/* */
/* EVENT = Name of dummy variable indicating event */
/* observation (e.g., day). This is required. */
/* */
/* EXCLUDE = Name of dummy variable indicating days that */
/* should be excluded from the resampling. If there */
/* are multiple event days in the model, then all */
/* those days should be excluded because the */
/* residuals are mathematically zero. If there are */
/* not multiple event days, then the EXCLUDE */
/* variable should be identical to the EVENT */
/* variable. */
/* */
/* NBOOT = Number of bootstrap samples. This input is */
/* required. Pick a number as large as possible */
/* subject to time constraints. Start with 100 */
/* and work your way up, noting the accuracy as */
/* given by the confidence interval in the output. */
/* */
/* MODELBOOT = 1 for requesting model-based bootstrap tests, */
/* = 0 to exclude them. */
/* */
/* NPBOOT = 1 to request Kramer's nonparametric bootstrap */
/* tests, =0 to exclude them. */
/* */
/* SEED = Seed value for random numbers (0 default) */
/* */
/*--------------------------------------------------------------*/
/* Output: This macro computes normality-assuming exact p- */
/* values and bootstrap approximate p-values that do not */
/* require the normality assumption. A 95% confidence interval */
/* for the true bootstrap p-value (which itself is approximate */
/* because it uses the empirical, not the true, residual */
/* distribution) also is given. */
/*--------------------------------------------------------------*/
Invocation of Macro
libname fin "c:\research\coba";
data sinkey;
set fin.sinkey;
run;

%bootevnt(dataset=sinkey, yvars=pr1 pr2 pr3 pr4,
xvars=ds m1 m2 m3 dsm d2 d3 d4 d5 d6, event=d1,
exclude=exclude, nboot=1000, modelboot=1,
npboot=1, seed=182161);
Normality-Assuming Tests for Event


TSQ F NDF DDF PVAL

15.025505 3.6957895 4 183 0.0064153


NBOOT

Model-based bootstrap Binder p-value, using 20000 samples


with 95% confidence limits on the true bootstrap p-value


BOOTP LCL UCL

0.01115 0.0096947 0.0126053

Model-based bootstrap Kramer p-value, using 20000 samples


with 95% confidence limits on the true bootstrap p-value


BOOTKP LCLK UCLK

0.0609 0.0561373 0.0656627





NBOOT

Model-based bootstrap Sum t p-value, using 20000 samples


with 95% confidence limits on the true bootstrap p-value


BOOTTSUMP LCLSUMT UCLSUMT

0.0001 -0.000096 0.000296

1.55 % of the bootstrap samples had 0 variance


NBOOT

Nonparametric bootstrap Kramer p-value, using 20000 samples


with 95% confidence limits on the true bootstrap p-value


BOOTTNP LCLNP UCLNP

0.1404 0.1333184 0.1452147
Robustness of Bootstrap to Serial Correlation

Recall that the method is essentially a comparison of Y_0 to the distribution of Y_1, …, Y_n.

If the empirical distribution of Y_1, …, Y_n converges to F, then the unconditional null probability of an event also converges to

α = F(c_{α/2}) + (1 - F(c_{1-α/2})).

Such convergence occurs for typical stationary time series processes.
Conclusions

We use t, not z, even when n is large. Why? Because t is generally more accurate.

We should use bootstrap tests instead of traditional tests for precisely the same reason.

We must account for cross-sectional correlation in the analysis.

The recommended method is our bootstrap with a modification of Kramer's Z (the model-based sum t method).

Software is available from [email protected]
