10 Stockwatson 1
Introduction to Econometrics
Chapters 1, 2 and 3
The statistical analysis of economic (and related) data
What is econometrics?
Types of data
We make a distinction among:
Cross sectional data
Time series data
Panel or longitudinal data
and between:
Continuous variables (e.g. income, consumption)
Discrete variables
measurable on an interval scale (e.g. number of patents)
measurable on an ordinal scale (e.g. income class)
measurable on a nominal scale (e.g. unemployment status)
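1. Compare the average test scores in districts with small and large class sizes (estimation)
2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores between the two types of districts (confidence interval)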
Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:
Class size | Average score ($\bar{Y}$) | Standard deviation ($s_Y$) | n
Small | 657.4 | 19.4 | 238
Large | 650.0 | 17.9 | 182
1. Estimation
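$$\bar{Y}_{\text{small}} - \bar{Y}_{\text{large}} = \frac{1}{n_{\text{small}}}\sum_{i=1}^{n_{\text{small}}} Y_i - \frac{1}{n_{\text{large}}}\sum_{i=1}^{n_{\text{large}}} Y_i = 657.4 - 650.0 = 7.4$$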
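The estimator above is simply the difference between two sample averages. A minimal Python sketch, assuming the raw scores of each group are available as lists; the function name `difference_in_means` and the toy scores are hypothetical, not the actual district data:

```python
from statistics import mean

def difference_in_means(small, large):
    """Estimate Ybar_small - Ybar_large from two lists of test scores."""
    return mean(small) - mean(large)

# Hypothetical toy scores, not the California district data
print(difference_in_means([660.0, 655.0], [650.0, 648.0]))  # prints 8.5
```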
2. Hypothesis testing
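$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)} \quad \text{(remember this?)}$$
where
$$s_s^2 = \frac{1}{n_s - 1}\sum_{i=1}^{n_s}(Y_i - \bar{Y}_s)^2 \quad \text{(etc.)}$$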
$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{657.4 - 650.0}{\sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}}} = \frac{7.4}{1.83} = 4.05$$
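The arithmetic above can be checked directly from the summary statistics in the table. A short sketch; the helper name `diff_means_t` is hypothetical, and it uses the unpooled standard error shown in the formula:

```python
import math

def diff_means_t(ybar_s, s_s, n_s, ybar_l, s_l, n_l):
    """Difference-of-means t-statistic from group summary statistics.

    The standard error is the unpooled sqrt(s_s^2/n_s + s_l^2/n_l).
    """
    diff = ybar_s - ybar_l
    se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)
    return diff, se, diff / se

diff, se, t = diff_means_t(657.4, 19.4, 238, 650.0, 17.9, 182)
print(f"diff = {diff:.1f}, SE = {se:.2f}, t = {t:.2f}")
# prints: diff = 7.4, SE = 1.83, t = 4.05
```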
3. Confidence interval
A 95% confidence interval for the difference between the means is
$$(\bar{Y}_s - \bar{Y}_l) \pm 1.96 \times SE(\bar{Y}_s - \bar{Y}_l)$$
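Plugging the table's numbers into this formula gives the interval for the class-size example. A sketch using the same unpooled standard error; the function name `diff_means_ci95` is hypothetical:

```python
import math

def diff_means_ci95(ybar_s, s_s, n_s, ybar_l, s_l, n_l, z=1.96):
    """Large-sample 95% CI: (Ybar_s - Ybar_l) +/- z * SE(Ybar_s - Ybar_l)."""
    diff = ybar_s - ybar_l
    se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)
    return diff - z * se, diff + z * se

lo, hi = diff_means_ci95(657.4, 19.4, 238, 650.0, 17.9, 182)
print(f"({lo:.1f}, {hi:.1f})")  # prints (3.8, 11.0)
```

The interval excludes 0, which matches the t-statistic of 4.05 exceeding 1.96 in absolute value.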
Population distribution of Y
The probabilities of different values of Y that occur in the population, for example $\Pr[Y = 650]$ (when Y is discrete),
or: the probabilities of sets of these values, for example $\Pr[640 \le Y \le 660]$ (when Y is continuous).
variance = $E[(Y - \mu_Y)^2] = \sigma_Y^2$
standard deviation = $\sqrt{\text{variance}} = \sigma_Y$
Moments, ctd.
skewness = $\dfrac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$
= measure of asymmetry of a distribution
skewness = 0: distribution is symmetric
skewness > (<) 0: distribution has long right (left) tail
kurtosis = $\dfrac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$
= measure of mass in tails
= measure of probability of large values
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails (leptokurtotic)
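The population moments above have simple sample analogues. A minimal sketch, assuming the plug-in convention that divides each central moment by n (bias-corrected variants also exist); the function name `sample_moments` is hypothetical:

```python
def sample_moments(ys):
    """Plug-in skewness and kurtosis (central moments divided by n)."""
    n = len(ys)
    mu = sum(ys) / n
    m2 = sum((y - mu) ** 2 for y in ys) / n
    m3 = sum((y - mu) ** 3 for y in ys) / n
    m4 = sum((y - mu) ** 4 for y in ys) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis

print(sample_moments([1, 2, 3, 4, 5]))  # symmetric: skewness 0, kurtosis 1.7
print(sample_moments([1, 1, 1, 10]))    # long right tail: skewness > 0
```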
So is the correlation
Copyright 2011 Pearson Addison-Wesley. All rights reserved.
$$\text{corr}(X,Z) = \frac{\text{cov}(X,Z)}{\sqrt{\text{var}(X)\,\text{var}(Z)}} = \frac{\sigma_{XZ}}{\sigma_X \sigma_Z} = r_{XZ}$$
$-1 \le \text{corr}(X,Z) \le 1$
corr(X,Z) = 1 means perfect positive linear association
corr(X,Z) = −1 means perfect negative linear association
corr(X,Z) = 0 means no linear association
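The sample correlation replaces each population moment in the definition with its sample counterpart. A sketch with a hypothetical `corr` helper; Python 3.10+ also provides `statistics.correlation`, but the explicit version makes the cov/(sd × sd) structure visible:

```python
import math

def corr(xs, zs):
    """Sample correlation: cov(X, Z) / (sd(X) * sd(Z))."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    cov = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sz = math.sqrt(sum((z - mz) ** 2 for z in zs) / (n - 1))
    return cov / (sx * sz)

print(corr([1, 2, 3], [3, 2, 1]))  # prints -1.0 (perfect negative linear association)
```

The (n − 1) divisors cancel in the ratio, so dividing by n instead gives the same correlation.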
Estimation
$\text{median}(Y_1, \ldots, Y_n)$
$\bar{Y}$, ctd.
Then
$E(Y) = p \times 1 + (1 - p) \times 0 = p = .78$
[remember this?]
The sampling distribution of $\bar{Y}$ depends on n.
$\bar{Y}$ is,
$\bar{Y}$ is an unbiased estimator of $\mu_Y$
$\bar{Y}$ is a consistent estimator of $\mu_Y$
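Variance:
$$\begin{aligned}
\text{var}(\bar{Y}) &= E[\bar{Y} - E(\bar{Y})]^2 \\
&= E[\bar{Y} - \mu_Y]^2 \\
&= E\left[\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) - \mu_Y\right]^2 \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\right]^2
\end{aligned}$$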
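so
$$\begin{aligned}
\text{var}(\bar{Y}) &= E\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\right]^2 \\
&= E\left\{\left[\frac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\right]\left[\frac{1}{n}\sum_{j=1}^{n}(Y_j - \mu_Y)\right]\right\} \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}E[(Y_i - \mu_Y)(Y_j - \mu_Y)] \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\text{cov}(Y_i, Y_j) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma_Y^2 \\
&= \frac{\sigma_Y^2}{n}
\end{aligned}$$
The step from the double sum to the single sum uses independence: because the $Y_i$ are i.i.d., $\text{cov}(Y_i, Y_j) = 0$ for $i \ne j$, while $\text{cov}(Y_i, Y_i) = \sigma_Y^2$.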
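A Monte Carlo check can make $\text{var}(\bar{Y}) = \sigma_Y^2/n$ concrete. This sketch assumes Bernoulli draws with p = .78 (the value used in the example above), for which $\sigma_Y^2 = p(1 - p)$; the sample size n = 25, the number of replications, and the seed are arbitrary choices:

```python
import random

def simulate_var_of_mean(p=0.78, n=25, reps=20_000, seed=0):
    """Monte Carlo variance of Ybar over many samples of n i.i.d. Bernoulli(p) draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        ybar = sum(1 if rng.random() < p else 0 for _ in range(n)) / n
        means.append(ybar)
    grand = sum(means) / reps
    return sum((m - grand) ** 2 for m in means) / (reps - 1)

p, n = 0.78, 25
theory = p * (1 - p) / n  # sigma_Y^2 / n with sigma_Y^2 = p(1 - p)
sim = simulate_var_of_mean(p, n)
print(f"theory = {theory:.5f}, simulated = {sim:.5f}")
```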
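Implications: $E(\bar{Y}) = \mu_Y$ and $\text{var}(\bar{Y}) = \sigma_Y^2/n$, so $\bar{Y}$ is an unbiased estimator of $\mu_Y$ and its variance is inversely proportional to n.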
Y when n is
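For any $\varepsilon > 0$,
$$\Pr[|\bar{Y} - \mu_Y| < \varepsilon] \to 1 \text{ as } n \to \infty, \quad \text{that is,} \quad \bar{Y} \xrightarrow{p} \mu_Y$$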
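The convergence statement above can be illustrated by simulation: the share of samples with $|\bar{Y} - \mu_Y| < \varepsilon$ rises toward 1 as n grows. This sketch assumes Bernoulli(.78) draws and ε = 0.05; the helper `prob_close`, the replication count, and the seed are hypothetical choices:

```python
import random

def prob_close(p, n, eps, reps=500, seed=1):
    """Estimate Pr[|Ybar - p| < eps] for Ybar computed from n Bernoulli(p) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.random() < p for _ in range(n)) / n
        if abs(ybar - p) < eps:
            hits += 1
    return hits / reps

for n in (10, 100, 1000):
    print(n, prob_close(0.78, n, eps=0.05))
```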
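$\bar{Y}$ is approximately distributed $N(\mu_Y, \sigma_Y^2/n)$ (normal), and $\sqrt{n}(\bar{Y} - \mu_Y)/\sigma_Y$ is approximately distributed $N(0,1)$ (standard normal). That is, "standardized" $\bar{Y}$,
$$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\text{var}(\bar{Y})}} = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}},$$
is approximately distributed as N(0,1).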
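One way to see the approximation at work is to standardize many simulated sample means and check how often they fall within ±1.96, which is 95% under N(0,1). This sketch again assumes Bernoulli(.78) draws; n = 100, the replication count, and the seed are arbitrary, and because Ȳ is discrete here the share is only approximately 0.95:

```python
import math
import random

def standardized_means(p=0.78, n=100, reps=5_000, seed=2):
    """Simulate (Ybar - E(Ybar)) / sqrt(var(Ybar)) for Bernoulli(p) samples of size n."""
    rng = random.Random(seed)
    sigma_ybar = math.sqrt(p * (1 - p) / n)  # sd of Ybar = sigma_Y / sqrt(n)
    zs = []
    for _ in range(reps):
        ybar = sum(rng.random() < p for _ in range(n)) / n
        zs.append((ybar - p) / sigma_ybar)
    return zs

zs = standardized_means()
share = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"share with |Z| <= 1.96: {share:.3f} (N(0,1) gives 0.950)")
```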
[Figure: sampling distribution of the standardized $\bar{Y}$, $(\bar{Y} - E(\bar{Y}))/\sqrt{\text{var}(\bar{Y})}$]
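[Figure: sampling distributions of $\bar{Y}$ and of the standardized $\bar{Y}$, $(\bar{Y} - E(\bar{Y}))/\sqrt{\text{var}(\bar{Y})}$]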
Why Use $\bar{Y}$ To Estimate $\mu_Y$?
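$\bar{Y}$ is unbiased: $E(\bar{Y}) = \mu_Y$
$\bar{Y}$ is consistent: $\bar{Y} \xrightarrow{p} \mu_Y$
$\bar{Y}$ is the least squares estimator of $\mu_Y$; $\bar{Y}$ solves
$$\min_m \sum_{i=1}^{n}(Y_i - m)^2$$
To find the minimizer, differentiate with respect to m:
$$\frac{d}{dm}\sum_{i=1}^{n}(Y_i - m)^2 = \sum_{i=1}^{n}\frac{d}{dm}(Y_i - m)^2 = -2\sum_{i=1}^{n}(Y_i - m)$$
Setting the derivative to zero gives
$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} m = nm \quad \text{or} \quad m = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$$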
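A quick numerical check of the least squares property: the sum of squared residuals is smallest at $m = \bar{Y}$, and moving m away from $\bar{Y}$ by d adds exactly $n d^2$. The data and the helper name `ssr` are hypothetical:

```python
def ssr(ys, m):
    """Sum of squared residuals: sum over i of (Y_i - m)^2."""
    return sum((y - m) ** 2 for y in ys)

ys = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical data
ybar = sum(ys) / len(ys)           # 4.0
for m in (ybar - 1, ybar, ybar + 1):
    print(m, ssr(ys, m))           # prints 3.0 55.0, then 4.0 50.0, then 5.0 55.0

# Brute-force search over a grid also lands on the sample mean
grid = [i / 100 for i in range(0, 1001)]
best = min(grid, key=lambda m: ssr(ys, m))
print(best)                        # prints 4.0
```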
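Why Use $\bar{Y}$ To Estimate $\mu_Y$, ctd.
$\bar{Y}$ has the smallest variance of all linear unbiased estimators: consider the estimator $\hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n} a_i Y_i$, where $\{a_i\}$ are such that $\hat{\mu}_Y$ is unbiased; then $\text{var}(\bar{Y}) \le \text{var}(\hat{\mu}_Y)$
(proof: SW, Ch. 17)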
Hypothesis Testing
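The p-value of a test of $H_0: E(Y) = \mu_{Y,0}$ based on $\bar{Y}$:
$$p\text{-value} = \Pr_{H_0}\left[|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\right]$$
where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually observed (nonrandom).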
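$$\begin{aligned}
p\text{-value} &= \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right] \\
&= \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right]
\end{aligned}$$
where $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$.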
$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \text{sample variance of } Y$$
Fact:
If $(Y_1, \ldots, Y_n)$ are i.i.d. and $E(Y^4) < \infty$, then $s_Y^2 \xrightarrow{p} \sigma_Y^2$.
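The sample variance divides by n − 1, not n. A sketch computing it directly and comparing with Python's `statistics.variance`, which uses the same n − 1 divisor (`statistics.pvariance` divides by n instead); the helper name and data are hypothetical:

```python
import statistics

def sample_variance(ys):
    """s_Y^2 = (1 / (n - 1)) * sum over i of (Y_i - Ybar)^2."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, Ybar = 5
print(sample_variance(data), statistics.variance(data))  # both 32/7, about 4.571
```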
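$$\begin{aligned}
p\text{-value} &= \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right] \\
&\cong \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right|\right] \quad \text{(large } n\text{)}
\end{aligned}$$
so, in large samples, the p-value can be computed with $s_Y$ in place of the unknown $\sigma_Y$.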
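In large samples the p-value is the probability that a standard normal exceeds $|t|$ in absolute value, with $t = (\bar{Y} - \mu_{Y,0})/(s_Y/\sqrt{n})$. Since $\Pr[|Z| > |t|] = 2(1 - \Phi(|t|)) = \text{erfc}(|t|/\sqrt{2})$, the standard library's `math.erfc` is enough. The helper names below are hypothetical:

```python
import math
import statistics

def t_stat(ys, mu0):
    """t = (Ybar - mu_{Y,0}) / (s_Y / sqrt(n)), with s_Y the (n - 1) sample std. dev."""
    n = len(ys)
    return (statistics.mean(ys) - mu0) / (statistics.stdev(ys) / math.sqrt(n))

def p_value_normal(t):
    """Two-sided large-n p-value: Pr[|Z| > |t|] for Z ~ N(0, 1)."""
    return math.erfc(abs(t) / math.sqrt(2))

print(round(p_value_normal(1.96), 4))  # prints 0.05
print(p_value_normal(4.05))            # class-size t-statistic: p-value well below 0.001
```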
Degrees of freedom (n − 1) | 5% t-distribution critical value
10 | 2.23
20 | 2.09
30 | 2.04
60 | 2.00
∞ | 1.96
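$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$$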
Confidence Intervals
A 95% confidence interval for $\mu_Y$ is an interval that contains the true value of $\mu_Y$ in 95% of repeated samples.
Digression: What is random here? The values of $Y_1, \ldots, Y_n$ and thus any functions of them, including the confidence interval. The confidence interval will differ from one sample to the next. The population parameter, $\mu_Y$, is not random; we just don't know it.
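A 95% confidence interval for $\mu_Y$:
$$\begin{aligned}
&\left\{\mu_Y : \left|\frac{\bar{Y} - \mu_Y}{s_Y/\sqrt{n}}\right| \le 1.96\right\} \\
&= \left\{\mu_Y : -1.96 \le \frac{\bar{Y} - \mu_Y}{s_Y/\sqrt{n}} \le 1.96\right\} \\
&= \left\{\mu_Y \in \left(\bar{Y} - 1.96\frac{s_Y}{\sqrt{n}},\ \bar{Y} + 1.96\frac{s_Y}{\sqrt{n}}\right)\right\}
\end{aligned}$$
This interval relies on the large-n results that $\bar{Y}$ is approximately normally distributed and that $s_Y^2 \xrightarrow{p} \sigma_Y^2$.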
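The "95% of repeated samples" interpretation can be checked by simulation: draw many samples from a population with known $\mu_Y$, build the interval from each, and count how often it contains $\mu_Y$. This sketch assumes a normal population with hypothetical values μ = 650 and σ = 19 and samples of size 100; the helpers `ci95` and `coverage` are illustrative, not from the text:

```python
import math
import random

def ci95(ys):
    """Large-n 95% CI for mu_Y: Ybar +/- 1.96 * s_Y / sqrt(n)."""
    n = len(ys)
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    half = 1.96 * s / math.sqrt(n)
    return ybar - half, ybar + half

def coverage(mu=650.0, sigma=19.0, n=100, reps=2_000, seed=3):
    """Share of simulated samples whose CI contains the true mu (hypothetical population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = ci95(ys)
        hits += lo <= mu <= hi
    return hits / reps

print(ci95([1, 2, 3, 4, 5]))
print(coverage())  # close to 0.95: the interval varies across samples, mu does not
```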
Summary:
From the two assumptions of:
1. simple random sampling of a population, that is,
$\{Y_i, i = 1, \ldots, n\}$ are i.i.d.