Summary Chapters 1-3 Econometrics (Watson)
1. Economic Questions and Data
Quantitative questions need quantitative answers. The conceptual framework used for this is the
multiple regression model, which provides a mathematical way to quantify how a change in one
variable affects another variable, holding other things constant.
Even though forecasting need not involve causal relationships,
economic theory suggests patterns and relationships that might be useful for forecasting.
Multiple regression analysis allows us to quantify historical relationships suggested by
economic theory, to check whether those relationships have been stable over time, to make
quantitative forecasts about the future, and to assess the accuracy of those forecasts.
2. Review of Probability
Random Variables and Probability Distributions
The set of all possible outcomes is called the sample space. An event is a subset of the
sample space; that is, an event is a set of one or more outcomes. For example, the event
"the computer will crash no more than once" is the set of two outcomes: "no crashes" and
"one crash".
A random variable is a numerical summary of a random outcome. A discrete random
variable takes on only discrete values like 0,1,2…, whereas a continuous random variable
takes on a continuum of possible values.
The probability distribution of a discrete random variable is the list of all possible values of
the variable and the probability that each value will occur (these probabilities sum to 1).
For a continuous random variable, the probabilities are summarized by the probability
density function (pdf). The cumulative probability distribution is the probability that the
random variable is less than or equal to a particular value. It is also referred to as a
cumulative distribution function, a cdf, or a cumulative distribution.
A binary random variable is called a Bernoulli random variable, and its probability
distribution is called the Bernoulli distribution.
The variance of a random variable, denoted var(Y), is the expected value of the square of
the deviation of Y from its mean. The variance of the discrete random variable Y, denoted \sigma_Y^2, is

\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 \, p_i

The standard deviation of Y is \sigma_Y, the square root of the variance. The units of the standard
deviation are the same as the units of Y.
Variance of a Bernoulli random variable G:

\mathrm{var}(G) = \sigma_G^2 = p(1 - p)

Thus, the standard deviation is \sigma_G = \sqrt{p(1 - p)}.
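As a quick numerical illustration (not from the text; the probabilities below are hypothetical), the mean, variance, and standard deviation of a discrete random variable can be computed directly from its probability distribution, and the Bernoulli formula p(1 − p) can be checked:

```python
# Minimal sketch (hypothetical numbers): mean, variance, and standard
# deviation of a discrete random variable computed directly from its pmf.
import numpy as np

y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # possible values y_i
p = np.array([0.80, 0.10, 0.06, 0.03, 0.01])    # Pr(Y = y_i), sums to 1

mu_Y = np.sum(y * p)                            # E(Y)
var_Y = np.sum((y - mu_Y) ** 2 * p)             # E[(Y - mu_Y)^2]
sd_Y = np.sqrt(var_Y)                           # same units as Y
print(mu_Y, var_Y, sd_Y)

# Bernoulli check: var(G) = p(1 - p)
p_success = 0.3
g = np.array([0.0, 1.0])
pg = np.array([1 - p_success, p_success])
mu_G = np.sum(g * pg)
var_G = np.sum((g - mu_G) ** 2 * pg)
print(var_G, p_success * (1 - p_success))       # both equal 0.21
```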
For Y = a + bX:

\mu_Y = a + b\mu_X, \qquad \sigma_Y^2 = b^2 \sigma_X^2, \qquad \sigma_Y = |b|\,\sigma_X
\mathrm{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}
For a symmetric distribution, a value of Y a given amount above its mean is just as likely as
a value of Y the same amount below its mean. Thus, for a symmetric distribution,
E[(Y - \mu_Y)^3] = 0; the skewness of a symmetric distribution is zero. Positive skewness
means a long right tail; negative skewness means a long left tail.
The kurtosis measures the thickness of the tails, and therefore how much of the variance of
Y arises from extreme values, called outliers. The greater the kurtosis of a distribution, the
more likely are outliers:
\mathrm{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}

Kurtosis cannot be negative. The kurtosis of a normally distributed random
variable is 3. A distribution with kurtosis > 3 is called leptokurtic, or heavy-tailed.
Both kurtosis and skewness are unit-free: changing the units of Y does not change kurtosis
and skewness.
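A minimal simulation sketch (my own, not from the text) of estimating skewness and kurtosis from data; note that scipy reports excess kurtosis by default, so fisher=False is needed to match the definition used here, for which the normal distribution has kurtosis 3:

```python
# Sketch: sample skewness and kurtosis of simulated data.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
normal_draws = rng.normal(loc=0.0, scale=2.0, size=100_000)

print(skew(normal_draws))                      # close to 0 (symmetric)
print(kurtosis(normal_draws, fisher=False))    # close to 3 (normal)

# A right-skewed, heavy-tailed example: the lognormal distribution.
lognormal_draws = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
print(skew(lognormal_draws))                   # positive: long right tail
print(kurtosis(lognormal_draws, fisher=False)) # well above 3: leptokurtic

# Unit-free check: rescaling the data leaves both measures unchanged.
print(skew(10 * normal_draws), kurtosis(10 * normal_draws, fisher=False))
```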
The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the
square of Y, E(Y^2), is called the second moment of Y. In general, the expected value of Y^r
is called the rth moment of the random variable Y. That is, the rth moment of Y is E(Y^r).
The skewness is a function of the first, second, and third moments of Y. The kurtosis is a
function of the first through fourth moments of Y.
The joint probability distribution can be written as the function Pr(X=x, Y=y). The
probabilities of all possible (x,y) combinations sum to 1.
The marginal probability distribution of a random variable Y is just another name for its
probability distribution. This term is used to distinguish the distribution of Y alone (the
marginal distribution) from the joint distribution of Y and another random variable. If X can
take on l different values x_1, ..., x_l, then the marginal probability that Y takes on value y is

\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)
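A small sketch with a hypothetical joint distribution, showing how the marginal distribution is obtained by summing the joint probabilities over the values of the other variable:

```python
# Sketch: marginal distributions obtained from a joint distribution.
import numpy as np

# Hypothetical joint pmf: rows index values of X, columns index values of Y.
# joint[i, j] = Pr(X = x_i, Y = y_j); all entries sum to 1.
joint = np.array([[0.15, 0.15, 0.10],
                  [0.05, 0.25, 0.30]])

marginal_Y = joint.sum(axis=0)   # Pr(Y = y_j) = sum_i Pr(X = x_i, Y = y_j)
marginal_X = joint.sum(axis=1)   # Pr(X = x_i) = sum_j Pr(X = x_i, Y = y_j)

print(marginal_Y)                # [0.20, 0.40, 0.40]
print(marginal_X)                # [0.40, 0.60]
print(joint.sum())               # 1.0
```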
Conditional Distributions
The mean of Y is the weighted average of the conditional expectation of Y given X, weighted
by the probability distribution of X. E.g. the mean height of adults is the weighted average of
the mean height of men and the mean height of women, weighted by the proportions of men
and women. Stated mathematically, if X takes on l values x_1, ..., x_l, then

E(Y) = \sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)    (2.19)
Equation 2.19 follows from equations 2.17 and 2.18.
Stated differently, the expectation of Y is the expectation of the conditional expectation of Y
given X:
E(Y) = E[E(Y \mid X)],    (2.20)
where the inner expectation of the right hand side of equation 2.20 is computed using the
conditional distribution of Y given X and the outer expectation is computed using the
marginal distribution of X. Equation 2.20 is known as the law of iterated expectations.
The law of iterated expectations implies that if the conditional mean of Y given X is zero,
then the mean of Y is zero. This is an immediate consequence of equation 2.20: if
E(Y | X) = 0, then E(Y) = E[E(Y | X)] = E(0) = 0. Said differently, if the mean of Y given X is
zero, then the probability-weighted average of these conditional means is zero, so the mean
of Y must be zero. The law of iterated expectations also applies to expectations that are
conditional on multiple random variables. For example, let X, Y, and Z be jointly distributed
random variables. Then the law of iterated expectations says that
E(Y) = E[E(Y | X, Z)], where E(Y | X, Z) is the conditional expectation of Y given both X and Z.
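The law of iterated expectations can be verified directly on a small joint distribution; the sketch below uses a hypothetical joint pmf and checks that E(Y) equals the probability-weighted average of the conditional means:

```python
# Sketch: E(Y) equals the probability-weighted average of E(Y | X = x_i).
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
# Hypothetical joint pmf, joint[i, j] = Pr(X = x_i, Y = y_j).
joint = np.array([[0.15, 0.15, 0.10],
                  [0.05, 0.25, 0.30]])

pr_X = joint.sum(axis=1)                         # marginal of X
cond_EY = (joint * y_vals).sum(axis=1) / pr_X    # E(Y | X = x_i)

lhs = (joint.sum(axis=0) * y_vals).sum()         # E(Y) from marginal of Y
rhs = (cond_EY * pr_X).sum()                     # E[E(Y | X)]
print(lhs, rhs)                                  # both 1.2
```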
Conditional Variance
Independence
The covariance between X and Y is the expected value E[(X - \mu_X)(Y - \mu_Y)], where \mu_X is
the mean of X and \mu_Y is the mean of Y. The covariance is denoted cov(X, Y) or \sigma_{XY}. If X can
take on l values and Y can take on k values, then the covariance is given by the formula

\mathrm{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{k} \sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i)
Because the covariance is the expected product of the deviations of X and Y from their
means, its units are, awkwardly, the units of X multiplied by the units of Y. This "units"
problem can make numerical values of the covariance difficult to interpret. The correlation
solves this units problem: the correlation between X and Y is the covariance between X and Y
divided by the product of their standard deviations:

\mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}
If the conditional mean of Y does not depend on X, then Y and X are uncorrelated:
if E(Y | X) = \mu_Y, then cov(X, Y) = 0 and corr(X, Y) = 0.
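A sketch with a hypothetical joint distribution illustrating this statement: E(Y | X) is the same for every value of X, so cov(X, Y) = 0, even though X and Y are not independent:

```python
# Sketch: E(Y | X) constant in X implies cov(X, Y) = 0 (hypothetical pmf).
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([-1.0, 0.0, 1.0])
# Hypothetical joint pmf; E(Y | X = 0) = E(Y | X = 1) = 0, but the
# conditional distributions differ, so X and Y are not independent.
joint = np.array([[0.25, 0.00, 0.25],
                  [0.05, 0.40, 0.05]])

pr_X = joint.sum(axis=1)
pr_Y = joint.sum(axis=0)
mu_X = (pr_X * x_vals).sum()
mu_Y = (pr_Y * y_vals).sum()

# cov(X, Y) = sum_ij (x_i - mu_X)(y_j - mu_Y) Pr(X = x_i, Y = y_j)
dev = np.outer(x_vals - mu_X, y_vals - mu_Y)
cov_XY = (dev * joint).sum()
print((joint * y_vals).sum(axis=1) / pr_X)   # E(Y | X): [0., 0.]
print(cov_XY)                                # 0.0
```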
Let X, Y, and V be random variables, let \mu_X and \sigma_X^2 be the mean and variance of X, and let \sigma_{XY} be
the covariance between X and Y (and so forth for the other variables). The following
equations follow from the definitions of the mean, variance, and covariance:

E(a + bX + cY) = a + b\mu_X + c\mu_Y

\mathrm{var}(a + bY) = b^2 \sigma_Y^2

The variance of the sum of X and Y is the sum of their variances plus two times their
covariance:

\mathrm{var}(aX + bY) = a^2 \sigma_X^2 + 2ab\sigma_{XY} + b^2 \sigma_Y^2

E(Y^2) = \sigma_Y^2 + \mu_Y^2

\mathrm{cov}(a + bX + cV, Y) = b\sigma_{XY} + c\sigma_{VY}

E(XY) = \sigma_{XY} + \mu_X \mu_Y

|\mathrm{corr}(X, Y)| \le 1 \quad \text{and} \quad |\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2} \qquad \text{(correlation inequality)}
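These identities can be checked numerically; the sketch below uses simulated data and arbitrary constants a and b, and the sample moments satisfy the var(aX + bY) and E(XY) identities exactly (up to floating-point rounding):

```python
# Sketch: check var(aX + bY) = a^2*varX + 2ab*cov + b^2*varY on sample moments.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)     # correlated with x by construction
a, b = 2.0, -3.0                     # arbitrary constants

var_x, var_y = x.var(), y.var()
cov_xy = np.cov(x, y, ddof=0)[0, 1]

lhs = (a * x + b * y).var()
rhs = a**2 * var_x + 2 * a * b * cov_xy + b**2 * var_y
print(lhs, rhs)                      # identical up to rounding

# E(XY) = sigma_XY + mu_X * mu_Y, applied to sample moments
print((x * y).mean(), cov_xy + x.mean() * y.mean())
```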
The normal distribution can be generalized to describe the joint distribution of a set of
random variables. In this case, the distribution is called the multivariate normal
distribution, or if only two variables are being considered, the bivariate normal
distribution. The multivariate normal distribution has four important properties.
1. If X and Y have a bivariate normal distribution with covariance \sigma_{XY} and if a and b are
two constants, then aX + bY has the normal distribution:
aX + bY is distributed N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY}) (with X, Y bivariate normal).
More generally, if n random variables have a multivariate normal distribution, then any linear
combination of these variables (such as their sum) is normally distributed.
2. If a set of variables has a multivariate normal distribution, then the marginal
distribution of each of the variables is normal.
3. If variables with a multivariate normal distribution have covariances that equal zero,
then the variables are independent.
4. If X and Y have a bivariate normal distribution, then the conditional expectation of Y
given X is linear in X; that is E(Y|X=x)=a+bx, where a and b are constants. Joint
normality implies linearity of conditional expectations, but linearity of conditional
expectations does not imply joint normality.
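A simulation sketch (my own) of property 1: drawing from a bivariate normal with a chosen mean vector and covariance matrix, the linear combination aX + bY has the mean and variance given by the formula above:

```python
# Sketch: linear combination of bivariate normal variables is normal,
# with mean a*muX + b*muY and variance a^2 varX + b^2 varY + 2ab covXY.
import numpy as np

rng = np.random.default_rng(2)
mu = [1.0, 2.0]                           # muX, muY (chosen for illustration)
cov = [[1.0, 0.5],
       [0.5, 2.0]]                        # [[varX, covXY], [covXY, varY]]
draws = rng.multivariate_normal(mu, cov, size=200_000)
x, y = draws[:, 0], draws[:, 1]

a, b = 3.0, -1.0
z = a * x + b * y
print(z.mean(), a * mu[0] + b * mu[1])                      # ~1.0
print(z.var(), a**2 * 1.0 + b**2 * 2.0 + 2 * a * b * 0.5)   # ~8.0
```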
The F Distribution
…
Maybe watch YouTube videos on the different distributions: when to use which and how to
interpret the implications.
Sampling distribution of the sample average: if Y_1, ..., Y_n are i.i.d. draws with mean \mu_Y and
variance \sigma_Y^2, then

E(\bar{Y}) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) = \mu_Y

\mathrm{var}(\bar{Y}) = \mathrm{var}\!\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right)
= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{var}(Y_i) + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \mathrm{cov}(Y_i, Y_j)
= \frac{\sigma_Y^2}{n}

(the covariance terms are zero because the draws are independent). The standard deviation
of \bar{Y} is the square root of the variance, \sigma_Y / \sqrt{n}, so

\mathrm{st.dev}(\bar{Y}) = \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}

If Y_1, ..., Y_n are i.i.d. draws from the N(\mu_Y, \sigma_Y^2) distribution, then \bar{Y} is distributed N(\mu_Y, \sigma_Y^2 / n).
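A simulation sketch (with assumed values for \mu_Y, \sigma_Y, and n) of the sampling distribution of the sample average: across many repeated samples, the average of \bar{Y} is \mu_Y and its variance is close to \sigma_Y^2/n:

```python
# Sketch: the sample average of n i.i.d. draws has variance sigma^2 / n.
import numpy as np

rng = np.random.default_rng(3)
mu_Y, sigma_Y, n = 5.0, 2.0, 50          # assumed population values
n_samples = 100_000

samples = rng.normal(mu_Y, sigma_Y, size=(n_samples, n))
y_bar = samples.mean(axis=1)             # one sample average per sample

print(y_bar.mean(), mu_Y)                # ~5.0
print(y_bar.var(), sigma_Y**2 / n)       # ~0.08
```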
3. Review of Statistics
3.1 Estimation of the Population Mean
An estimator is a function of a sample of data to be drawn randomly from a population. An
estimate is the numerical value of the estimator when it is actually computed using data
from a specific sample. An estimator is a random variable because of randomness in
selecting the sample, while an estimate is a nonrandom number. There are three desirable
characteristics of an estimator: unbiasedness, consistency, and efficiency.
The estimator \hat{\mu}_Y is unbiased when E(\hat{\mu}_Y) = \mu_Y.
\hat{\mu}_Y is consistent for \mu_Y when the probability that \hat{\mu}_Y is within a small interval of the true
value \mu_Y approaches 1 as the sample size increases: \hat{\mu}_Y \xrightarrow{p} \mu_Y.
Efficiency compares the variances of unbiased estimators: the estimator with the smallest
variance is the most efficient.
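A simulation sketch contrasting two unbiased estimators of \mu_Y, the sample average \bar{Y} and the single observation Y_1 (population values assumed for illustration): both have mean \mu_Y across repeated samples, but \bar{Y} has the much smaller variance and is therefore the more efficient of the two:

```python
# Sketch: Y-bar and Y_1 are both unbiased for mu_Y, but Y-bar is more efficient.
import numpy as np

rng = np.random.default_rng(4)
mu_Y, sigma_Y, n = 10.0, 3.0, 25
samples = rng.normal(mu_Y, sigma_Y, size=(50_000, n))

y_bar = samples.mean(axis=1)      # estimator 1: sample average
y_first = samples[:, 0]           # estimator 2: just the first observation

print(y_bar.mean(), y_first.mean())    # both ~10.0: unbiased
print(y_bar.var(), y_first.var())      # ~0.36 vs ~9.0: Y-bar is more efficient
```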
The p-value is the probability, computed under the null hypothesis, of drawing a sample
average at least as far from \mu_{Y,0} as the value actually observed:

p\text{-value} = \Pr_{H_0}\!\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]

That is, the p-value is the area in the tails of the distribution of \bar{Y} under the null
hypothesis beyond \mu_{Y,0} \pm |\bar{Y}^{act} - \mu_{Y,0}|.
The formula for the p-value depends on the variance of the population distribution, \sigma_Y^2. In
practice, this variance is typically unknown (except for a Bernoulli random variable). We now
turn to the problem of estimating \sigma_Y^2.
The sample variance is

s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2

and the sample standard deviation is

s_Y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}

The standard error of \bar{Y} is an estimator of the standard deviation of \bar{Y}:

SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}

s_Y^2 \xrightarrow{p} \sigma_Y^2 when n is large and Y_1, ..., Y_n are drawn i.i.d. with Y_i having a finite fourth moment; that
is, E(Y_i^4) < \infty.
When Y_1, ..., Y_n are i.i.d. draws from a Bernoulli distribution with success probability p, the
formula for the variance of \bar{Y} simplifies to p(1 - p)/n, and the formula for the standard error
simplifies accordingly.
When n is large, the p-value is

p\text{-value} = 2\Phi\!\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}\right|\right)
The standardized sample average (\bar{Y} - \mu_{Y,0})/SE(\bar{Y}) plays a central role in testing statistical
hypotheses and has a special name, the t-statistic or t-ratio:

t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}
In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an
important example of a test statistic.
When n is large, s_Y^2 \xrightarrow{p} \sigma_Y^2. Thus the distribution of the t-statistic is approximately the same
as the distribution of (\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}, which in turn is well approximated by the standard
normal distribution when n is large because of the central limit theorem. Accordingly, under
the null hypothesis,
t is approximately distributed N(0,1) for large n.
The formula for the p-value given above can be rewritten in terms of the
t-statistic. Let t^{act} denote the value of the t-statistic actually computed:

t^{act} = \frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}

Accordingly, when n is large, the p-value can be calculated using

p\text{-value} = 2\Phi(-|t^{act}|)
EXAMPLE:
Test H_0: E(Y) = $20 per hour against the two-sided alternative, with n = 200.
\bar{Y}^{act} = $22.64 and s_Y = $18.14, so SE(\bar{Y}) = s_Y/\sqrt{n} = 18.14/\sqrt{200} = 1.28
t^{act} = (22.64 - 20)/1.28 = 2.06
2\Phi(-2.06) = 0.039 = 3.9%.
That is, assuming the null hypothesis to be true, the probability of obtaining a sample
average at least as different from the null as the one actually computed is 3.9%.
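The same calculation can be reproduced directly, using scipy's standard normal cdf for \Phi:

```python
# Sketch: the hourly-earnings example, two-sided p-value from the t-statistic.
import numpy as np
from scipy.stats import norm

n = 200
y_bar_act = 22.64      # sample average
s_Y = 18.14            # sample standard deviation
mu_0 = 20.0            # value of the mean under the null hypothesis

se = s_Y / np.sqrt(n)                      # standard error of Y-bar: ~1.28
t_act = (y_bar_act - mu_0) / se            # ~2.06
p_value = 2 * norm.cdf(-abs(t_act))        # ~0.039
print(se, t_act, p_value)
```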
Summary hypothesis tests for the population mean against the two-sided alternative:
Testing the hypothesis E(Y) = \mu_{Y,0} against the hypothesis E(Y) \ne \mu_{Y,0}:
1. Compute the standard error of \bar{Y}: SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = s_Y/\sqrt{n}
2. Compute the t-statistic: t^{act} = (\bar{Y}^{act} - \mu_{Y,0}) / SE(\bar{Y})
3. Compute the p-value: p\text{-value} = 2\Phi(-|t^{act}|). Reject the null hypothesis at the 5%
significance level if the p-value is less than 0.05 (equivalently, if |t^{act}| > 1.96).
One-sided alternative
The one-sided alternative hypothesis is

H_1: E(Y) > \mu_{Y,0}

Only large positive values of the t-statistic lead to rejection of the null hypothesis, rather than
values that are large in absolute value:

p\text{-value} = \Pr_{H_0}(Z > t^{act}) = 1 - \Phi(t^{act})
The N(0,1) critical value for a one-sided test with a 5% significance level is 1.64.
The coverage probability of a confidence interval for the population mean is the probability,
computed over all possible random samples, that it contains the true population mean.
The estimator of \mu_m - \mu_w is \bar{Y}_m - \bar{Y}_w.
We need to know the distribution of \bar{Y}_m - \bar{Y}_w. By the central limit theorem, \bar{Y}_m and \bar{Y}_w are
approximately distributed N(\mu_m, \sigma_m^2/n_m) and N(\mu_w, \sigma_w^2/n_w), respectively. Because \bar{Y}_m and \bar{Y}_w are
constructed from different randomly selected samples, they are independent random variables. Thus,

\bar{Y}_m - \bar{Y}_w \text{ is distributed } N\!\left[\mu_m - \mu_w,\; (\sigma_m^2/n_m) + (\sigma_w^2/n_w)\right]

Because the population variances are unknown, we have to use estimators, so

SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}

The t-statistic for comparing two means is

t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}
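A sketch of this two-sample t-statistic with made-up summary statistics (sample means, standard deviations, and sizes chosen only for illustration):

```python
# Sketch: t-statistic for the difference of two means (hypothetical numbers).
import numpy as np
from scipy.stats import norm

# Hypothetical summary statistics for two independent samples.
y_bar_m, s_m, n_m = 26.0, 12.0, 1000
y_bar_w, s_w, n_w = 22.0, 10.0, 800
d_0 = 0.0                                  # null: the means are equal

se_diff = np.sqrt(s_m**2 / n_m + s_w**2 / n_w)
t = ((y_bar_m - y_bar_w) - d_0) / se_diff
p_value = 2 * norm.cdf(-abs(t))            # large-sample, two-sided p-value
print(se_diff, t, p_value)
```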
Writing out SE(\bar{Y}) yields the following formula for the t-statistic:

t = \frac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2 / n}}

where s_Y^2 is given by s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.
If Y is normally distributed then the t-statistic in the above equation has a Student t
distribution with n-1 degrees of freedom. If the population distribution is normally distributed,
then critical values from the Student t distribution can be used to perform hypothesis tests
and to construct confidence intervals.
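A sketch comparing two-sided 5% critical values from the Student t distribution with the standard normal critical value 1.96; the degrees-of-freedom values are chosen for illustration:

```python
# Sketch: two-sided 5% critical values from the Student t vs. the normal.
from scipy.stats import t, norm

for df in [5, 20, 60, 500]:                     # degrees of freedom n - 1
    print(df, t.ppf(0.975, df))                 # 2.57, 2.09, 2.00, 1.96
print("normal", norm.ppf(0.975))                # 1.96
```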
The t-statistic testing the difference of two means, given by t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}, does not
have a Student t distribution, even if the population distribution of Y is normal. (The Student t
distribution does not apply here because the variance estimator used to compute the
standard error SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{s_m^2/n_m + s_w^2/n_w} does not produce a denominator in the
t-statistic with a chi-squared distribution.)
For large n, the differences between the Student t distribution and the standard normal
distribution are negligible.
The sample correlation coefficient satisfies |r_{XY}| \le 1.
s_{XY} \xrightarrow{p} \sigma_{XY} as n grows, under the assumption that (X_i, Y_i) are i.i.d. and have finite fourth
moments.
Because the sample variance and sample covariance are consistent, the sample correlation
coefficient is consistent; that is, r_{XY} \xrightarrow{p} \mathrm{corr}(X_i, Y_i).
The correlation coefficient is a measure of linear association and therefore does not capture
nonlinear (for example, quadratic) relationships.
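A simulation sketch of that last point: for Y = X^2 with X symmetric around zero, the correlation is close to zero even though Y is completely determined by X:

```python
# Sketch: the correlation coefficient misses a purely quadratic relationship.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y_linear = 2 * x + rng.normal(size=100_000)
y_quadratic = x**2                        # exactly (but nonlinearly) related to x

print(np.corrcoef(x, y_linear)[0, 1])     # strong positive correlation (~0.9)
print(np.corrcoef(x, y_quadratic)[0, 1])  # close to 0 despite exact dependence
```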