Summary Chapters 1-3, Introduction to Econometrics (Stock & Watson)


Introduction to Econometrics Part 1 Introduction and Review


Chapter 1, 2, 3
Internship Minor Applied Econometrics: A Big Data Experience for All (Vrije Universiteit
Amsterdam)


1. Economic Questions and Data


Economic Questions we examine
At a broad level, econometrics is the science of using economic theory and statistical
techniques to analyze economic data.

Four example questions analyzed in the book:


1. Does reducing class size improve elementary school education?
2. Is there racial discrimination in the market for home loans?
3. How much do cigarette taxes reduce smoking?
4. How much will U.S. GDP grow next year?

These quantitative questions require quantitative answers. The conceptual framework used throughout the book is the multiple regression model, which provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant.

Causal Effects and Idealized Experiments


One way to measure a causal effect is with a randomized controlled experiment. It is controlled in the sense that there are both a control group, which receives no treatment, and a treatment group, which receives the treatment; it is randomized in the sense that the treatment is assigned randomly. In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself. Randomized controlled experiments are rare because they are often unethical, impossible, or too expensive, but they provide a theoretical benchmark for an econometric analysis of causal effects using actual data.

Even though forecasting does not necessarily need to involve causal relationships,
economic theory suggests patterns and relationships that might be useful for forecasting.
Multiple regression analysis allows us to quantify historical relationships suggested by
economic theory, to check whether those relationships have been stable over time, to make
quantitative forecasts about the future, and to assess the accuracy of those forecasts.

Data: Sources and Types


In econometrics, data come from two sources: experiments (leading to ​experimental data​)
or non experimental observations in the world (leading to ​observational data​). With
observational data it is hard to sort out the effect of the treatment from other relevant factors,
and econometrics seeks to provide tools to tackle challenges encountered when real-world
data are used to estimate causal effects.
Whether data are experimental or observational, data sets come in three main types:
- cross-sectional: data on different entities for a single time period
- time series: data for a single entity collected at multiple time periods
- panel (longitudinal): data for multiple entities in which each entity is observed at two or more time periods

2. Review of Probability
Random Variables and Probability Distributions
The set of all possible outcomes is called the sample space. An event is a subset of the sample space; that is, an event is a set of one or more outcomes. For example, the event "my computer will crash no more than once" is the set of two outcomes: "no crashes" and "one crash".
A random variable is a numerical summary of a random outcome. A ​discrete random
variable​ takes on only discrete values like 0,1,2…, whereas a ​continuous random variable
takes on a continuum of possible values.
The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur (these probabilities sum to 1). For a continuous random variable, probabilities are summarized by the probability density function (pdf). The cumulative probability distribution is the probability that the random variable is less than or equal to a particular value; it is also referred to as a cumulative distribution function, a cdf, or a cumulative distribution.
A binary random variable is called a ​Bernoulli random variable​, and its probability
distribution is called the ​Bernoulli distribution​.

Expected Values, Mean and Variance


The expected value of random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of a discrete random variable is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of those outcomes. The expected value of Y is also called the expectation of Y or the mean of Y and is denoted $\mu_Y$:
$$E(Y) = \sum_{i=1}^{k} y_i p_i$$
Expected value of a Bernoulli random variable:
$E(G) = 1 \times p + 0 \times (1 - p) = p$; thus the expected value of a Bernoulli random variable is p, the probability that it takes on the value 1.

The variance of a random variable, denoted var(Y), is the expected value of the square of the deviation of Y from its mean. The variance of the discrete random variable Y, denoted $\sigma_Y^2$, is
$$\sigma_Y^2 = \mathrm{var}(Y) = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$$
The standard deviation of Y is $\sigma_Y$, the square root of the variance. The units of the standard deviation are the same as the units of Y.
Variance of a Bernoulli random variable:
$\mathrm{var}(G) = \sigma_G^2 = p(1-p)$; thus the standard deviation is $\sigma_G = \sqrt{p(1-p)}$.

With Y = a + bX:
$\mu_Y = a + b\mu_X$, $\sigma_Y^2 = b^2 \sigma_X^2$, and $\sigma_Y = |b|\,\sigma_X$.
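
As a quick numerical check of these definitions, here is a minimal sketch (assuming Python with NumPy; the probability values are made up purely for illustration) that computes E(Y), var(Y), and the standard deviation for a small discrete distribution and a Bernoulli variable, and verifies the linear-transformation rules.

```python
import numpy as np

# Hypothetical discrete distribution: values y_i with probabilities p_i (summing to 1)
y = np.array([0.0, 1.0, 2.0, 3.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

mean_Y = np.sum(y * p)                    # E(Y) = sum of y_i * p_i
var_Y = np.sum((y - mean_Y) ** 2 * p)     # var(Y) = sum of (y_i - mu_Y)^2 * p_i
sd_Y = np.sqrt(var_Y)
print(mean_Y, var_Y, sd_Y)

# Bernoulli(p): E(G) = p, var(G) = p(1 - p)
p_bern = 0.3
print(1 * p_bern + 0 * (1 - p_bern), p_bern * (1 - p_bern))

# Linear transformation Y = a + bX: mu_Y = a + b*mu_X, var(Y) = b^2 * var(X)
a, b = 2.0, -3.0
print(a + b * mean_Y, b ** 2 * var_Y, abs(b) * sd_Y)   # standard deviation scales by |b|
```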

The skewness measures the lack of symmetry of a distribution:
$$\text{Skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$$
For a symmetric distribution, a value of Y a given amount above its mean is just as likely as a value of Y the same amount below its mean. Thus, for a symmetric distribution, $E[(Y - \mu_Y)^3] = 0$; the skewness of a symmetric distribution is zero. Positive skewness means a long right tail; negative skewness means a long left tail.

The kurtosis measures the thickness of the tails, and therefore how much of the variance of Y arises from extreme values, called outliers. The greater the kurtosis of a distribution, the more likely are outliers:
$$\text{Kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$$
Kurtosis cannot be negative. The kurtosis of a normally distributed random variable is 3. A distribution with kurtosis > 3 is called leptokurtic, or heavy-tailed.
Both kurtosis and skewness are unit free: changing the units of Y does not change the kurtosis or skewness.

The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, $E(Y^2)$, is called the second moment of Y. In general, the expected value of $Y^r$ is called the rth moment of the random variable Y; that is, the rth moment of Y is $E(Y^r)$. The skewness is a function of the first, second, and third moments of Y; the kurtosis is a function of the first through fourth moments of Y.
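
A small sketch of these formulas (Python/NumPy; the distributions are illustrative) that evaluates skewness from a probability distribution and checks that the kurtosis of a simulated normal variable is close to 3:

```python
import numpy as np

def skewness(y, p):
    """Skewness of a discrete distribution with values y and probabilities p."""
    mu = np.sum(y * p)
    sigma = np.sqrt(np.sum((y - mu) ** 2 * p))
    return np.sum((y - mu) ** 3 * p) / sigma ** 3

# Symmetric distribution: skewness should be exactly 0
y_sym = np.array([-1.0, 0.0, 1.0])
p_sym = np.array([0.25, 0.5, 0.25])
print(skewness(y_sym, p_sym))

# Kurtosis of a (simulated) normal random variable is close to 3
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)
m, s = z.mean(), z.std()
print(np.mean((z - m) ** 4) / s ** 4)
```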

Two Random Variables


Answering questions concerning two random variables requires an understanding of the concepts of joint, marginal, and conditional probability distributions.

Joint and Marginal Distributions


The ​joint probability distribution​ can be written as the function Pr(X=x, Y=y). The
probabilities of all possible (x,y) combinations sum to 1.
The ​marginal probability distribution​ of a random variable Y is just another name for its
probability distribution. This term is used to distinguish the distribution of Y alone (the
marginal distribution) from the joint distribution of Y and another random variable. If X can
take on l different values $x_1, \ldots, x_l$, then the marginal probability that Y takes on value y is
$$\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y).$$

Conditional Distributions

The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X: Pr(Y = y | X = x).
In general, the conditional distribution of Y given X = x is
$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \quad (2.17)$$
The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on k values $y_1, \ldots, y_k$, then the conditional mean of Y given X = x is
$$E[Y \mid X = x] = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x) \quad (2.18)$$
The conditional expectation of Y given X = x is just the mean value of Y when X = x.

Law of Iterated Expectations

The mean of Y is the weighted average of the conditional expectation of Y given X, weighted
by the probability distribution of X. E.g. the mean height of adults is the weighted average of
the mean height of men and the mean height of women, weighted by the proportions of men
and women. Stated mathematically, if X takes on l values $x_1, \ldots, x_l$, then
$$E(Y) = \sum_{i=1}^{l} E(Y \mid X = x_i)\Pr(X = x_i). \quad (2.19)$$
Equation 2.19 follows from equations 2.17 and 2.18.
Stated differently, the expectation of Y is the expectation of the conditional expectation of Y given X:
$$E[Y] = E[E(Y \mid X)], \quad (2.20)$$
where the inner expectation on the right-hand side of equation 2.20 is computed using the conditional distribution of Y given X and the outer expectation is computed using the marginal distribution of X. Equation 2.20 is known as the law of iterated expectations.
The law of iterated expectations implies that if the conditional mean of Y given X is zero, then the mean of Y is zero. This is an immediate consequence of equation 2.20: if $E(Y \mid X) = 0$, then $E(Y) = E[E(Y \mid X)] = E(0) = 0$. Said differently, if the mean of Y given X is zero, then it must be that the probability-weighted average of these conditional means is zero; that is, the mean of Y must be zero. The law of iterated expectations also applies to expectations that are conditional on multiple random variables. For example, let X, Y, and Z be random variables that are jointly distributed. Then the law of iterated expectations says that $E(Y) = E[E(Y \mid X, Z)]$, where $E(Y \mid X, Z)$ is the conditional expectation of Y given both X and Z.
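
To make the law of iterated expectations concrete, the sketch below (Python/NumPy; the joint probabilities are invented for illustration) computes E(Y) directly from the marginal distribution of Y and again as E[E(Y | X)], and checks that the two agree.

```python
import numpy as np

# Hypothetical joint distribution Pr(X = x_j, Y = y_i): rows index Y values, columns index X values
y_vals = np.array([0.0, 1.0])
x_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],    # Pr(Y = 0, X = x_j)
                  [0.15, 0.25, 0.20]])   # Pr(Y = 1, X = x_j)
assert np.isclose(joint.sum(), 1.0)

# Marginal distributions
p_x = joint.sum(axis=0)                  # Pr(X = x_j)
p_y = joint.sum(axis=1)                  # Pr(Y = y_i)

# E(Y) computed directly from the marginal distribution of Y
EY_direct = np.sum(y_vals * p_y)

# Conditional mean E(Y | X = x_j) = sum_i y_i Pr(Y = y_i | X = x_j)   (eq. 2.18)
EY_given_x = (y_vals[:, None] * joint).sum(axis=0) / p_x

# Law of iterated expectations: E(Y) = sum_j E(Y | X = x_j) Pr(X = x_j)   (eq. 2.19)
EY_iterated = np.sum(EY_given_x * p_x)

print(EY_direct, EY_iterated)            # the two values agree
```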

Conditional Variance

The variance of Y conditional on X is the variance of the conditional distribution of Y given X.


Stated mathematically, the conditional variance of Y given X is
$$\mathrm{var}(Y \mid X = x) = \sum_{i=1}^{k} [y_i - E(Y \mid X = x)]^2 \Pr(Y = y_i \mid X = x) \quad (2.21)$$

Independence

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y. That is, X and Y are independently distributed if, for all values of x and y,
$$\Pr(Y = y \mid X = x) = \Pr(Y = y) \quad \text{(independence of X and Y)} \quad (2.22)$$


Substituting equation 2.22 into equation 2.17 gives an alternative expression for independent random variables in terms of their joint distribution: if X and Y are independent, then
$$\Pr(X = x, Y = y) = \Pr(X = x)\Pr(Y = y)$$
That is, the joint distribution of two independent random variables is the product of their marginal distributions.

Covariance and Correlation

The covariance between X and Y is the expected value $E[(X - \mu_X)(Y - \mu_Y)]$, where $\mu_X$ is the mean of X and $\mu_Y$ is the mean of Y. The covariance is denoted cov(X, Y) or $\sigma_{XY}$. If X can take on l values and Y can take on k values, then the covariance is given by the formula
$$\mathrm{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{k}\sum_{j=1}^{l} (x_j - \mu_X)(y_i - \mu_Y)\Pr(X = x_j, Y = y_i)$$

Because the covariance is the product of X and Y, deviated from their means, the units are,
awkwardly, the units of X multiplied by the units of Y. This “units” problem can make
numerical values of the covariance difficult to interpret. The ​correlation​ solves this unit
problem; the correlation between X and Y is the covariance between X and Y divided by
their standard deviations:


$$\mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
If the conditional mean of Y does not depend on X, then Y and X are uncorrelated:
if $E(Y \mid X) = \mu_Y$, then $\mathrm{cov}(X, Y) = 0$ and $\mathrm{corr}(X, Y) = 0$.

The Mean and Variance of Sums of Random Variables


The mean of the sum of two random variables, X and Y, is simply the sum of their means:
$$E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$$

Let X, Y, and V be random variables, let $\mu_X$ and $\sigma_X^2$ be the mean and variance of X, and let $\sigma_{XY}$ be the covariance between X and Y (and so forth for the other variables). The following equations follow from the definitions of the mean, variance, and covariance:
$$E(a + bX + cY) = a + b\mu_X + c\mu_Y$$
$$\mathrm{var}(a + bY) = b^2 \sigma_Y^2$$
The variance of the sum of X and Y is the sum of their variances plus two times their covariance, $\mathrm{var}(X + Y) = \sigma_X^2 + 2\sigma_{XY} + \sigma_Y^2$, and more generally
$$\mathrm{var}(aX + bY) = a^2 \sigma_X^2 + 2ab\,\sigma_{XY} + b^2 \sigma_Y^2$$
$$E(Y^2) = \sigma_Y^2 + \mu_Y^2$$
$$\mathrm{cov}(a + bX + cV, Y) = b\sigma_{XY} + c\sigma_{VY}$$
$$E(XY) = \sigma_{XY} + \mu_X \mu_Y$$
$$|\mathrm{corr}(X, Y)| \le 1 \quad \text{and} \quad |\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2} \quad \text{(correlation inequality)}$$
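
The sketch below (Python/NumPy, with simulated data used purely for illustration) checks the variance-of-sums formula and the correlation inequality numerically on correlated draws.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Generate correlated X and Y (an illustrative construction, not from the book)
X = rng.standard_normal(n)
Y = 0.5 * X + rng.standard_normal(n)

a, b = 2.0, -1.5
var_X, var_Y = X.var(), Y.var()
cov_XY = np.cov(X, Y, bias=True)[0, 1]

# var(aX + bY) = a^2 var(X) + 2ab cov(X, Y) + b^2 var(Y)
lhs = (a * X + b * Y).var()
rhs = a**2 * var_X + 2 * a * b * cov_XY + b**2 * var_Y
print(lhs, rhs)                                    # approximately equal

# Correlation and the correlation inequality
corr_XY = cov_XY / np.sqrt(var_X * var_Y)
print(corr_XY, abs(corr_XY) <= 1)
print(abs(cov_XY) <= np.sqrt(var_X * var_Y))       # |sigma_XY| <= sqrt(var_X * var_Y)
```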

The Normal, Chi-Squared, Student t, and F Distributions

The Normal Distribution
A normal distribution with mean $\mu$ and variance $\sigma^2$, denoted $N(\mu, \sigma^2)$, is symmetric around its mean and has 95% of its probability between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$. The standard normal distribution has mean 0 and variance 1 and is denoted N(0, 1). Random variables that have a N(0, 1) distribution are often denoted Z, and the standard normal cumulative distribution function is denoted by $\Phi$; accordingly, $\Pr(Z \le c) = \Phi(c)$, where c is a constant.

Computing Probabilities Involving Normal Random Variables

Suppose Y is normally distributed with mean $\mu$ and variance $\sigma^2$; in other words, Y is distributed $N(\mu, \sigma^2)$. Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing $Z = (Y - \mu)/\sigma$.
Let $c_1$ and $c_2$ denote two numbers with $c_1 < c_2$, and let $d_1 = (c_1 - \mu)/\sigma$ and $d_2 = (c_2 - \mu)/\sigma$. Then
$$\Pr(Y \le c_2) = \Pr(Z \le d_2) = \Phi(d_2)$$
$$\Pr(Y \ge c_1) = \Pr(Z \ge d_1) = 1 - \Phi(d_1)$$
$$\Pr(c_1 \le Y \le c_2) = \Pr(d_1 \le Z \le d_2) = \Phi(d_2) - \Phi(d_1)$$
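
A short sketch of this calculation (assuming Python with SciPy; the numbers mu = 3, sigma = 2, c1 = 2, c2 = 5 are illustrative) that computes these probabilities by standardizing and using the standard normal cdf:

```python
from scipy.stats import norm

mu, sigma = 3.0, 2.0        # Y ~ N(mu, sigma^2)  (illustrative values)
c1, c2 = 2.0, 5.0

# Standardize: d = (c - mu) / sigma
d1 = (c1 - mu) / sigma
d2 = (c2 - mu) / sigma

print(norm.cdf(d2))                  # Pr(Y <= c2) = Phi(d2)
print(1 - norm.cdf(d1))              # Pr(Y >= c1) = 1 - Phi(d1)
print(norm.cdf(d2) - norm.cdf(d1))   # Pr(c1 <= Y <= c2) = Phi(d2) - Phi(d1)

# Same answer without standardizing, using loc/scale directly
print(norm.cdf(c2, loc=mu, scale=sigma) - norm.cdf(c1, loc=mu, scale=sigma))
```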

The normal distribution can be generalized to describe the joint distribution of a set of
random variables. In this case, the distribution is called the ​multivariate normal
distribution​, or if only two variables are being considered, the ​bivariate normal
distribution​. The multivariate normal distribution has four important properties.
1. If X and Y have a bivariate normal distribution with covariance $\sigma_{XY}$ and if a and b are two constants, then aX + bY has the normal distribution
$aX + bY$ is distributed $N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY})$ (X, Y bivariate normal).
More generally, if n random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.
2. if a set of variables has a multivariate normal distribution, then the marginal
distribution of each of the variables is normal
3. If variables with a multivariate normal distribution have covariances that equal zero,
then the variables are independent.
4. If X and Y have a bivariate normal distribution, then the conditional expectation of Y
given X is linear in X; that is E(Y|X=x)=a+bx, where a and b are constants. Joint
normality implies linearity of conditional expectations, but linearity of conditional
expectations does not imply joint normality.

The Chi-Squared Distribution


The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics. The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. This distribution depends on m, which is called the degrees of freedom of the chi-squared distribution. For example, let $Z_1$, $Z_2$, and $Z_3$ be independent standard normal random variables. Then $Z_1^2 + Z_2^2 + Z_3^2$ has a chi-squared distribution with 3 degrees of freedom. A chi-squared distribution with m degrees of freedom is denoted $\chi_m^2$.

The Student t Distribution


The Student t distribution with m degrees of freedom is the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. When m is small (20 or less), it has more mass in the tails than the standard normal distribution. When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution, and the $t_\infty$ distribution equals the standard normal distribution.

The F Distribution

The F distribution with m and n degrees of freedom, denoted $F_{m,n}$, is the distribution of the ratio of a chi-squared random variable with m degrees of freedom, divided by m, to an independently distributed chi-squared random variable with n degrees of freedom, divided by n.

Note to self: maybe watch YouTube videos on the different distributions: when to use which and how to interpret the implications.
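
As a practical companion to these definitions, the sketch below (Python/SciPy; the degrees of freedom are arbitrary examples) looks up 5% critical values for the chi-squared, Student t, and F distributions and shows the t distribution approaching the standard normal as the degrees of freedom grow.

```python
from scipy.stats import chi2, t, f, norm

# 95th percentiles (5% right-tail critical values), illustrative degrees of freedom
print(chi2.ppf(0.95, df=3))        # chi-squared with 3 degrees of freedom
print(f.ppf(0.95, dfn=3, dfd=30))  # F with (3, 30) degrees of freedom

# Student t two-sided 5% critical values converge to the normal value 1.96 as df grows
for df in (5, 20, 30, 1000):
    print(df, t.ppf(0.975, df=df))
print("normal:", norm.ppf(0.975))

# Tail probability example: Pr(chi2 with 3 df > 7.81) is about 5%
print(1 - chi2.cdf(7.81, df=3))
```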


Random Sampling and the Distribution of the Sample Average

Random Sampling
In simple random sampling, n objects are selected at random from a population, and each member of the population is equally likely to be included in the sample. When $Y_i$ has the same marginal distribution for i = 1, ..., n, then $Y_1, \ldots, Y_n$ are said to be identically distributed. When $Y_1, \ldots, Y_n$ are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed (i.i.d.).

The Sampling Distribution of the Sample Average


The sample average or sample mean, $\bar{Y}$, of the n observations $Y_1, \ldots, Y_n$ is
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
The distribution of $\bar{Y}$ is called the sampling distribution of $\bar{Y}$ because it is the probability distribution associated with possible values of $\bar{Y}$ that could be computed for different possible samples $Y_1, \ldots, Y_n$.

$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$
$$\mathrm{var}(\bar{Y}) = \mathrm{var}\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{var}(Y_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\, j\ne i}^{n}\mathrm{cov}(Y_i, Y_j) = \frac{\sigma_Y^2}{n}$$
(the covariance terms are zero because the observations are i.i.d.)
The standard deviation of $\bar{Y}$ is the square root of the variance, so
$$\mathrm{st.dev}(\bar{Y}) = \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$$

If $Y_1, \ldots, Y_n$ are i.i.d. draws from the $N(\mu_Y, \sigma_Y^2)$ distribution, then $\bar{Y}$ is distributed $N(\mu_Y, \sigma_Y^2/n)$.

Large-Sample Approximations to Sampling Distributions

The law of large numbers says that, when the sample size is large, $\bar{Y}$ will be close to $\mu_Y$ with very high probability (convergence in probability). The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$, is approximately normal. Because the distribution of $\bar{Y}$ approaches the normal as n grows, $\bar{Y}$ is said to have an asymptotic normal distribution.
To standardize $\bar{Y}$, subtract its mean and divide by its standard deviation so that it has a mean of 0 and a variance of 1.
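
The following simulation sketch (Python/NumPy; a Bernoulli population with p = 0.2 is used purely as an example) illustrates both results: the sample average concentrates around mu_Y as n grows, and the standardized sample average is approximately N(0, 1) for large n.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.2                                   # Bernoulli population: mu_Y = p, sigma_Y^2 = p(1 - p)
mu_Y, sigma_Y = p, np.sqrt(p * (1 - p))
n_reps = 100_000                          # number of repeated samples

for n in (10, 100, 10_000):
    # The sum of n i.i.d. Bernoulli(p) draws is Binomial(n, p), so Ybar = Binomial(n, p) / n
    ybar = rng.binomial(n, p, size=n_reps) / n
    z = (ybar - mu_Y) / (sigma_Y / np.sqrt(n))        # standardized sample average
    # LLN: ybar concentrates around mu_Y; CLT: roughly 95% of |z| values fall within 1.96
    print(n, ybar.mean(), ybar.std(), np.mean(np.abs(z) <= 1.96))
```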

3. Review of Statistics
3.1 Estimation of the Population Mean
An ​estimator ​is a function of a sample of data to be drawn randomly from a population. An
estimate​ is the numerical value of the estimator when it is actually computed using data
from a specific sample. An estimator is a random variable because of randomness in
selecting the sample, while an estimate is a nonrandom number. There are three desirable
characteristics of an estimator: unbiasedness, consistency, and efficiency.
The estimator $\hat{\mu}_Y$ is unbiased when $E(\hat{\mu}_Y) = \mu_Y$.
$\hat{\mu}_Y$ is consistent for $\mu_Y$ when the probability that $\hat{\mu}_Y$ is within a small interval of the true value $\mu_Y$ approaches 1 as the sample size increases: $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.
Efficient means the estimator with the smallest variance.

Efficiency of $\bar{Y}$: $\bar{Y}$ is BLUE (Best Linear Unbiased Estimator). The sample average $\bar{Y}$ provides the best fit to the data in the sense that the average squared difference between the observations and $\bar{Y}$ is the smallest among all possible estimators. Consider the problem of finding an estimator m that minimizes
$$\sum_{i=1}^{n} (Y_i - m)^2,$$
which is a measure of the total squared gap or distance between the estimator m and the sample points. Because m is an estimator of E(Y), you can think of it as a prediction of the value of $Y_i$, so the gap $Y_i - m$ can be thought of as a prediction mistake. The estimator that minimizes the sum of squared gaps $Y_i - m$ is called the least squares estimator; this estimator is $\bar{Y}$.
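
A quick numerical check of the least squares property (Python/NumPy and SciPy; the data are simulated purely for illustration): minimizing the sum of squared gaps over m recovers the sample average.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
Y = rng.normal(loc=5.0, scale=2.0, size=200)   # illustrative sample

def sum_sq_gaps(m):
    """Sum of squared gaps between a candidate estimator m and the observations."""
    return np.sum((Y - m) ** 2)

res = minimize_scalar(sum_sq_gaps)              # numerically minimize over m
print(res.x, Y.mean())                          # the minimizer equals the sample average
```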

3.2 Hypothesis Tests Concerning the Population Mean

Null hypothesis:
$$H_0: E(Y) = \mu_{Y,0}$$
Two-sided alternative hypothesis:
$$H_1: E(Y) \ne \mu_{Y,0}$$
The ​p-value​, also called the ​significance probability​, is the probability of drawing a statistic
at least as adverse to the null hypothesis as the one you actually computed in your sample,
assuming that the null-hypothesis is correct.


$$p\text{-value} = \Pr_{H_0}\!\left[\,\left|\bar{Y} - \mu_{Y,0}\right| > \left|\bar{Y}^{act} - \mu_{Y,0}\right|\,\right]$$
That is, the p-value is the area in the tails of the distribution of $\bar{Y}$ under the null hypothesis beyond $\mu_{Y,0} \pm |\bar{Y}^{act} - \mu_{Y,0}|$.

The formula for the p-value depends on the variance of the population distribution, $\sigma_Y^2$. In practice, this variance is typically unknown (except when $Y_i$ is Bernoulli). We now turn to the problem of estimating $\sigma_Y^2$.

The sample variance, sample standard deviation, and standard error

$s_Y^2$ is an estimator of $\sigma_Y^2$,
$s_Y$ is an estimator of $\sigma_Y$, and
the standard error of the sample average, $SE(\bar{Y})$, is an estimator of the standard deviation of the sampling distribution of $\bar{Y}$.

$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$s_Y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
$$SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{n}}$$

$s_Y^2 \xrightarrow{p} \sigma_Y^2$ when n is large, $Y_1, \ldots, Y_n$ are drawn i.i.d., and $Y_i$ has a finite fourth moment, that is, $E(Y_i^4) < \infty$.

When $Y_1, \ldots, Y_n$ are i.i.d. draws from a Bernoulli distribution with success probability p, the formula for the variance of $\bar{Y}$ simplifies to $p(1-p)/n$. The formula for the standard error also takes on a simple form that depends only on $\bar{Y}$ and n: $SE(\bar{Y}) = \sqrt{\bar{Y}(1 - \bar{Y})/n}$.


Calculating the p-value when σ Y is unknown.

$$p\text{-value} = 2\Phi\!\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}\right|\right)$$
The standardized sample average $(\bar{Y} - \mu_{Y,0})/SE(\bar{Y})$ plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:
$$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$$
In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an
important example of a test statistic.
When n is large, $s_Y^2 \xrightarrow{p} \sigma_Y^2$. Thus the distribution of the t-statistic is approximately the same as the distribution of $(\bar{Y} - \mu_{Y,0})/\sigma_{\bar{Y}}$, which in turn is well approximated by the standard


normal distribution when ​n​ is large because of the central limit theorem. Accordingly, under
the null hypothesis,
t is approximately distributed N(0,1) for large n.

The formula for the p-value, $p\text{-value} = 2\Phi\!\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}\right|\right)$, can be rewritten in terms of the t-statistic. Let $t^{act}$ denote the value of the t-statistic actually computed:
$$t^{act} = \frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}$$
Accordingly, when n is large, the p-value can be calculated using
$$p\text{-value} = 2\Phi(-|t^{act}|)$$

EXAMPLE:
Test $H_0: E(Y) = \$20$ per hour against the two-sided alternative, with n = 200.
$\bar{Y}^{act} = \$22.64$ and $s_Y = \$18.14$, so $SE(\bar{Y}) = s_Y/\sqrt{n} = 18.14/\sqrt{200} = 1.28$.
$t^{act} = (22.64 - 20)/1.28 = 2.06$
$2\Phi(-2.06) = 0.039 = 3.9\%$
That is, assuming the null hypothesis to be true, the probability of obtaining a sample average at least as different from the null as the one actually computed is 3.9%.
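
The same calculation in code (Python with SciPy), using the summary numbers from the example above:

```python
import numpy as np
from scipy.stats import norm

n = 200
ybar_act = 22.64     # sample average hourly earnings
s_Y = 18.14          # sample standard deviation
mu_0 = 20.0          # value of the mean under the null hypothesis

se = s_Y / np.sqrt(n)                 # standard error of the sample average
t_act = (ybar_act - mu_0) / se        # t-statistic
p_value = 2 * norm.cdf(-abs(t_act))   # two-sided p-value using the standard normal
print(se, t_act, p_value)             # ~1.28, ~2.06, ~0.039
```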

Hypothesis testing with a prespecified significance level

Reject $H_0$ if the p-value is less than 5%; equivalently, reject $H_0$ if $|t^{act}| > 1.96$.
The probability of erroneously rejecting the null hypothesis is then 5%.
A statistical hypothesis test can make two types of mistakes: a type 1 error, in which the null hypothesis is rejected when it is in fact true, and a type 2 error, in which the null hypothesis is not rejected when in fact it is false. The prespecified probability of a type 1 error is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the set of values for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.
The p-value is the probability of obtaining a test statistic, by random sampling variation, at
least as adverse to the null hypothesis as is the statistic actually observed, assuming that
the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at
which you can reject the null hypothesis.

Summary: hypothesis tests for the population mean against the two-sided alternative.
Testing the hypothesis $E(Y) = \mu_{Y,0}$ against the hypothesis $E(Y) \ne \mu_{Y,0}$:
1. Compute the standard error of $\bar{Y}$: $SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = s_Y/\sqrt{n}$
2. Compute the t-statistic: $t^{act} = \dfrac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}$
3. Compute the p-value: $p\text{-value} = 2\Phi(-|t^{act}|)$. Reject at the 5% significance level if the p-value is less than 0.05 (equivalently, if $|t^{act}| > 1.96$).

One-sided alternative
The one-sided alternative hypothesis:
$$H_1: E(Y) > \mu_{Y,0}$$
Only large positive values of the t-statistic reject the null hypothesis, rather than values that are large in absolute value.
$$p\text{-value} = \Pr_{H_0}(Z > t^{act}) = 1 - \Phi(t^{act})$$
The N(0, 1) critical value for a one-sided test with a 5% significance level is 1.64.

3.3 Confidence Intervals for the Population Mean


When the sample size n is large:
95% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.96\,SE(\bar{Y})\}$
90% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.64\,SE(\bar{Y})\}$
99% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 2.58\,SE(\bar{Y})\}$

The ​coverage probability​ of a confidence interval for the population mean is the probability,
computed over all possible random samples, that it contains the true population mean.
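
Continuing the hourly-earnings example from Section 3.2 (Python/NumPy; same summary numbers as above), a minimal sketch constructing the three confidence intervals:

```python
import numpy as np

ybar, s_Y, n = 22.64, 18.14, 200        # summary statistics from the earlier example
se = s_Y / np.sqrt(n)

for level, z in [(0.90, 1.64), (0.95, 1.96), (0.99, 2.58)]:
    lo, hi = ybar - z * se, ybar + z * se
    print(f"{level:.0%} CI for mu_Y: ({lo:.2f}, {hi:.2f})")
```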

3.4 Comparing Means from Different Populations


Test for the difference between two means:
$$H_0: \mu_m - \mu_w = d_0 \quad \text{vs.} \quad H_1: \mu_m - \mu_w \ne d_0$$

The estimator of $\mu_m - \mu_w$ is $\bar{Y}_m - \bar{Y}_w$.
We need to know the distribution of $\bar{Y}_m - \bar{Y}_w$. By the central limit theorem, $\bar{Y}_m$ and $\bar{Y}_w$ are approximately distributed $N(\mu_m, \sigma_m^2/n_m)$ and $N(\mu_w, \sigma_w^2/n_w)$, respectively. Because $\bar{Y}_m$ and $\bar{Y}_w$ are constructed from different randomly selected samples, they are independent random variables. Thus,
$$\bar{Y}_m - \bar{Y}_w \text{ is distributed } N\!\left[\mu_m - \mu_w,\; \frac{\sigma_m^2}{n_m} + \frac{\sigma_w^2}{n_w}\right]$$
The population variances are unknown, so we have to use estimators:
$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$
The t-statistic for comparing two means:
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$$


The 95% confidence interval for $d = \mu_m - \mu_w$ is
$$(\bar{Y}_m - \bar{Y}_w) \pm 1.96\,SE(\bar{Y}_m - \bar{Y}_w)$$

3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

The Causal Effect as a Difference of Conditional Expectations
The causal effect of a treatment is the expected effect on the outcome of interest of the
treatment measured in an ideal randomized controlled experiment.​ ​ This effect can be
expressed as the difference of two conditional expectations. Specifically, the ​causal effect
on Y of treatment level ​x​ is the difference in the conditional expectations,
E(Y|X=x) - E(Y|X=0). The causal effect is, in the context of experiments, also called the
treatment effect​. If treatment is binary, then we can let X=0 denote the control group and
X=1 the treatment group.

Estimation of the causal effect using differences in means


EXAMPLE:
Average hourly earnings for 2004 men and 1951 women: $25.30 vs. $21.50, with sample standard deviations 12.09 and 9.99.
Thus, the estimated gender gap is 25.30 − 21.50 = 3.80, with a standard error of $\sqrt{12.09^2/2004 + 9.99^2/1951} = 0.35$.
The 95% confidence interval for the gender gap is 3.80 ± 1.96 × 0.35 = (3.11, 4.49).
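
The same gender-gap calculation in code (Python/NumPy), using the summary statistics quoted above:

```python
import numpy as np

# Summary statistics: means, standard deviations, and sample sizes for men and women
ybar_m, s_m, n_m = 25.30, 12.09, 2004
ybar_w, s_w, n_w = 21.50, 9.99, 1951

gap = ybar_m - ybar_w                              # estimated gender gap
se_gap = np.sqrt(s_m**2 / n_m + s_w**2 / n_w)      # standard error of the difference
ci = (gap - 1.96 * se_gap, gap + 1.96 * se_gap)    # 95% confidence interval
t_stat = gap / se_gap                              # t-statistic for H0: gap = 0
print(gap, se_gap, ci, t_stat)                     # 3.80, ~0.35, (~3.11, ~4.49)
```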

3.6 Using the t-Statistic When the Sample Size Is Small

When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution itself is normally distributed, then the exact distribution of the t-statistic testing the mean of a single population is the Student t distribution with n − 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-Statistic and the Student t distribution


Substituting $SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = s_Y/\sqrt{n}$ into $p\text{-value} = 2\Phi\!\left(-\left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{SE(\bar{Y})}\right|\right)$ yields the formula for the t-statistic:
$$t = \frac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2/n}}$$
where $s_Y^2$ is given by $s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$.
If Y is normally distributed then the t-statistic in the above equation has a Student t
distribution with n-1 degrees of freedom. If the population distribution is normally distributed,
then critical values from the Student t distribution can be used to perform hypothesis tests
and to construct confidence intervals.

The t-statistic testing the difference of two means, $t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$, does not have a Student t distribution, even if the population distribution of Y is normal. (The Student t distribution does not apply here because the variance estimator used to compute the standard error, $SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{s_m^2/n_m + s_w^2/n_w}$, does not produce a denominator in the t-statistic with a chi-squared distribution.)

A modified version of the differences-of-means t-statistic, based on a different standard error formula (the "pooled" standard error formula), has an exact Student t distribution when Y is normally distributed; however, the pooled … — look in the book if relevant, p. 135.

For large n, the differences between the Student t distribution and the standard normal distribution are negligible.
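
A small sketch (Python/SciPy) comparing the two-sided 5% critical values from the Student t distribution with n − 1 degrees of freedom to the standard normal value 1.96; the sample sizes are arbitrary examples.

```python
from scipy.stats import t, norm

# Two-sided 5% critical values: Student t with n-1 degrees of freedom vs. standard normal
for n in (5, 10, 20, 30, 100):
    print(n, t.ppf(0.975, df=n - 1))
print("standard normal:", norm.ppf(0.975))   # 1.96; the t values converge to this as n grows
```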

3.7 Scatterplots, the Sample Covariance, and the Sample Correlation: Three Ways to Summarize Relationships Between Variables

The sample covariance and correlation are estimators of the population covariance and correlation.
$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
Like the sample variance, the average in the above equation is computed by dividing by n − 1 instead of n; here, too, this difference stems from using $\bar{X}$ and $\bar{Y}$ to estimate the respective population means. When n is large, it does not matter much whether you use n − 1 or n.
The sample correlation coefficient, or sample correlation, is
$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$
$$|r_{XY}| \le 1$$

Like the sample variance, the sample covariance is consistent:


$s_{XY} \xrightarrow{p} \sigma_{XY}$ as n grows, under the assumption that $(X_i, Y_i)$ are i.i.d. and have finite fourth moments.

Because the sample variance and sample covariance are consistent, the sample correlation coefficient is consistent; that is, $r_{XY} \xrightarrow{p} \mathrm{corr}(X_i, Y_i)$.
The correlation coefficient is a measure of linear association and therefore does not capture nonlinear (for example, quadratic) relationships.
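
A final sketch (Python/NumPy, with simulated data used only for illustration) computing the sample covariance with the n − 1 divisor and the sample correlation, and checking that |r| ≤ 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=n)
Y = 2.0 + 1.5 * X + rng.normal(size=n)     # illustrative linear relationship plus noise

# Sample covariance with the n-1 divisor
s_XY = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)

# Sample standard deviations (also with n-1) and the sample correlation
s_X = X.std(ddof=1)
s_Y = Y.std(ddof=1)
r_XY = s_XY / (s_X * s_Y)

print(s_XY, r_XY, abs(r_XY) <= 1)
print(np.corrcoef(X, Y)[0, 1])             # NumPy's built-in gives the same correlation
```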
