Chapter 4: Multiple Regression Analysis – Inference

Econometrics

Michal Houda

University of South Bohemia in České Budějovice


Department of Applied Mathematics and Informatics



Sampling Distributions of the OLS Estimators
Classical Linear Model (CLM) Assumptions

Assumption MLR.6 (Normality)

$\varepsilon \sim N(0, \sigma^2)$, and $\varepsilon$ is independent of $x_1, \dots, x_k$.

Justifying normality

a consequence of the central limit theorem: ε collects many unobserved factors (this justification is not tied to large sample sizes)
under MLR.1–6, the OLS estimators are the best (minimum-variance) unbiased estimators, not only the best linear ones
the population assumptions can be summarized as

$y \mid x \sim N(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,\ \sigma^2)$

Theorem 1 (Normal Sampling Distribution)


Under MLR.1–6, $\hat\beta_j \sim N(\beta_j, \operatorname{var}\hat\beta_j)$, that is,

$$\frac{\hat\beta_j - \beta_j}{\operatorname{sd}\hat\beta_j} \sim N(0, 1).$$
Testing Hypotheses about a single population parameter
One-Sided and Two-Sided t-Test

Corollary 2 (t Distribution for the standardized estimators)


Under MLR.1–6,

$$\frac{\hat\beta_j - \beta_j}{\operatorname{se}\hat\beta_j} \sim t_{n-k-1}.$$

Statistical packages usually provide the t-ratio $t_{\hat\beta_j} := \hat\beta_j / \operatorname{se}\hat\beta_j$ automatically ⇒ tests of

$H_0: \beta_j = 0$ against $H_A: \beta_j \neq 0$

(two-sided tests) are straightforward.


One-sided alternatives should be considered in econometrics.

Right-Tailed t-Test

One-sided alternative (right-tailed test):

$H_0: \beta_j = 0$ against $H_A: \beta_j > 0$

significance level α: probability of rejecting H0 when it is true (the most popular choice: α := 0.05 = 5 %)
critical value: the (1 − α)-quantile (percentile) of the appropriate distribution ($t_{n-k-1}$ here)
rejection rule: $t_{\hat\beta_j} > t_{n-k-1}(1 - \alpha)$

As the degrees of freedom (n − k − 1) get larger, the t distribution approaches the standard normal distribution $N(0, 1)$.
Compare: $t_{120}(0.95) = 1.658$ with $u(0.95) = 1.645$

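R
# standard normal density (red) against t densities with 5 (blue) and 120 (green) degrees of freedom
curve(dnorm(x), xlim = c(-5, 5), col = "red", lwd = 4)
curve(dt(x, df = 5), col = "blue", lwd = 2, add = TRUE)
curve(dt(x, df = 120), col = "green", lwd = 1, add = TRUE)
colors()  # lists the colour names that can be passed to col=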
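The critical values compared above can be reproduced with the base R quantile functions `qt` and `qnorm`. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```r
alpha <- 0.05

# (1 - alpha)-quantile of the t distribution with 120 degrees of freedom
t_crit_120 <- qt(1 - alpha, df = 120)

# (1 - alpha)-quantile of the standard normal distribution
z_crit <- qnorm(1 - alpha)

# alpha-quantile, used by a left-tailed test; by symmetry it equals -t_crit_120
t_crit_left <- qt(alpha, df = 120)

print(round(c(t = t_crit_120, normal = z_crit, t_left = t_crit_left), 3))
```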

Example 3 (Hourly Wage Equation)


Data: WAGE1
$\widehat{\ln(wage)} = 0.284 + 0.092\,educ + 0.0041\,exper + 0.022\,tenure$

$H_0: \beta_{exper} = 0$ against $H_A: \beta_{exper} > 0$

$t_{exper} \approx 0.0041/0.0017 \approx 2.39 > t_{522}(0.95) = 1.648$ (or $u(0.95) = 1.645$); the ratio of the rounded values is about 2.41, so 2.39 presumably comes from the unrounded software output
p-value $= \Pr\{t_{exper} > 2.39\} = \tfrac{1}{2} \cdot 0.0171 = 0.0085$

H0 is rejected at α = 5 % (even at 1 %): the effect of experience on wages is statistically significant.
But: the estimated return to experience is not large; for example, 3 additional years of experience yield only 3 × 0.0041 = 0.0123, i.e. about a 1.23 % increase in wages.
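The one-sided p-value in this example can be checked with the base R function `pt`. A small sketch, using the t statistic and degrees of freedom quoted in the example; the variable names are illustrative.

```r
t_exper <- 2.39   # t statistic quoted in the example
dof <- 522        # n - k - 1 as used in the example

# right-tailed p-value: Pr{T > t_exper}
p_right <- pt(t_exper, df = dof, lower.tail = FALSE)

# the two-sided p-value reported by software is twice the one-sided one
p_two_sided <- 2 * p_right

# rejection rule at the 5 % level
reject_5pct <- t_exper > qt(0.95, df = dof)

print(c(p_right = p_right, p_two_sided = p_two_sided))
print(reject_5pct)
```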

Left-Tailed t-Test

One-sided alternative (left-tailed test):

$H_0: \beta_j = 0$ against $H_A: \beta_j < 0$

critical value: the α-quantile (percentile) of the appropriate distribution
rejection rule: $t_{\hat\beta_j} < t_{n-k-1}(\alpha)$


Example 4 (Student Performance and School Size)


Data: MEAP93
$\widehat{math10} = 2.274 + 0.00046\,totcomp + 0.048\,staff - 0.00020\,enroll$

math10 . . . percentage of students passing the Michigan Educational Assessment Program (MEAP) standardized 10th-grade math test
totcomp . . . average annual teacher compensation (measure of teacher quality)
staff . . . number of staff per 1000 students (measure of attention received)
enroll . . . student enrollment (measure of school size)

$H_0: \beta_{enroll} = 0$ against $H_A: \beta_{enroll} < 0$

$t_{enroll} \approx -0.918 > t_{404}(0.05) = -1.649$
p-value $= \Pr\{t_{enroll} < -0.918\} = \tfrac{1}{2} \cdot 0.36 = 0.18$
H0 is not rejected at 5 % (nor even at 15 %). Changing the model to logarithms of the regressors:
$math10 = \beta_0 + \beta_1 \ln(totcomp) + \beta_2 \ln(staff) + \beta_3 \ln(enroll) + \varepsilon$
$t_{\ln(enroll)} \approx -1.829 < t_{404}(0.05) = -1.649$ ⇒ H0 is rejected at 5 %!
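For a left-tailed test the p-value is the lower tail probability of the t distribution. The sketch below recomputes it with base R for the two t statistics quoted in this example (variable names are illustrative).

```r
dof <- 404   # degrees of freedom used in the example

# left-tailed p-values: Pr{T < t}
p_level <- pt(-0.918, df = dof)   # model in levels
p_log   <- pt(-1.829, df = dof)   # model with logged regressors

# 5 % critical value for the left-tailed test
crit_5pct <- qt(0.05, df = dof)

print(round(c(p_level = p_level, p_log = p_log, crit = crit_5pct), 4))
```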
Two-sided t-Test

$H_0: \beta_j = 0$ against $H_A: \beta_j \neq 0$

critical value: the $(1 - \alpha/2)$-quantile (percentile) of the appropriate distribution
rejection rule: $|t_{\hat\beta_j}| > t_{n-k-1}(1 - \alpha/2)$

Example 5 (Determinants of College GPA)


Data: GPA1
$\widehat{colGPA} = 1.39 + 0.412\,hsGPA + 0.015\,ACT - 0.083\,skipped$
skipped . . . average number of lectures missed per week

1. $H_0: \beta_{hsGPA} = 0$ against $H_A: \beta_{hsGPA} \neq 0$: $t_{hsGPA} = 4.396$, p-value ≈ $10^{-5}$ ⇒ H0 rejected at any conventional level;
2. $H_0: \beta_{ACT} = 0$ against $H_A: \beta_{ACT} \neq 0$: $t_{ACT} = 1.393$, $t_{137}(0.95) = 1.656$, p-value = 0.166 ⇒ H0 not rejected at 10 %; the effect is also small in practice;
3. $H_0: \beta_{skipped} = 0$ against $H_A: \beta_{skipped} \neq 0$: $t_{skipped} = -3.197$, $t_{137}(0.995) = 2.612$, p-value = 0.0017 ⇒ H0 rejected at 1 %, but the effect is small in practice!
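Two-sided p-values double the tail probability beyond |t|. A minimal base R sketch for the ACT and skipped coefficients of this example; `two_sided_p` is a hypothetical helper name introduced here for illustration.

```r
# two-sided p-value for a t statistic with the given degrees of freedom
two_sided_p <- function(t_stat, dof) {
  2 * pt(-abs(t_stat), df = dof)
}

dof <- 137
p_act     <- two_sided_p(1.393, dof)
p_skipped <- two_sided_p(-3.197, dof)

# two-sided critical value at the 1 % level
crit_1pct <- qt(0.995, df = dof)

print(round(c(p_act = p_act, p_skipped = p_skipped, crit_1pct = crit_1pct), 4))
```

The rejection rule compares the absolute value of the t statistic with `crit_1pct`, which is why the negative statistic for skipped still leads to rejection.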
Tests against Other Alternatives

$H_0: \beta_j = a_j$ against $H_A: \beta_j \neq a_j$ (or $\beta_j > a_j$, or $\beta_j < a_j$)
test statistic: $t = (\hat\beta_j - a_j) / \operatorname{se}\hat\beta_j$

Example 6 (Campus Crime and Student Enrollment)

Data: CAMPUS (FBI's Uniform Crime Report for 1992, n = 97)

$\ln(crime) = \beta_0 + \beta_1 \ln(enroll) + \varepsilon$
$\widehat{\ln(crime)} = -6.63 + 1.27 \ln(enroll)$
$H_0: \beta_1 = 1$ against $H_A: \beta_1 > 1$

(crime is a bigger problem on larger campuses)

$t_{\ln(enroll)} \approx (1.27 - 1)/0.11 \approx 2.46 > t_{95}(0.95) = 1.66$
p-value $= \Pr\{t_{\ln(enroll)} > 2.46\} \approx 0.0079$

H0 rejected (at 1 %)

warning: this analysis holds no other factor fixed ⇒ the elasticity 1.27 is not necessarily a good estimate of the ceteris paribus effect.
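When the hypothesized value is not zero, the t statistic has to be built by hand from the estimate and its standard error. A short base R sketch with the rounded numbers of this example (variable names are illustrative; the rounded inputs give a statistic slightly below the quoted 2.46).

```r
beta_hat <- 1.27   # estimated elasticity
se_hat   <- 0.11   # its standard error
a        <- 1      # value under H0
dof      <- 95     # n - k - 1 = 97 - 1 - 1

# t statistic for H0: beta = a
t_stat <- (beta_hat - a) / se_hat

# right-tailed p-value and 5 % critical value
p_right   <- pt(t_stat, df = dof, lower.tail = FALSE)
crit_5pct <- qt(0.95, df = dof)

print(round(c(t = t_stat, p = p_right, crit = crit_5pct), 4))
```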
Economic (Practical) vs. Statistical Significance

Example 7 (Participation Rates in 401(k) Plans)

$\widehat{prate} = 80.29 + 5.44\,mrate + 0.269\,age - 0.00013\,totemp$

totemp . . . total number of employees (firm size)

$H_0: \beta_{totemp} = 0$ against $H_A: \beta_{totemp} \neq 0$

$t_{totemp} \approx -3.25$, p-value ≈ 0.001

H0 rejected at 1 % (the p-value is about 0.1 %) ⇒ $\beta_{totemp}$ is statistically significant

but: holding mrate and age fixed, +10,000 employees ⇒ only a 1.3 percentage point decrease in the participation rate, which is not very large in practice (not economically significant)



Confidence Intervals


95 % confidence interval for βj: $\hat\beta_j \pm t_{n-k-1}(0.975) \cdot \operatorname{se}\hat\beta_j$
interpretation: the unknown βj lies in the computed interval $(\underline{\beta}_j, \overline{\beta}_j)$ for 95 % of random samples
we can only hope that our sample is one of these 95 %
connection with testing $H_0: \beta_j = a_j$ against $H_A: \beta_j \neq a_j$: H0 is rejected at (say) the 5 % level ⇔ $a_j \notin (\underline{\beta}_j, \overline{\beta}_j)$

R confint(model)
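Without a fitted model object, the same interval can be computed from an estimate and its standard error. The sketch below does this in base R for the exper coefficient of Example 3 (0.0041 with standard error 0.0017 and 522 degrees of freedom); variable names are illustrative.

```r
beta_hat <- 0.0041
se_hat   <- 0.0017
dof      <- 522

# 95 % confidence interval: estimate +/- t quantile * standard error
ci <- beta_hat + c(-1, 1) * qt(0.975, df = dof) * se_hat

# H0: beta = 0 is rejected at the 5 % level (two-sided) iff 0 lies outside the interval
contains_zero <- ci[1] <= 0 && 0 <= ci[2]

print(round(ci, 5))
print(contains_zero)
```

For a model fitted with `lm`, `confint(model)` returns intervals of this form for all coefficients at once.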



Testing a Single Linear Combination of Parameters
Example 8 (Return to Education)

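$\ln(wage) = \beta_0 + \beta_1\,jc + \beta_2\,univ + \beta_3\,exper + \varepsilon$

jc . . . number of years attending a two-year college
univ . . . number of years attending a four-year college
exper . . . number of months in the workforce

$H_0: \beta_1 = \beta_2$ against $H_A: \beta_1 < \beta_2$

the individual t statistics cannot simply be used, because $\operatorname{se}(\hat\beta_1 - \hat\beta_2) \neq \operatorname{se}\hat\beta_1 - \operatorname{se}\hat\beta_2$
the standard error is estimated by
$$\operatorname{se}(\hat\beta_1 - \hat\beta_2) = \sqrt{(\operatorname{se}\hat\beta_1)^2 + (\operatorname{se}\hat\beta_2)^2 - 2\,\widehat{\operatorname{cov}}(\hat\beta_1, \hat\beta_2)}$$
(sometimes reported by the software).
easier technique: define $\theta_1 := \beta_1 - \beta_2$ and $totcoll := jc + univ$, and rewrite the model as
$\ln(wage) = \beta_0 + \theta_1\,jc + \beta_2\,totcoll + \beta_3\,exper + \varepsilon$
$H_0: \theta_1 = 0$ against $H_A: \theta_1 < 0$
$t_{\hat\theta_1} \approx -1.48$, p-value ≈ 0.070 (data: TWOYEAR)
⇒ H0 is not rejected at 5 % (it is rejected at 10 %): there is some, but not strong, evidence against $\beta_1 = \beta_2$, i.e. that a year at a two-year college is worth less than a year at a four-year college.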
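Both routes to the t statistic (covariance formula and reparametrized model) give the same number, which the following base R sketch illustrates. The data are synthetic and generated only for this illustration; they are not the TWOYEAR data, and the coefficient values used to simulate them are assumptions.

```r
# Synthetic data: variable names mirror the example, values are made up
set.seed(1)
n     <- 200
jc    <- rpois(n, 1)
univ  <- rpois(n, 2)
exper <- runif(n, 0, 120)
lwage <- 1.5 + 0.06 * jc + 0.08 * univ + 0.005 * exper + rnorm(n, sd = 0.4)

# Route 1: standard error of the difference from the estimated covariance matrix
m1 <- lm(lwage ~ jc + univ + exper)
V  <- vcov(m1)
diff_hat <- unname(coef(m1)["jc"] - coef(m1)["univ"])
se_diff  <- sqrt(V["jc", "jc"] + V["univ", "univ"] - 2 * V["jc", "univ"])
t_route1 <- diff_hat / se_diff

# Route 2: reparametrized model, theta1 is the coefficient on jc
totcoll  <- jc + univ
m2       <- lm(lwage ~ jc + totcoll + exper)
t_route2 <- unname(coef(summary(m2))["jc", "t value"])

# left-tailed p-value for HA: theta1 < 0
p_left <- pt(t_route2, df = df.residual(m2))

print(c(t_route1 = t_route1, t_route2 = t_route2, p_left = p_left))
```

The reparametrization is usually more convenient because the standard error of the difference appears directly in the regression output.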
Multiple Regression Analysis – Statistical Inference
Testing Multiple Linear Restrictions: F test

Example 9 (Baseball Players’ Salaries)


ln(salary ) = β0 + β1 years + β2 gamesyr + β3 bavg + β4 hrunsyr + β5 rbisyr + ε
salary . . . 1993 total salary
years . . . # years in the league
gamesyr . . . average # games played per year
bavg . . . career batting average
hrunsyr . . . # home runs per year
rbisyr . . . runs batted in per year

             Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  1.119e+01  2.888e-01   38.752   < 2e-16   ***
years        6.886e-02  1.211e-02    5.684   2.79e-08  ***
gamesyr      1.255e-02  2.647e-03    4.742   3.09e-06  ***
bavg         9.786e-04  1.104e-03    0.887   0.376
hrunsyr      1.443e-02  1.606e-02    0.899   0.369
rbisyr       1.077e-02  7.175e-03    1.500   0.134

$H_0: \beta_3 = \beta_4 = \beta_5 = 0$ against $H_A$: H0 does not hold

. . . multiple exclusion restrictions (three joint restrictions)



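Data: MLB1 (n = 353)
Restricted model: $\ln(salary) = \beta_0 + \beta_1\,years + \beta_2\,gamesyr + \varepsilon$
Test statistic: F-ratio
$$F := \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\,n-k-1}$$
where $SSR_r$ and $SSR_{ur}$ are the sums of squared residuals of the restricted and unrestricted models and q is the number of restrictions (q = 3 here).
In the example, F ≈ 9.55, p-value ≈ $4 \cdot 10^{-6}$ ⇒ H0 rejected.
Note again that all three individual t-statistics are insignificant!
(Reason: corr(hrunsyr, rbisyr) ≈ 0.89.)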
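The F-ratio can be computed by hand from the two residual sums of squares and compared with what `anova` reports for nested `lm` fits. The sketch below uses synthetic data with two highly correlated regressors (not the MLB1 data; all simulated values are assumptions) and also recomputes the p-value for the F statistic quoted in the example.

```r
# p-value for the F statistic quoted in the example: q = 3, n - k - 1 = 353 - 5 - 1
p_example <- pf(9.55, df1 = 3, df2 = 353 - 5 - 1, lower.tail = FALSE)

# Synthetic illustration with two highly correlated regressors x3 and x4
set.seed(42)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- 0.9 * x3 + rnorm(n, sd = 0.4)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + 0.2 * x4 + rnorm(n)

m_ur <- lm(y ~ x1 + x2 + x3 + x4)   # unrestricted model
m_r  <- lm(y ~ x1 + x2)             # restricted model: coefficients on x3, x4 set to 0

ssr_ur  <- sum(resid(m_ur)^2)
ssr_r   <- sum(resid(m_r)^2)
n_restr <- 2                        # number of restrictions q
dof_ur  <- df.residual(m_ur)        # n - k - 1

F_manual <- ((ssr_r - ssr_ur) / n_restr) / (ssr_ur / dof_ur)
p_manual <- pf(F_manual, n_restr, dof_ur, lower.tail = FALSE)

# the same statistic as reported by anova() for the nested models
F_anova <- anova(m_r, m_ur)$F[2]

print(c(F_manual = F_manual, F_anova = F_anova, p_manual = p_manual))
print(p_example)
```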

Overall significance test: $H_0: \beta_1 = \dots = \beta_k = 0$

H0 is often rejected, even if $R^2$ is small
occasionally, the overall F is the focus of a study (e.g., to test whether some variable is predictable from selected factors; cf. the efficient markets hypothesis)
Testing General Linear Restrictions: F test

Example 10

$\ln(price) = \beta_0 + \beta_1 \ln(assess) + \beta_2 \ln(lotsize) + \beta_3 \ln(sqrft) + \beta_4\,bdrms + \varepsilon$

price . . . house price
assess . . . assessed housing value (before the house was sold)
lotsize . . . size of the lot (in square feet)
sqrft . . . square footage of the house
bdrms . . . number of bedrooms
Data: HPRICE1, n = 88

              Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)    0.263743  0.569665     0.463   0.645
log(assess)    1.043065  0.151446     6.887   1.01e-09  ***
log(lotsize)   0.007438  0.038561     0.193   0.848
log(sqrft)    -0.103238  0.138430    -0.746   0.458
bdrms          0.033839  0.022098     1.531   0.129

Is the assessed housing value a rational valuation?

$H_0: \beta_1 = 1,\ \beta_2 = \beta_3 = \beta_4 = 0$

F ≈ 0.661, p-value ≈ 0.62 ⇒ we fail to reject H0.

There is no evidence against rational valuation.
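A restriction such as β1 = 1 can be imposed by moving the restricted term to the left-hand side, so the restricted model here regresses ln(price) − ln(assess) on a constant only. The base R sketch below recomputes the p-value for the quoted F statistic and then illustrates the construction on synthetic data generated so that H0 holds; the data are not HPRICE1 and all simulated values are assumptions.

```r
# p-value for the F statistic quoted in the example: q = 4, n - k - 1 = 88 - 4 - 1
p_example <- pf(0.661, df1 = 4, df2 = 88 - 4 - 1, lower.tail = FALSE)

# Synthetic data (made up) in which H0 is true by construction
set.seed(7)
n        <- 88
lassess  <- rnorm(n, mean = 5.7, sd = 0.3)
llotsize <- rnorm(n, mean = 9.0, sd = 0.5)
lsqrft   <- rnorm(n, mean = 7.6, sd = 0.3)
bdrms    <- sample(2:5, n, replace = TRUE)
lprice   <- 0.1 + lassess + rnorm(n, sd = 0.15)

# unrestricted model
m_ur <- lm(lprice ~ lassess + llotsize + lsqrft + bdrms)

# restricted model under H0: beta1 = 1, beta2 = beta3 = beta4 = 0
m_r <- lm(I(lprice - lassess) ~ 1)

ssr_ur  <- sum(resid(m_ur)^2)
ssr_r   <- sum(resid(m_r)^2)
n_restr <- 4
dof_ur  <- df.residual(m_ur)

F_stat <- ((ssr_r - ssr_ur) / n_restr) / (ssr_ur / dof_ur)
p_sim  <- pf(F_stat, n_restr, dof_ur, lower.tail = FALSE)

print(round(c(p_example = p_example, F_stat = F_stat, p_sim = p_sim), 4))
```

Because the restricted model changes the dependent variable, the R-squared form of the F statistic cannot be used here; the SSR form above remains valid.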
