Statistical Inference


HANH LE
8. Hypothesis Testing in Multiple Regression

Hypothesis testing takes several forms, such as the following:
8.1. Testing hypotheses about an individual partial regression
coefficient
8.2. Testing the overall significance of the estimated multiple regression
model, that is, finding out if all the partial slope coefficients are
simultaneously equal to zero.
8.3. Testing that two or more coefficients are equal to one another
8.4. Testing that the partial regression coefficients satisfy certain
restrictions
8.5. Testing the stability of the estimated regression model over time or
in different cross-sectional units
8.6. Testing the functional form of regression models
8.1. Hypothesis testing about individual regression
coefficients: the null hypothesis in most applications
• A hypothesis about any individual partial regression coefficient:
H0: βj = 0
H1: βj ≠ 0
• That is, Xj has no effect on the expected value of Y.
• If the computed t value > critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.
• Where:

t = (β̂j − 0) / se(β̂j)
Example: Determinants of college GPA

• The null hypothesis states that, ACT held constant, hsGPA has no influence on colGPA:
H0: β2 = 0 and H1: β2 ≠ 0
• t test: t = (0.4534 − 0) / 0.0958 = 4.73
• The critical t value is 2.61 for a two-tail test at the 1% significance level (look up tα/2 for 138 df).
• At the 1% significance level, we reject the null hypothesis that hsGPA has no effect on colGPA.
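The arithmetic of this t test can be sketched in a few lines of Python (values taken from the slides; the critical value 2.61 is assumed read from a t table for 138 df):

```python
# t test for H0: beta_hsGPA = 0, using the estimates from the slides
beta_hat = 0.4534   # estimated coefficient on hsGPA
se_beta = 0.0958    # its standard error
t_crit = 2.61       # two-tail critical value at the 1% level, 138 df (from a t table)

t_stat = (beta_hat - 0) / se_beta
print(round(t_stat, 2))          # 4.73
print(abs(t_stat) > t_crit)      # True -> reject H0
```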

8.1. Hypothesis testing about individual regression
coefficients: the null hypothesis in most applications
Under the CLRM assumptions, we can easily construct a confidence interval (CI) for the population parameter βi. Confidence intervals are also called interval estimates because they provide a range of likely values for the population parameter, not just a point estimate.

t = (β̂i − βi) / se(β̂i) ~ t(n−k)

Then the CI is:

β̂i − tα/2 se(β̂i) ≤ βi ≤ β̂i + tα/2 se(β̂i),   i = 1, …, 3

where the left-hand side is the lower bound and the right-hand side is the upper bound.
If the level of confidence is 95%, then tα/2 is the 97.5th percentile of a t(n−k) distribution.
1 − α is known as the confidence coefficient, and α (0 < α < 1) is known as the level of significance.
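As a quick numerical sketch (Python, standard library only), the 95% CI for the hsGPA coefficient can be reproduced from the point estimate and standard error; the critical value t0.025,138 ≈ 1.977 is an assumption read from a t table:

```python
beta_hat = 0.4534559   # hsGPA coefficient from the Stata output
se_beta = 0.0958129    # its standard error
t_half = 1.977         # t_{alpha/2} with 138 df (assumed from a t table)

lower = beta_hat - t_half * se_beta
upper = beta_hat + t_half * se_beta
print(round(lower, 3), round(upper, 3))   # 0.264 0.643, matching Stata's 95% CI
```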


Example 2: Determinants of college GPA

. use "D:\Bai giang\Kinh te luong\datasets\GPA1.DTA", clear

. reg colGPA hsGPA ACT

Source SS df MS Number of obs = 141
F( 2, 138) = 14.78
Model 3.42365506 2 1.71182753 Prob > F = 0.0000
Residual 15.9824444 138 .115814814 R-squared = 0.1764
Adj R-squared = 0.1645
Total 19.4060994 140 .138614996 Root MSE = .34032

colGPA Coef. Std. Err. t P>|t| [95% Conf. Interval]

hsGPA .4534559 .0958129 4.73 0.000 .2640047 .6429071
ACT .009426 .0107772 0.87 0.383 -.0118838 .0307358
_cons 1.286328 .3408221 3.77 0.000 .612419 1.960237
A reminder on the language of classical
hypothesis testing
• When H0 is not rejected, say “We fail to reject H0 at the x% level”; do not say “H0 is accepted at the x% level”.
• Statistical significance vs. economic significance: statistical significance is determined by the size of the t statistic, whereas economic significance relates to the size and sign of the estimated coefficients.

Testing Hypotheses on the coefficients

Hypotheses    H0        Alternative H1    Rejection region
Two tail      βj = 0    βj ≠ 0            |t0| > t(n−k),α/2
Right tail    βj = 0    βj > 0            t0 > t(n−k),α
Left tail     βj = 0    βj < 0            t0 < −t(n−k),α
8.2. Testing the Overall Significance of
the Sample Regression
For Yi = 1 + 2X2i + 3X3i + ........+ kXki + ui
To test the hypothesis
H0: 2 =3 =....= k= 0 (all slope coefficients are simultaneously zero)
(this is also a test of significance of R2)
H1: Not at all slope coefficients are simultaneously zero
R (n  k )
2
F
(1  R 2 )(k  1)
(8.5.7)

(k = total number of parameters to be estimated including intercept)


If F > F critical = F,(k-1,n-k), reject H0, Otherwise you do not
reject it 9
Example : Testing the Overall Significance of
the Sample Regression

• Determinants of college GPA


F = (0.1764 × 138) / [(1 − 0.1764) × 2] = 14.78
• We have F > F critical = F0.05,(2,138) = 3.062 → reject H0
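The slide's arithmetic for Eq. (8.5.7) can be checked directly (a minimal Python sketch using the R² and sample size from the Stata output):

```python
# Overall F test (Eq. 8.5.7) for the college-GPA regression
r2 = 0.1764    # R-squared from the Stata output
n, k = 141, 3  # observations; parameters including the intercept

F = (r2 * (n - k)) / ((1 - r2) * (k - 1))
print(round(F, 2))    # 14.78 -- matches Stata's F(2, 138)
```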

8.3. Testing the Equality of Two Regression Coefficients
• Suppose in the multiple regression
Yi = β1 + β2X2i + β3X3i + β4X4i + ui
we want to test the hypotheses
H0: β3 = β4 or (β3 − β4) = 0
H1: β3 ≠ β4 or (β3 − β4) ≠ 0
that is, the two slope coefficients β3 and β4 are equal.

• If the t variable exceeds the critical t value at the designated level


of significance for given df, then you can reject the null
hypothesis; otherwise, you do not reject it
8.3. Testing the Equality of Two Regression Coefficients

Option 1: t-test

t = (β̂3 − β̂4) / se(β̂3 − β̂4),   where se(β̂3 − β̂4) = √[var(β̂3) + var(β̂4) − 2 cov(β̂3, β̂4)]

• If the t variable exceeds the critical t value at the designated level of significance for given df, then you can reject the null hypothesis; otherwise, you do not reject it.

Nguyen Thu Hang, BMNV, FTU CS2


8.3. Testing the Equality of Two Regression Coefficients

• Review:

Var(β̂2) = σ² Σx3i² / [Σx2i² Σx3i² − (Σx2i x3i)²] = σ² / [(1 − r²2,3) Σx2i²]

Var(β̂3) = σ² Σx2i² / [Σx2i² Σx3i² − (Σx2i x3i)²] = σ² / [(1 − r²2,3) Σx3i²]

Cov(β̂2, β̂3) = −r2,3 σ² / [(1 − r²2,3) √(Σx2i²) √(Σx3i²)]
Example- Stata output
• Model: wage = f(educ,exper, tenure )
. reg wage educ exper tenure

Source SS df MS Number of obs = 526


F( 3, 522) = 76.87
Model 2194.1116 3 731.370532 Prob > F = 0.0000
Residual 4966.30269 522 9.51398984 R-squared = 0.3064
Adj R-squared = 0.3024
Total 7160.41429 525 13.6388844 Root MSE = 3.0845

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .5989651 .0512835 11.68 0.000 .4982176 .6997126


exper .0223395 .0120568 1.85 0.064 -.0013464 .0460254
tenure .1692687 .0216446 7.82 0.000 .1267474 .2117899
_cons -2.872735 .7289643 -3.94 0.000 -4.304799 -1.440671
Example- Stata output
• Model: wage = f(educ,exper, tenure )

. estat vce

Covariance matrix of coefficients of regress model

e(V) educ exper tenure _cons

educ .00263
exper .00019406 .00014537
tenure -.0001254 -.00013218 .00046849
_cons -.03570219 -.0042369 .00143314 .53138894

Example- Stata output
• We have se(β̂3 − β̂4) = 0.029635
• t = −4.958, |t| > t0.025,522 ≈ 1.96
• → Reject H0
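The standard error of the difference can be reproduced from the `estat vce` entries (a minimal Python sketch of the slide's arithmetic):

```python
import math

# entries from the covariance matrix reported by estat vce
var_exper  = 0.00014537
var_tenure = 0.00046849
cov_et     = -0.00013218

b_exper, b_tenure = 0.0223395, 0.1692687   # coefficients from the regression

se_diff = math.sqrt(var_exper + var_tenure - 2 * cov_et)
t_stat = (b_exper - b_tenure) / se_diff
print(round(se_diff, 6))   # 0.029635
print(round(t_stat, 3))    # -4.958
```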

8.3. Testing the Equality of Two Regression Coefficients

Option 2: F-test
• If the F variable exceeds the critical F value at the designated
level of significance for given df, then you can reject the null
hypothesis; otherwise, you do not reject it

F(1, n−k) = { [(β̂3 − β̂4) − (β3 − β4)] / se(β̂3 − β̂4) }²
Example
• F = 24.58
• F(0.05, 1, 522) = 3.85
• → We reject the hypothesis that the two effects are equal.
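For a single linear restriction, the F statistic is simply the square of the t statistic from Option 1; a quick check with the slide's numbers:

```python
t_stat = -4.958      # t statistic from the t-test option
F = t_stat ** 2
print(round(F, 2))   # 24.58 -- the F(1, 522) reported by Stata's test command
```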

8.3. Testing the Equality of Two Regression Coefficients

Option 3: Stata output F-test

. test exper=tenure
( 1) exper - tenure = 0

F( 1, 522) = 24.58
Prob > F = 0.0000
→ We reject the hypothesis that the two effects are equal.



8.4. Restricted Least Squares: Testing Linear Equality Restrictions

• Now consider the Cobb–Douglas production function:

Yi = β1 X2i^β2 X3i^β3 e^(ui)      (8.6.1)
where Y = output
X2 = labor input
X3 = capital input
• Written in log form, the equation becomes
ln Yi = ln β1 + β2 lnX2i + β3lnX3i + ui
= β0 + β2lnX2i + β3lnX3i + ui 8.6.2
where β0 = ln β1.

8.4. Restricted Least Squares: Testing Linear Equality
Restrictions
• Now if there are constant returns to scale (equiproportional
change in output for an equiproportional change in the
inputs), economic theory would suggest that:
β2 + β3 = 1 (8.6.3)
which is an example of a linear equality restriction.
• Is the restriction (8.6.3) valid? There are two approaches:
• The t-Test Approach
• The F-Test Approach

8.4. Restricted Least Squares: Testing Linear Equality
Restrictions
The t-Test Approach
• The simplest procedure is to estimate Eq. (8.6.2) in the usual manner.
• A test of the hypothesis or restriction can be conducted by the t test.

t = [(β̂2 + β̂3) − 1] / √[var(β̂2) + var(β̂3) + 2 cov(β̂2, β̂3)]      (8.6.4)

•If the t value computed exceeds the critical t value at the chosen
level of significance, we reject the hypothesis of constant returns
to scale;
•Otherwise we do not reject it.
8.4. Restricted Least Squares: Testing Linear Equality
Restrictions
The F-Test Approach
• Under the restriction we see that β2 = 1 − β3,
• so we can write the Cobb–Douglas production function as
ln Yi = β0 + (1 − β3) ln X2i + β3 ln X3i + ui
= β0 + ln X2i + β3(ln X3i − ln X2i) + ui
or (ln Yi − ln X2i) = β0 + β3(ln X3i − ln X2i) + ui (8.6.7)
or ln(Yi/X2i) = β0 + β3 ln(X3i/X2i) + ui (8.6.8)
where (Yi/X2i) = output/labor ratio
(X3i/X2i) = capital/labor ratio
Eq. (8.6.7) or Eq. (8.6.8) is known as restricted least squares (RLS).
8.4. Restricted Least Squares: Testing Linear Equality
Restrictions
• We want to test the hypothesis
H0: β2 + β3 = 1 (the restriction H0 is valid)

F = [(RSSR − RSSUR)/m] / [RSSUR/(n − k)]

RSSUR: RSS of the unrestricted regression (8.6.2)
RSSR: RSS of the restricted regression (8.6.7) or (8.6.8)
m = number of linear restrictions (1 in the present example)
k = number of parameters in the unrestricted regression
n = number of observations
• If the computed F value > the critical F value at the chosen level of significance, we reject the hypothesis H0.

EXAMPLE 8.3 The Cobb–Douglas Production Function for the
Mexican Economy,1955–1974 (Table 8.8)
GDP Employment Fixed Capital
Year Millions of 1960 pesos. Thousands of people. Millions of 1960 pesos.
1955 114043 8310 182113
1956 120410 8529 193749
1957 129187 8738 205192
1958 134705 8952 215130
1959 139960 9171 225021
1960 150511 9569 237026
1961 157897 9527 248897
1962 165286 9662 260661
1963 178491 10334 275466
1964 199457 10981 295378
1965 212323 11746 315715
1966 226977 11521 337642
1967 241194 11540 363599
1968 260881 12066 391847
1969 277498 12297 422382
1970 296530 12955 455049
1971 306712 13338 484677
1972 329030 13738 520553
1973 354057 15924 561531
1974 374977 14154 609825
Example

Fk-1,n-k,α = F2,17,0.05 = 3.59

Example

• F = 3.75 < Fm,n−k,α = F1,17,0.05 = 4.45, so we cannot reject H0.

A Cautionary Note (p.269)
• Keep in mind that if the dependent variable in the
restricted and unrestricted models is not the same,
R2(unrestricted) and R2(restricted) are not directly
comparable.

Example
• General F Testing
• Model: wage = f(educ,exper, tenure )
H0: beta(exper)=0

Example
• Unrestricted model
. reg wage educ exper tenure

Source SS df MS Number of obs = 526


F( 3, 522) = 76.87
Model 2194.1116 3 731.370532 Prob > F = 0.0000
Residual 4966.30269 522 9.51398984 R-squared = 0.3064
Adj R-squared = 0.3024
Total 7160.41429 525 13.6388844 Root MSE = 3.0845

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .5989651 .0512835 11.68 0.000 .4982176 .6997126


exper .0223395 .0120568 1.85 0.064 -.0013464 .0460254
tenure .1692687 .0216446 7.82 0.000 .1267474 .2117899
_cons -2.872735 .7289643 -3.94 0.000 -4.304799 -1.440671

Example
• Restricted model
. reg wage educ tenure

Source SS df MS Number of obs = 526


F( 2, 523) = 113.07
Model 2161.4496 2 1080.7248 Prob > F = 0.0000
Residual 4998.96469 523 9.55824989 R-squared = 0.3019
Adj R-squared = 0.2992
Total 7160.41429 525 13.6388844 Root MSE = 3.0916

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .5691434 .0488056 11.66 0.000 .4732644 .6650225


tenure .1895808 .0187064 10.13 0.000 .1528319 .2263297
_cons -2.221624 .640154 -3.47 0.001 -3.479213 -.9640351
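From the two Stata runs, the general F statistic for H0: β(exper) = 0 can be computed from the residual sums of squares (a Python sketch; note that with one restriction it should be close to the square of exper's t statistic, 1.85² ≈ 3.42):

```python
rss_r  = 4998.96469   # RSS of the restricted model (exper dropped)
rss_ur = 4966.30269   # RSS of the unrestricted model
m = 1                 # number of restrictions
n, k = 526, 4         # observations; parameters incl. intercept

F = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
print(round(F, 2))    # 3.43, close to t^2 = 1.85^2 = 3.42
```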

Example

General F Testing
•In Exercise 7.19, you were asked to consider the following
demand function for chicken:
lnYt = β1 + β2 lnX2t + β3 lnX3t + β4 lnX4t + β5 lnX5t + ut (8.6.19)
Where Y = per capita consumption of chicken, lb
X2 = real disposable per capita income,$
X3 = real retail price of chicken per lb
X4 = real retail price of pork per lb
X5 = real retail price of beef per lb.

Example

• Suppose that chicken consumption is not affected by the prices of pork


and beef.
H0: β4 = β5 = 0 (8.6.21)
Therefore, the constrained regression becomes
lnYt = β1 + β2 ln X2t + β3 lnX3t + ut (8.6.22)

Example

• F = 1.1224 < F0.05(2,18) = 3.55. Therefore, there is no reason to reject the null hypothesis (the demand for chicken does not depend on pork and beef prices).
• In short, we can accept the constrained regression (8.6.22) as representing the demand function for chicken.
8.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• Now we have three possible regressions:
Time period 1970–1981: Yt = λ1 + λ2Xt + u1t (8.7.1)
Time period 1982–1995: Yt = γ1 + γ2Xt + u2t (8.7.2)
Time period 1970–1995: Yt = α1 + α2Xt + ut (8.7.3)
• Regression (8.7.3) assumes there is no difference between the two time periods. The mechanics of the Chow test are as follows:
1. Estimate regression (8.7.3) and obtain RSS3 with df = (n1 + n2 − k).
We call RSS3 the restricted residual sum of squares (RSSR) because it is obtained by imposing the restrictions that λ1 = γ1 and λ2 = γ2, that is, the subperiod regressions are not different.
2. Estimate Eq. (8.7.1) and obtain its residual sum of squares, RSS1, with df = (n1 − k).
3. Estimate Eq. (8.7.2) and obtain its residual sum of squares, RSS2, with df = (n2 − k).
8.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
4. The unrestricted residual sum of squares (RSSUR), that is,
RSSUR = RSS1 + RSS2 with df = (n1 + n2 − 2k)
5. Compute the F ratio:

F = [(RSSR − RSSUR)/k] / [RSSUR/(n1 + n2 − 2k)]
6. If the computed F value exceeds the critical F value, we reject the hypothesis of parameter stability and conclude that regressions (8.7.1) and (8.7.2) are different.

TABLE 8.9 Savings and Personal Disposable Income (billions of dollars),
United States, 1970–1995

Observation Savings Income Observation Savings Income


1970 61.00 727.1 1983 167.0 2,522.4
1971 68.67 790.2 1984 235.7 2,810.0
1972 63.62 855.3 1985 206.2 3,002.0
1973 89.65 965.0 1986 196.5 3,187.6
1974 97.64 1,054.2 1987 168.4 3,363.1
1975 104.41 1,159.2 1988 189.1 3,640.8
1976 96.48 1,273.0 1989 187.8 3,894.5
1977 92.57 1,401.4 1990 208.7 4,166.8
1978 112.64 1,580.1 1991 246.4 4,343.7
1979 130.16 1,769.5 1992 272.6 4,613.7
1980 161.84 1,973.3 1993 214.4 4,790.2
1981 199.14 2,200.2 1994 189.4 5,021.7
1982 205.53 2,347.3 1995 249.3 5,320.8
8.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• For the data given in Table 8.9, the empirical counterparts of the
preceding three regressions are as follows:

8.5. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• RSSUR = RSS1 + RSS2 = (1,785.032 + 10,005.22) = 11,790.252
• RSSR = RSS3 = 23,248.3
• F = [(23,248.3 − 11,790.252)/2] / [11,790.252/22] = 10.69
• From the F tables, we find that for 2 and 22 df the 1 percent critical F value is 5.72.
• The Chow test therefore seems to support our earlier hunch that the savings–income relation has undergone a structural change in the United States over the period 1970–1995.
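The Chow F ratio can be verified from the slide's RSS values (a Python sketch; the subperiod sizes n1 = 12 and n2 = 14 follow from the 1970–1981 and 1982–1995 samples):

```python
# Chow test for the savings-income regressions (values from the slides)
rss1, rss2 = 1785.032, 10005.22   # subperiod residual sums of squares
rss3 = 23248.3                    # pooled (restricted) RSS
k = 2                             # parameters per regression (intercept + slope)
n1, n2 = 12, 14                   # observations in 1970-1981 and 1982-1995

rss_ur = rss1 + rss2
F = ((rss3 - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))
print(round(F, 2))                # 10.69 > 5.72, the 1% critical value for (2, 22) df
```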

8.6. Testing the Functional Form of Regression:
Choosing between Linear and Log–Linear Models
• We can use a test proposed by MacKinnon, White, and Davidson,
which for brevity we call the MWD test, to choose between the two
models
H0: The true model is linear
H1: The true model is Log–Linear
Step I: Estimate the linear model and obtain the estimated Y values; call them Yf.
Step II: Estimate the log–linear model and obtain the estimated ln Y values; call them lnf.
Step III: Obtain Z1 = (ln Yf − lnf).
Step IV: Regress Y on the X's and Z1 obtained in Step III. Reject H0 if the coefficient of Z1 is statistically significant by the usual t test.
Step V: Obtain Z2 = (antilog of lnf − Yf).
Step VI: Regress ln Y on the logs of the X's and Z2. Reject H1 if the coefficient of Z2 is statistically significant by the usual t test.

EXAMPLE 8.5 The Demand for Roses

• Refer to Exercise 7.16 where we have presented data on the demand


for roses in the Detroit metropolitan area for the period 1971–III to
1975–II.
Linear model: Yt = α1 + α2X2t + α3X3t + ut (8.10.1)
Log–linear model: lnYt = β1 + β2lnX2t + β3lnX3t + ut (8.10.2)
• Where Y is the quantity of roses in dozens
X2 is the average wholesale price of roses ($/dozen),
X3 is the average wholesale price of carnations ($/dozen).
• A priori: α2 and β2 are expected to be negative (why?)
α3 and β3 are expected to be positive
• As we know, the slope coefficients in the log–linear model are
elasticity coefficients.
EXAMPLE 8.5 The Demand for Roses
• Step I, Step II

• Step III: Obtain Z1 = (ln Yf − lnf).


• Step IV:

EXAMPLE 8.5 The Demand for Roses

• The coefficient of Z1 is not statistically significant (t test), so we do not reject the hypothesis that the true model is linear.
• Step V: Obtain Z2 = (antilog of lnf − Yf).
• Step VI:

• The coefficient of Z2 is not statistically significant (t test), so we also cannot reject the hypothesis that the true model is log–linear at the 5% level of significance.
• Conclusion: As this example shows, it is quite possible that in a given
situation we cannot reject either of the specifications.
Assignments
• Problems 7.16, 7.17, 7.18, 7.19, 7.20 in p25-240, Gujarati.
• Problems 3.1-3.3 in p. 105-106, Wooldridge.
• Computer exercises C3.1-C3.3 in p. 110-111, Wooldridge.

The Log-Linear Model

 Consider the following model, known as the exponential regression model:

Yi = β1 Xi^β2 e^(ui)

 Which may be expressed alternatively as

ln Yi = ln β1 + β2 ln Xi + ui      (6.5.2)

Denote Yi* = ln Yi, Xi* = ln Xi, α = ln β1.
We can write Eq. (6.5.2) as:
Yi* = α + β2 Xi* + ui
β2 measures the ELASTICITY of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X.
Semilog Models: Log–Lin and Lin–Log Models

• EXAMPLE 6.4: The rate of growth of expenditure on services
Consider the data on expenditure on services given in Table 6.3. The regression results over time (t) are as follows:

Over the quarterly period 2003 to 2006, expenditures on services increased at the (quarterly) rate of 0.705 percent.

Semilog Models: Log–Lin and Lin–Log Models

Linear Trend Model


• Sometimes we estimate the following model:
Yt = β1 + β2t + ut (6.6.9)
• For the expenditure on services data, the results of fitting the
linear trend model (6.6.9) are as follows:

• On average, expenditure on services increased at the absolute


rate of about 30 billion dollars per quarter.

Semilog Models: Log–Lin and Lin–Log Models

The Lin–Log Model


• Suppose we now want to find the absolute change in Y for a
percent change in X. A model that can accomplish this
purpose can be written as:
Yi = β1 + β2 ln Xi + ui (6.6.11)
For descriptive purposes we call such a model a lin–log
model.

Semilog Models: Log–Lin and Lin–Log Models

• Example 6.5: let us revisit our example on food expenditure in India, Example 3.2. As the figure suggests, food expenditure increases more slowly as total expenditure increases. The results of fitting the lin–log model to the data are as follows:

The slope coefficient of about 257 means that a 1 percent increase in total expenditure leads, on average, to about a 2.57-rupee increase in the expenditure on food of the 55 families included in the sample.
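The lin–log interpretation is simply ΔY ≈ (β2/100) × (% change in X); a one-line check of the slide's arithmetic (using the approximate slope of 257 stated in the text):

```python
beta2 = 257          # lin-log slope coefficient (approximate value from the text)
pct_change_x = 1     # a 1 percent increase in total expenditure

delta_y = (beta2 / 100) * pct_change_x
print(round(delta_y, 2))   # 2.57 rupees, as in the slide
```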

