Regression With Dummy Independent Variables
ANOVA: Example 1
$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$

Yi = Monthly per capita consumption expenditure (MPCE)
Households from three different types of locations
D1i = 1 if the household is in urban area
D1i = 0 otherwise
D2i = 1 if the household is in semi-urban area
D2i = 0 otherwise
Base category = Households from rural area

Possibilities
(a) Comparison between rural and urban households
(1) β1 is not statistically significant => average MPCE of urban households is not significantly different from that of rural households
(2) β1 is statistically significant and positive => average MPCE of urban households is significantly higher than that of rural households
(3) β1 is statistically significant and negative => average MPCE of urban households is significantly lower than that of rural households

(b) Comparison between rural and semi-urban households
(1) β2 is not statistically significant => average MPCE of semi-urban households is not significantly different from that of rural households
(2) β2 is statistically significant and positive => average MPCE of semi-urban households is significantly higher than that of rural households
(3) β2 is statistically significant and negative => average MPCE of semi-urban households is significantly lower than that of rural households
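The interpretation above can be checked numerically. The sketch below uses invented MPCE values (hypothetical, not taken from the slides) and plain NumPy least squares; all variable names, including `alpha` for the intercept, are assumptions made for illustration. With only dummy regressors, the intercept equals the mean of the base category (rural), and each dummy coefficient equals the gap between that category's mean and the rural mean.

```python
import numpy as np

# Hypothetical MPCE values for illustration only (not from the source data).
mpce = np.array([900.0, 1100.0, 1000.0,    # rural households
                 1500.0, 1700.0, 1600.0,   # urban households
                 1200.0, 1300.0, 1400.0])  # semi-urban households
location = np.array(["rural"] * 3 + ["urban"] * 3 + ["semi"] * 3)

d1 = (location == "urban").astype(float)  # D1 = 1 for urban households
d2 = (location == "semi").astype(float)   # D2 = 1 for semi-urban households

# Design matrix: intercept, D1, D2 (rural is the omitted base category).
X = np.column_stack([np.ones_like(mpce), d1, d2])
alpha, beta1, beta2 = np.linalg.lstsq(X, mpce, rcond=None)[0]

rural_mean = mpce[location == "rural"].mean()
urban_mean = mpce[location == "urban"].mean()
semi_mean = mpce[location == "semi"].mean()

print(alpha, beta1, beta2)
print(rural_mean, urban_mean - rural_mean, semi_mean - rural_mean)
```

Whether β1 or β2 is statistically significant would still have to be judged from their standard errors, which this sketch does not compute.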
ANOVA: Example 2
$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \cdot D_{2i}) + u_i$

Yi = Monthly per capita consumption expenditure (MPCE)
D1i = 1 if the household is in urban area
D1i = 0 otherwise
D2i = 1 if the household head is male
D2i = 0 otherwise
Base category = Households from rural area with female head

(4) If D1i = 1, D2i = 1 => $E(Y_i) = \alpha + \beta_1 + \beta_2 + \beta_3$ (Urban, Male Head)

Some Possibilities
(a) Differential impact of being in urban area (with female head)
(1) β1 is not statistically significant => average MPCE of urban households with female head is not significantly different from that of rural households with female head
(2) β1 is statistically significant and positive => average MPCE of urban households with female head is significantly higher than that of rural households with female head
(3) β1 is statistically significant and negative => average MPCE of urban households with female head is significantly lower than that of rural households with female head

(b) Differential impact of having male household head (in rural areas)
(1) β2 is not significant => average MPCE of rural households with male head is not significantly different from that of those with female head
(2) β2 is significant and positive => average MPCE of rural households with male head is significantly higher than that of those with female head
(3) β2 is significant and negative => average MPCE of rural households with male head is significantly lower than that of those with female head
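For reference, the conditional means implied by the interaction model can be written out for all four groups. The intercept symbol α is an assumption, since the original notation is not legible here; the last line matches case (4) stated above.

```latex
\begin{aligned}
E(Y_i \mid D_{1i}=0,\, D_{2i}=0) &= \alpha && \text{(Rural, Female Head; base category)} \\
E(Y_i \mid D_{1i}=1,\, D_{2i}=0) &= \alpha + \beta_1 && \text{(Urban, Female Head)} \\
E(Y_i \mid D_{1i}=0,\, D_{2i}=1) &= \alpha + \beta_2 && \text{(Rural, Male Head)} \\
E(Y_i \mid D_{1i}=1,\, D_{2i}=1) &= \alpha + \beta_1 + \beta_2 + \beta_3 && \text{(Urban, Male Head)}
\end{aligned}
```

Read this way, β1 compares urban with rural households among those with a female head, β2 compares male-headed with female-headed households in rural areas, and β3 measures how much the urban differential changes when the head is male.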
ANOVA: Example 3
Yi = Monthly per capita consumption expenditure (MPCE)
Xi = Monthly income of the household
Households from two different types of location: rural and urban
D1i = 1 if the household is in urban area
D1i = 0 otherwise
Base category = Households from rural area

(1) If both β1 and β3 are not significant, the two PRFs (population regression functions) will coincide
(2) If β1 is significant and β3 is not, the two PRFs will be parallel (they differ only in the intercept, i.e. autonomous consumption)
    (a) If β1 is positive, PRF for urban households will have a higher intercept
    (b) If β1 is negative, PRF for urban households will have a lower intercept
(3) If β3 is significant and β1 is not, the two PRFs will be concurrent (they differ only in the slope, i.e. induced consumption)
    (a) If β3 is positive, PRF for urban households will be steeper
    (b) If β3 is negative, PRF for urban households will be flatter
(4) If both β1 and β3 are significant, the two PRFs will be dissimilar
    (a) If both β1 and β3 are positive, PRF for urban households will be steeper with a higher intercept
    (b) If both β1 and β3 are negative, PRF for urban households will be flatter with a lower intercept
    (c) If β1 is positive but β3 is negative, PRF for urban households will be flatter with a higher intercept
    (d) If β1 is negative but β3 is positive, PRF for urban households will be steeper with a lower intercept
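The cases above can be illustrated with a pooled regression that includes the dummy and its interaction with income. This sketch assumes the underlying model is Yi = α + β1 D1i + β2 Xi + β3 (D1i Xi) + ui; the names α and β2 are assumptions, since only β1 and β3 are named above. The data are invented and noise-free so that the coefficients are recovered exactly.

```python
import numpy as np

# Hypothetical, noise-free data for illustration (not from the slides).
income = np.array([1000.0, 2000.0, 3000.0, 4000.0] * 2)
urban = np.array([0.0] * 4 + [1.0] * 4)   # D1 = 1 for urban households

# Assumed "true" PRFs: rural Y = 200 + 0.6 X, urban Y = 500 + 0.5 X.
mpce = np.where(urban == 1, 500.0 + 0.5 * income, 200.0 + 0.6 * income)

# Pooled regression: intercept, D1, X and the interaction D1 * X.
Z = np.column_stack([np.ones_like(income), urban, income, urban * income])
alpha, beta1, beta2, beta3 = np.linalg.lstsq(Z, mpce, rcond=None)[0]

rural_prf = (alpha, beta2)                   # (intercept, slope) for rural
urban_prf = (alpha + beta1, beta2 + beta3)   # shifted by beta1 and beta3
print(rural_prf, urban_prf)
```

In this made-up example β1 is positive and β3 is negative, which corresponds to case (4)(c): the urban PRF is flatter with a higher intercept.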
ANCOVA: Example 2
How will the results and the interpretation of the coefficients differ?
Empirical Example:
Model I
Model II
Not much difference in R² between the two models (to be tested statistically)
More variables turn out to be statistically significant in Model II
No change in sign of the statistically significant coefficients
Finally, selection of Model II for further discussions
Interpretation of the Results
Coefficient of RURAL is significant and positive => Average MPCE of rural households is higher
than that of urban households.
Coefficient of SEMIURBAN is not significant => Average MPCE of semi-urban households is not
significantly different from that of the urban households.
Coefficient of LNHHSIZE is statistically significant and negative => Households with more
members in the family have lower average MPCE.
Coefficient of PCLAND is statistically significant and positive => Households with more
landholding per member in the family have higher average MPCE.
Coefficient of BPL is not statistically significant => Average MPCE does not differ significantly depending on
whether households belong to the BPL category.
Steps to be followed:
Use of a dummy for every quarter (without intercept), or for three quarters (with
intercept), treating the omitted quarter as the base or reference
Estimation of the residuals: these are the deseasonalized values of the time series
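The two dummy specifications in the steps above can be sketched as follows. The series is invented for illustration, and names such as `spec_a` and `spec_b` are hypothetical. Because both designs span the same column space, they yield identical residuals; only the reading of the coefficients differs (quarterly means versus a base-quarter mean plus differentials).

```python
import numpy as np

# Hypothetical quarterly series (3 years) for illustration only:
# a linear trend plus a fixed quarterly pattern.
n = 12
quarter = np.arange(n) % 4                   # 0, 1, 2, 3 = Q1..Q4
seasonal = np.array([-2.0, 1.0, 3.0, -2.0])  # assumed seasonal effects
y = 100.0 + 0.5 * np.arange(n) + seasonal[quarter]

Q = np.eye(4)[quarter]                       # one dummy column per quarter

# Specification A: a dummy for every quarter, no intercept.
spec_a = Q
# Specification B: intercept plus three dummies (Q1 omitted as the base).
spec_b = np.column_stack([np.ones(n), Q[:, 1:]])

b_a = np.linalg.lstsq(spec_a, y, rcond=None)[0]
b_b = np.linalg.lstsq(spec_b, y, rcond=None)[0]

resid_a = y - spec_a @ b_a   # deseasonalized values from specification A
resid_b = y - spec_b @ b_b   # deseasonalized values from specification B
print(np.allclose(resid_a, resid_b))
```

Including all four dummies together with an intercept would make the design matrix perfectly collinear, which is why one of the two forms above is used.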
Important Questions:
Consider the function: GDP = f(GFCF). How to account for seasonality in GFCF, if any?
Frisch-Waugh Theorem: Use of dummy variables in GDP = f(GFCF) will deseasonalize both GDP
and GFCF
Running regression of GDP on the dummy variables and estimating the residuals (U1)
Running regression of GFCF on the dummy variables and estimating the residuals (U2)
Regressing U1 on U2 – Same coefficient of GFCF as one gets when GDP is regressed on
GFCF and the dummy variables
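The equivalence stated above can be verified numerically. The sketch below uses simulated data (not the actual GDP and GFCF series), and the helper name `residuals` is an assumption made for illustration. It compares the GFCF coefficient from the full regression with the coefficient from regressing U1 on U2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quarterly data for illustration (not the GDP/GFCF series).
n = 40
quarter = np.arange(n) % 4
D = np.eye(4)[quarter]                        # four quarterly dummies
gfcf = 10.0 + D @ np.array([0.0, 1.0, -1.0, 2.0]) + rng.normal(size=n)
gdp = (5.0 + 0.8 * gfcf
       + D @ np.array([0.0, 0.5, 1.5, -0.5]) + rng.normal(size=n))

def residuals(y, X):
    """Residuals from an OLS regression of y on the columns of X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

# Full regression: GDP on GFCF and the dummies (no separate intercept,
# because the four dummies already span the constant).
full = np.linalg.lstsq(np.column_stack([gfcf, D]), gdp, rcond=None)[0]
b_full = full[0]

# Two-step route: deseasonalize each series, then regress U1 on U2.
u1 = residuals(gdp, D)
u2 = residuals(gfcf, D)
b_fwl = (u2 @ u1) / (u2 @ u2)

print(b_full, b_fwl)
```

The two printed coefficients agree up to floating-point error, so adding the dummies to the regression deseasonalizes GDP and GFCF implicitly.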
        Model I                      Model II
Obs.    Predicted GDP   Residuals    Predicted GDP   Residuals
1       14.824          -0.21485     14.824          -0.21485
2       14.832          -0.22893     14.832          -0.22893
3       14.864          -0.19649     14.864          -0.19649
4       14.923          -0.20196     14.923          -0.20196
5       14.824          -0.15017     14.824          -0.15017
6       14.832          -0.15386     14.832          -0.15386
7       14.864          -0.13404     14.864          -0.13404
8       14.923          -0.146       14.923          -0.146
9       14.824          -0.07783     14.824          -0.07783
10      14.832          -0.07596     14.832          -0.07596
11      14.864          -0.07551     14.864          -0.07551
12      14.923          -0.08343     14.923          -0.08343
13      14.824          -0.00465     14.824          -0.00465
14      14.832          0.001316     14.832          0.001316
15      14.864          -0.00601     14.864          -0.00601
16      14.923          0.003556     14.923          0.003556
17      14.824          0.084883     14.824          0.084883
18      14.832          0.086258     14.832          0.086258
19      14.864          0.066738     14.864          0.066738
20      14.923          0.07161      14.923          0.07161
21      14.824          0.143054     14.824          0.143054
22      14.832          0.151753     14.832          0.151753
23      14.864          0.140815     14.864          0.140815
24      14.923          0.149779     14.923          0.149779
25      14.824          0.219563     14.824          0.219563
26      14.832          0.219427     14.832          0.219427
27      14.864          0.204511     14.864          0.204511
28      14.923          0.206451     14.923          0.206451
Testing for Structural Break
(i) D1 = 1 for the second sub-period, and D1 = 0 for the first sub-period
(ii) D2 = X for observations corresponding to the second sub-period, and D2 = 0 for observations
corresponding to the first sub-period
(iii) D3 = Z for observations corresponding to the second sub-period, and D3 = 0 for observations
corresponding to the first sub-period
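Here α, β and γ denote the intercept and the coefficients of X and Z, and subscripts 1 and 2 denote the first and second sub-periods.

Hypothesis                                              Restrictions                                                          Elimination of Dummy
All coefficients are the same                           $\alpha_1 = \alpha_2$; $\beta_1 = \beta_2$; $\gamma_1 = \gamma_2$     D1, D2, D3
Only intercepts change                                  $\beta_1 = \beta_2$; $\gamma_1 = \gamma_2$                            D2, D3
Only coefficients of X change                           $\alpha_1 = \alpha_2$; $\gamma_1 = \gamma_2$                          D1, D3
Only coefficients of Z change                           $\alpha_1 = \alpha_2$; $\beta_1 = \beta_2$                            D1, D2
Only slopes change                                      $\alpha_1 = \alpha_2$                                                 D1
Only intercepts and coefficients of X change            $\gamma_1 = \gamma_2$                                                 D3
Only intercepts and coefficients of Z change            $\beta_1 = \beta_2$                                                   D2
All the slope coefficients and the intercept change     None                                                                  None (Full Model)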
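One common way to test a set of restrictions like these is an F test that compares the restricted regression (with the relevant dummies eliminated) against the full model. The slides do not spell out the test statistic, so treat the formula here as an assumption. The sketch below uses simulated data with a shift in the intercept only, and tests the joint restriction that D1, D2 and D3 can all be dropped; the helper `rss` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 30 observations per sub-period,
# with only the intercept shifting in the second sub-period.
n1 = n2 = 30
n = n1 + n2
x = rng.normal(size=n)
z = rng.normal(size=n)
d1 = np.r_[np.zeros(n1), np.ones(n2)]   # D1: 1 in the second sub-period
d2 = d1 * x                             # D2: X in the second sub-period, else 0
d3 = d1 * z                             # D3: Z in the second sub-period, else 0
y = 1.0 + 2.0 * x - 1.0 * z + 3.0 * d1 + rng.normal(scale=0.5, size=n)

def rss(y, X):
    """Residual sum of squares from OLS of y on the columns of X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

const = np.ones(n)
full = np.column_stack([const, x, z, d1, d2, d3])   # no restriction
restricted = np.column_stack([const, x, z])         # D1, D2, D3 eliminated

rss_u, rss_r = rss(y, full), rss(y, restricted)
q = full.shape[1] - restricted.shape[1]             # number of restrictions
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - full.shape[1]))
print(f_stat)
```

The statistic would be compared with the critical value of an F distribution with q and n - 6 degrees of freedom. The other rows of the table are tested the same way by dropping only the dummies listed for that hypothesis.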