Unit 4: Multiple Regression


MULTIPLE REGRESSION - ESTIMATION AND HYPOTHESIS TESTING
STUDY UNIT 4
PROF TJ MOSIKARI
Learning outcomes
• define or describe key terms and concepts such as partial regression coefficient, R2, adjusted R2, and specification error;
• estimate a multiple regression model in EViews;
• interpret the partial regression coefficients;
• interpret the R2 and adjusted R2.
Introduction
• Simple regression involved two variables: the dependent variable, Y, and the explanatory variable, X, as discussed in the previous unit.
• Most empirical economic questions involve many variables. Multiple regression extends simple regression to the case where there are many explanatory variables.
• Since it is the most common tool used in applied economics, this chapter is very important. Fortunately, most of the intuition and statistical techniques of multiple regression are very similar to those of simple regression.
Example: Explaining house prices
• Much research in applied microeconomics and marketing focuses on the pricing of goods. One common approach involves building a model in which the price of a good depends on the characteristics of that good.
• The price of a house is affected by more than just lot size. Any serious attempt to explain the determinants of house prices must include explanatory variables other than the lot size.
In this example, we focus on the following four explanatory variables:
• X1 = the lot size of the property (in square feet)
• X2 = the number of bedrooms
• X3 = the number of bathrooms
• X4 = the number of storeys (excluding the basement).
Regression as a Best Fitting Line
• The simple regression model can be thought of as a technique aimed at fitting a line through an XY-plot.
• Since multiple regression involves more than two variables (e.g. X1, X2, X3, X4 and Y), we cannot draw an XY-plot on a two-dimensional graph, in which one variable is plotted on the vertical axis and the other on the horizontal axis.
• Nevertheless, the same line-fitting intuition holds: if we have three explanatory variables, multiple regression can be thought of as fitting a line through a four-dimensional graph, in which Y is plotted on one axis, X1 on the second, X2 on the third and X3 on the fourth.
OLS Estimation of the Multiple Regression Model
• The multiple regression model with k explanatory variables is written as:

Y = α + β1X1 + β2X2 + . . . + βkXk + e

• Instead of estimating just α and β, we now have α and β1, β2, . . ., βk. However, the strategy for finding estimates for all these coefficients is exactly the same as for the simple regression model. That is, we define the sum of squared residuals:

SSR = Σ (Yi − α − β1X1i − β2X2i − . . . − βkXki)^2

• where X1i is the ith observation on the first explanatory variable. The other explanatory variables are defined in an analogous way. The OLS estimates (which can be interpreted as providing the best fitting line) are found by choosing the values of α, β1, . . ., βk that minimize the SSR. Conceptually, this is a straightforward mathematical problem. Software packages will calculate these OLS estimates automatically.
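The learning outcomes refer to estimating the model in EViews; purely as a supplementary illustration, the same OLS estimation can be sketched in Python with statsmodels. The data below are made up for the sketch and are not the unit's data set.

```python
# Minimal sketch of OLS estimation of the house-price model
# (illustrative only; the unit itself uses EViews and its own data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: price plus the four explanatory variables from the example
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "lot_size":  rng.uniform(2000, 10000, n),   # X1, square feet
    "bedrooms":  rng.integers(1, 6, n),          # X2
    "bathrooms": rng.integers(1, 4, n),          # X3
    "storeys":   rng.integers(1, 4, n),          # X4
})
# Made-up coefficients, used only to generate an artificial dependent variable
df["price"] = (20000 + 5 * df["lot_size"] + 3000 * df["bedrooms"]
               + 15000 * df["bathrooms"] + 8000 * df["storeys"]
               + rng.normal(0, 10000, n))

# OLS chooses alpha, beta_1, ..., beta_k to minimise the sum of squared residuals
X = sm.add_constant(df[["lot_size", "bedrooms", "bathrooms", "storeys"]])
results = sm.OLS(df["price"], X).fit()
print(results.summary())   # coefficients, standard errors, t-stats, P-values, R2
```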
Interpreting OLS Estimates
• It is in the interpretation of OLS estimates that some subtle (and
important) distinctions exist between the simple and multiple
regression cases.
• In the simple regression case, we saw how β could be interpreted as a
marginal effect (i.e. as a measure of the effect that a change in X has
on Y or as a measure of the influence of X on Y).
• In multiple regression, βj can still be interpreted as a marginal effect, but in a slightly different way. In particular, βj is the marginal effect of Xj on Y when all other explanatory variables are held constant.
Example: regression output
• Alternatively, let us consider β2 (the coefficient on the number of bedrooms), which is 2824.61.
• This might be expressed as:
• Houses with an extra bedroom tend to be worth
$2824.61 more than those without the extra bedroom,
ceteris paribus.
• If we consider comparable houses (e.g. those with a
5000 square foot lot, two bathrooms and two storeys),
those with three bedrooms tend to be worth $2824.61
more than those with two bedrooms.
• In a discussion of the statistical properties of the regression coefficients, the confidence interval and the P-value are the most important numbers.
• They can be interpreted in the same way as for the simple regression. For instance, since the P-values for all of the explanatory variables (except the intercept) are less than 0.05, we can say that “the coefficients β1, β2, β3 and β4 are statistically significant at the 5% level” or, equivalently, that “we can reject the four separate hypotheses that any of the coefficients is zero at the 5% level of significance”.
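Continuing the hypothetical statsmodels fit from the earlier sketch (the variable names come from that sketch, not from the unit's own output), the P-values and confidence intervals discussed above can be read off the results object directly:

```python
# Pull out the numbers discussed above instead of reading the full summary table
print(results.params)                 # alpha and the partial regression coefficients
print(results.pvalues)                # P-values for H0: coefficient = 0
print(results.conf_int(alpha=0.05))   # 95% confidence intervals

# A coefficient is significant at the 5% level if its P-value is below 0.05
print(results.pvalues < 0.05)
```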
Omitted Variables Bias
• If we omit explanatory variables that should be present in the regression, and if these omitted variables are correlated with explanatory variables that are included, then the coefficients on the included variables will be biased.
• One practical consequence of the omitted variables bias is that you
should try to include all explanatory variables that could affect the
dependent variable.
• Unfortunately, in practice, this is rarely possible.
• However, there is a counterargument for using as few explanatory variables as possible. It can be shown that the inclusion of irrelevant variables decreases the accuracy of the estimation of all the coefficients (even the relevant ones).
• How should we trade off the benefits of including many variables against the costs of including irrelevant ones?
• A common practice is to begin with as many explanatory variables as possible, then discard those that are not statistically significant (and re-run the regression with the new set of explanatory variables), as sketched below.
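A rough sketch of this “drop the insignificant variables and re-run” practice, again using the hypothetical fit from the earlier sketch (the 0.05 cut-off and variable names are illustrative):

```python
# Keep the intercept plus any explanatory variable significant at the 5% level,
# then re-estimate the model with that reduced set of regressors
keep = [name for name, p in results.pvalues.items()
        if name == "const" or p < 0.05]
results_reduced = sm.OLS(df["price"], X[keep]).fit()
print(results_reduced.summary())
```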
Multicollinearity
• Multicollinearity is a statistical issue that relates to the previous
discussion. It is a problem that arises if some or all of the explanatory
variables are highly correlated with one another.
• If multicollinearity is present, the regression model has difficulty telling which explanatory variables are influencing the dependent variable.
• A multicollinearity problem reveals itself through low t-statistics and,
therefore, high P-values.
• In an extreme case, it is possible for you to find that all the coefficients
are insignificant using t-statistics, while R2 is quite large and significant.
• There is not too much that can be done to correct this problem other
than to drop some of the highly correlated variables from the
regression.
• However, there are many cases when you would not want to do so.
For instance, in our house price example, if the number of bedrooms
and the number of bathrooms had been found to be highly
correlated, multicollinearity would be a problem.
• But you may hesitate to throw out one of these variables since
common sense indicates that both of them influence housing prices.
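One common way to detect multicollinearity in practice, not shown in the slides themselves, is to inspect pairwise correlations and variance inflation factors (VIFs). A minimal sketch, again assuming the hypothetical house-price data and design matrix X from the earlier examples:

```python
# Pairwise correlations among the explanatory variables:
# values close to +1 or -1 hint at a multicollinearity problem
print(df[["lot_size", "bedrooms", "bathrooms", "storeys"]].corr())

# Variance inflation factors; X includes the constant, as defined earlier
from statsmodels.stats.outliers_influence import variance_inflation_factor
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))  # VIF above ~10 is a common warning sign
```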
Chapter Summary
1. The multiple regression model is very similar to the simple
regression model. This chapter emphasizes differences between
the two.
2. The interpretation of regression coefficients is subject to ceteris
paribus conditions: βj measures the marginal effect of Xj on Y,
holding the other explanatory variables constant.
3. If important explanatory variables are omitted from the regression, the estimated coefficients can be misleading, a condition known as “omitted variables bias”. The problem gets worse if the omitted variables are strongly correlated with the included explanatory variables.
4. If the explanatory variables are highly correlated with one
another, coefficient estimates and statistical tests may be
misleading. This is referred to as the “multicollinearity” problem.
Class exercise
• Given the data, specify and estimate a model econometrically.
• Based on the model specified, what is the prior expected relationship among the variables?
• Is government expenditure a stimulus factor to growth? Whether yes or no, explain.
• Do all the fitted parameters make economic sense? Interpret.
• Perform a test of significance on b_1 at the 5% level under the hypothesis b_1 = 0, and interpret.
• Construct a 95% confidence interval for b_2, use it to test the hypothesis b_2 = 0 at the 5% level of significance, and interpret.
• Test the overall significance of the multiple regression in terms of R^2 and interpret (a sketch of the test mechanics follows below).
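The exercise's data set is not reproduced here, so the following is only a sketch of the test mechanics, assuming a fitted statsmodels model called results (as in the earlier sketches) in which b_1 and b_2 are the first and second slope coefficients:

```python
# Sketch of the tests asked for in the exercise, on a generic fitted model
from scipy import stats

n, k = int(results.nobs), int(results.df_model)   # sample size and number of slopes

# 5% t-test of H0: b_1 = 0
t_b1 = results.params.iloc[1] / results.bse.iloc[1]
t_crit = stats.t.ppf(0.975, df=n - k - 1)
print("reject H0: b_1 = 0" if abs(t_b1) > t_crit else "fail to reject H0: b_1 = 0")

# 95% confidence interval for b_2; H0: b_2 = 0 is rejected at the 5% level
# if the interval does not contain zero
lo, hi = results.conf_int(alpha=0.05).iloc[2]
print(f"95% CI for b_2: [{lo:.2f}, {hi:.2f}]")

# Overall significance via R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
r2 = results.rsquared
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(F, results.fvalue)   # should match the F-statistic statsmodels reports
```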
