Multiple Regression
Regression Models in
Business Research
Dr Prabir K. Das
Indian Institute of Foreign Trade
The Question
Reliable Motors, Inc., a manufacturer and
marketer of electric motors, would like to
build a model that uses several
variables to predict sales. Past
data on sales and six different variables,
namely, market potential in the territory (in
Rs. Lakh), number of dealers of the
company in the territory, number of sales
persons in the territory,
index of competitor activity in the territory
on a five-point scale (1=lowest, 5= highest
level of activity by competitors), number of
service people in the territory, and number
of existing customers in the territory are
available. It is believed that these variables
along with other variables influence sales.
How should such a predictive model be developed?
Learning Objectives
To develop a multiple linear regression
model.
To understand the assumptions underlying
the development of a multiple linear
regression model.
To understand the usefulness of residual
analysis.
To note some cautionary comments.
Introduction
Regression analysis is the statistical
methodology for predicting values of one or
more response (dependent) variables from a
collection of predictor (independent) variable
values.
It can also be used for assessing the effects
of the predictor variables on the responses.
The name regression in no way reflects
either the importance or the breadth of
application of this methodology.
The Model
The following regression model can be
hypothesized for the population:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + u$$
where
Y = dependent variable
X1 to Xk = k independent variables
b0 = a model parameter that represents the
mean value of the dependent variable (Y)
when all the independent variables X1 to Xk
are zero (it is also called the Y intercept)
b1 to bk = k parameters (partial regression or
partial slope coefficients). Each measures the
change in the mean value of the dependent
variable associated with a one-unit change in
the value of the corresponding independent
variable, with the other variables held constant.
The parameter b1 gives the direct or net
effect of a unit change in X1 on the mean
value of Y, net of any effect that X2, ..., Xk may
have on mean Y.
u = an error (disturbance or
uncertainty) term that describes the
effects on Y of all variables other than
the already selected X variables.
The uncertainty/error term is central to
the model.
Estimation of Parameters of
the Model
The parameters of the model are estimated
using the least squares technique.
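As a minimal sketch of least squares estimation, the snippet below fits the model on simulated data with NumPy. The data, the "true" coefficients in `true_b`, and all variable names are made up for illustration; they are not the Reliable Motors data.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(42)
n, k = 50, 3                                  # n observations, k independent variables
X = rng.normal(size=(n, k))
true_b = np.array([10.0, 2.0, -1.0, 0.5])     # b0, b1, b2, b3 (made-up values)
u = rng.normal(scale=0.3, size=n)             # error term
y = true_b[0] + X @ true_b[1:] + u

# Design matrix: a column of ones for the intercept b0, then X1..Xk.
A = np.column_stack([np.ones(n), X])

# Least squares estimates: the b that minimises the sum of squared residuals.
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print("estimated b0..bk:", np.round(b_hat, 3))
```

The estimates should land close to the made-up values in `true_b`, with the gap shrinking as n grows or the error variance falls.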
Important properties
The fitted regression equation passes through
the point of means of Y and the X variables.
The residuals have zero covariance
with the sample X values and also with
the predicted Y values.
The total variation in Y may be
expressed as the sum of just two
components, the variation explained
by the linear regression and the
variation unexplained by the
regression.
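The properties listed above can be checked numerically. The sketch below does so on simulated data (an assumption for illustration; the variable names are hypothetical), for a least squares fit that includes an intercept.

```python
import numpy as np

# Simulated data (illustration only).
rng = np.random.default_rng(0)
n = 40
X = rng.uniform(0, 10, size=(n, 2))
y = 5 + 1.2 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])          # intercept plus X1, X2
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b                                 # predicted Y
resid = y - y_hat                             # residuals

# Property 1: the fitted equation passes through the point of means.
pred_at_means = b[0] + X.mean(axis=0) @ b[1:]

# Property 2: residuals have zero covariance with each X and with predicted Y.
cov_with_x = [np.cov(resid, X[:, j])[0, 1] for j in range(X.shape[1])]
cov_with_pred = np.cov(resid, y_hat)[0, 1]

# Property 3: total variation = explained variation + unexplained variation.
total_ss = np.sum((y - y.mean()) ** 2)
regression_ss = np.sum((y_hat - y.mean()) ** 2)
error_ss = np.sum(resid ** 2)

print(pred_at_means, y.mean())
print(cov_with_x, cov_with_pred)
print(total_ss, regression_ss + error_ss)
```

Each pair of printed values should agree up to floating-point rounding, and the covariances should be numerically zero.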
Multiple R
The correlation coefficient between the
observed Y and the predicted Y.
R2
The measure of a regression model's
ability to predict is called the
coefficient of determination (R2).
It is the ratio of the explained variation
to the total variation.
Coefficient of Determination
R2 (Contd.)
Range: 0 to 1
Interpretation: in percentage terms, x%
of the total variability present in Y
is explained by the regression model,
where x = 100 times R2.
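A short sketch of how R2 and Multiple R relate, again on simulated data (an assumption for illustration). For a least squares fit with an intercept, the square of Multiple R equals R2.

```python
import numpy as np

# Simulated data (illustration only).
rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 2))
y = 1 + 0.8 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

total_ss = np.sum((y - y.mean()) ** 2)        # total variation in Y
error_ss = np.sum((y - y_hat) ** 2)           # unexplained variation
r2 = 1 - error_ss / total_ss                  # explained / total

# Multiple R: correlation between observed Y and predicted Y.
multiple_r = np.corrcoef(y, y_hat)[0, 1]

print(f"R2 = {r2:.3f}: {100 * r2:.1f}% of the variability in Y is explained")
print(f"Multiple R = {multiple_r:.3f}, squared = {multiple_r ** 2:.3f}")
```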
Adjusted R-square
$$R^2_{adj} = 1 - \frac{\text{ErrorSS} / (n - (k + 1))}{\text{TotalSS} / (n - 1)}$$
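The adjusted R-square formula can be written as a small helper. This is a sketch: the function name `adjusted_r2` and the numbers in the example are hypothetical, with n taken as the number of observations and k as the number of independent variables.

```python
def adjusted_r2(error_ss, total_ss, n, k):
    """Adjusted R-square: 1 - [ErrorSS / (n - (k + 1))] / [TotalSS / (n - 1)]."""
    if n <= k + 1:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (error_ss / (n - (k + 1))) / (total_ss / (n - 1))


# Made-up sums of squares: R2 = 1 - 20/100 = 0.80.
example = adjusted_r2(error_ss=20.0, total_ss=100.0, n=25, k=4)
print(round(example, 4))  # 0.76
```

Because the error sum of squares is divided by its degrees of freedom, adding a variable that barely reduces ErrorSS can lower the adjusted value even though R2 itself never falls.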
Cautionary Comments
Prediction using values of the
independent variables beyond the range
observed in the sample can be risky.
The linearity assumption may be
appropriate for only a limited range of
the independent variables.
A random sample provides no
information about values of the independent
variables more extreme than those sampled.
The data from the random sample were
obtained under a set of environmental
conditions; if they change, the model
may well be affected.
If the market environment changes, the
model parameters probably will be
affected.
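The caution about predicting beyond the observed range of the X variables can be turned into a simple check before a prediction is used. The helper below is a hypothetical sketch, and the sample values are made up. It compares each variable with its own sampled range only, so a point can pass the check and still lie outside the joint region covered by the sample.

```python
import numpy as np


def outside_observed_range(X_sample, x_new):
    """Flag each independent variable whose new value lies outside the sampled range."""
    lo = X_sample.min(axis=0)
    hi = X_sample.max(axis=0)
    return (x_new < lo) | (x_new > hi)


# Made-up sample: two independent variables, three observations.
X_sample = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])
x_new = np.array([2.5, 35.0])                 # second value exceeds the sampled maximum

flags = outside_observed_range(X_sample, x_new)
print(flags)                                  # [False  True]
```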
Enter
A procedure for variable selection in which
all variables in a block are entered in a
single step.
Stepwise
At each step, the independent variable not
in the equation that has the smallest
probability of F is entered, if that
probability is sufficiently small.
Variables already in the regression
equation are removed if their probability of
F becomes sufficiently large. The method
terminates when no more variables are
eligible for inclusion or removal.
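The stepwise logic can be sketched as follows. To stay dependency-free, this sketch swaps the probability of F for the partial F statistic itself: at a given step every candidate has the same degrees of freedom, so the largest F corresponds to the smallest probability of F. The thresholds `f_enter` and `f_remove`, the function names, and the simulated data are all assumptions for illustration, not the settings of any particular package.

```python
import numpy as np


def sse(X, y, cols):
    """Error sum of squares for a least squares fit with an intercept and the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def partial_f(X, y, cols, j):
    """Partial F for variable j, given the other variables in cols (j must be in cols)."""
    full = sse(X, y, cols)
    reduced = sse(X, y, [c for c in cols if c != j])
    df_resid = len(y) - (len(cols) + 1)
    return (reduced - full) / (full / df_resid)


def stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    """Enter the best excluded variable, then remove the weakest included one, until stable."""
    selected = []
    for _ in range(max_steps):
        changed = False
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if candidates:
            scores = {j: partial_f(X, y, selected + [j], j) for j in candidates}
            best = max(scores, key=scores.get)
            if scores[best] > f_enter:            # "probability sufficiently small"
                selected.append(best)
                changed = True
        if selected:
            scores = {j: partial_f(X, y, selected, j) for j in selected}
            worst = min(scores, key=scores.get)
            if scores[worst] < f_remove:          # "probability sufficiently large"
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Simulated data: only columns 0 and 2 truly influence y (illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 3 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

chosen = stepwise(X, y)
print("selected columns:", sorted(chosen))
```

Keeping `f_remove` below `f_enter` prevents a variable from being removed immediately after it enters, and `max_steps` is a guard against cycling.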
Case Study
In recent years, many US firms have intensified
their efforts to market their products in the Pacific
Rim. Among the major economic powers in that
area are Japan, Hong Kong, and Singapore.
A consortium of US firms that produce raw
materials used in Singapore is interested in
predicting the level of exports from the United
States to Singapore, as well as understanding
the relationship between US exports to
Singapore and certain variables affecting the
economy of that country.
Thank You