Multiple Regression Analysis
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + e $$
Where
y = dependent variable.
x1, x2, x3, …, xk are independent variables.
β0 = y-intercept.
β1 = Slope of y with respect to x1 holding the remaining variables x2, x3, …, xk constant, or the regression coefficient of y on x1 holding the remaining variables x2, x3, …, xk constant.
β2 = Slope of y with respect to x2 holding the remaining variables x1, x3, …, xk constant, or the regression coefficient of y on x2 holding the remaining variables x1, x3, …, xk constant.
β3 = Slope of y with respect to x3 holding the remaining variables x1, x2, x4, …, xk constant, or the regression coefficient of y on x3 holding the remaining variables x1, x2, x4, …, xk constant.
And so on. Similarly,
βk = Slope of y with respect to xk holding the remaining variables x1, x2, x3, …, xk-1 constant, or the regression coefficient of y on xk holding the remaining variables x1, x2, x3, …, xk-1 constant.
e = error term or residual.
Multiple Regression Analysis Madhab Bhatta
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e \qquad (2) $$
Where
y = dependent variable.
x1 and x2 are independent variables.
β0 = y-intercept.
β1 = Slope of y with respect to x1 holding the remaining variable x2 constant, or the regression coefficient of y on x1 holding the remaining variable x2 constant.
β2 = Slope of y with respect to x2 holding the remaining variable x1 constant, or the regression coefficient of y on x2 holding the remaining variable x1 constant.
e = error term or residual.
To fit the regression model (2), we have to estimate the values of β0, β1 and β2. To estimate these parameters we use the principle of least squares. The method of least squares gives the following three normal equations for equation (2):
$$ \sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2 $$
$$ \sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 $$
$$ \sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 $$
Solving these three normal equations simultaneously gives the estimates b0, b1 and b2.
Hence the fitted multiple regression model is
$$ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 $$
Where,
ŷ = Estimated value of the dependent variable for given values of the independent variables.
b0 = y-intercept (or estimated value of β0).
b1 = Regression coefficient of y on x1 holding the effect of x2 constant (or estimated value of β1).
b2 = Regression coefficient of y on x2 holding the effect of x1 constant (or estimated value of β2).
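The three normal equations are a 3 × 3 linear system in b0, b1 and b2, so they can be assembled from the sums and solved numerically. The sketch below is a minimal illustration, assuming NumPy is available; the helper name `fit_two_predictor_model` is hypothetical, and the data are the sales, advertising and price figures from Problem 5 later in these notes.

```python
import numpy as np

def fit_two_predictor_model(y, x1, x2):
    """Estimate b0, b1, b2 by solving the three least-squares normal equations."""
    y, x1, x2 = (np.asarray(a, dtype=float) for a in (y, x1, x2))
    n = len(y)
    # Left-hand sides of the normal equations (coefficients of b0, b1, b2)
    A = np.array([
        [n,         x1.sum(),        x2.sum()],
        [x1.sum(),  (x1**2).sum(),   (x1 * x2).sum()],
        [x2.sum(),  (x1 * x2).sum(), (x2**2).sum()],
    ])
    # Right-hand sides: sum(y), sum(x1*y), sum(x2*y)
    rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    b0, b1, b2 = np.linalg.solve(A, rhs)
    return b0, b1, b2

# Problem 5 data: sales (y), advertising (x1), price (x2)
sales = [33, 61, 70, 82, 17, 24]
ads = [3, 6, 10, 13, 9, 6]
price = [125, 115, 140, 130, 145, 140]
b0, b1, b2 = fit_two_predictor_model(sales, ads, price)
print(f"y-hat = {b0:.3f} + ({b1:.3f}) x1 + ({b2:.3f}) x2")
```

Solving the normal equations directly mirrors the hand method; for larger problems a least-squares routine such as `np.linalg.lstsq` on the design matrix gives the same estimates with better numerical behaviour.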
a. The y-intercept b0 represents the average value of the dependent variable when the values of the independent variables are zero, i.e. x1 = x2 = 0. For example, if b0 = 20, the average value of the dependent variable is 20 when x1 = x2 = 0.
b. The multiple regression coefficient b1 measures the average rate of increase or decrease in the value of the dependent variable (y) when the value of x1 increases by one unit while x2 is held constant.
$$ e = y - \hat{y} $$
Where
e = Error term (residual).
y = Observed value of the dependent variable.
ŷ = Estimated value of the dependent variable for a given set of values of the independent variables.
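Mathematically,
Total Sum of Squares (SST) = Regression Sum of Squares (SSR) + Error Sum of Squares (SSE), i.e.
$$ \sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2 $$
Where ȳ = mean of the observed values of the dependent variable.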
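The identity SST = SSR + SSE can be checked numerically for any least-squares fit that includes an intercept. The sketch below is illustrative only, assuming NumPy; the helper `sum_of_squares` is hypothetical, and the data are the weight gain, initial weight and initial age figures from Problem 7 later in these notes.

```python
import numpy as np

def sum_of_squares(y, X):
    """Fit y on the columns of X plus an intercept by least squares
    and return (SST, SSR, SSE)."""
    y = np.asarray(y, dtype=float)
    # The intercept column is required for SST = SSR + SSE to hold
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)      # total variation
    ssr = np.sum((y_hat - y_bar) ** 2)  # variation explained by the regression
    sse = np.sum((y - y_hat) ** 2)      # unexplained (residual) variation
    return sst, ssr, sse

# Problem 7 data: weight gain on initial weight and initial age
gain = [7, 6, 8, 10, 9, 5, 3, 4]
weight_age = np.column_stack([[39, 52, 49, 46, 61, 35, 25, 55],
                              [8, 6, 7, 12, 9, 6, 7, 4]])
sst, ssr, sse = sum_of_squares(gain, weight_age)
print(sst, ssr + sse)  # the two numbers agree up to rounding error
```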
$$ S_e = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-k-1}} $$
Where k is the number of independent variables. The other notations have their usual meanings.
The interpretation of the standard error of the estimate has already been discussed for the simple regression model.
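The standard error of the estimate only needs SSE and the error degrees of freedom n-k-1. A minimal sketch, with a hypothetical function name; the inputs are the residual sum of squares and sample size from the Excel output in Problem 18, whose reported standard error is 0.299.

```python
import math

def standard_error_of_estimate(sse, n, k):
    """S_e = sqrt(SSE / (n - k - 1)) for a model with k independent variables."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need more than k + 1 observations")
    return math.sqrt(sse / df_error)

# Problem 18: SSE = 0.6277, n = 10 observations, k = 2 independent variables
print(round(standard_error_of_estimate(0.6277, 10, 2), 3))  # → 0.299
```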
$$ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad r^2_{adj} = 1 - (1 - r^2)\,\frac{n-1}{n-k-1} $$
Where
r2 = coefficient of multiple determination.
r2adj = adjusted coefficient of multiple determination.
k = number of independent variables.
n = number of observations.
SST = Total sum of squares.
SSE = Sum of squares due to error.
SSR = Sum of squares due to regression.
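Both measures follow directly from the ANOVA sums of squares. The sketch below uses hypothetical function names and checks them against the Excel output in Problem 18 (SSR = 33.4163, SST = 34.0440, n = 10, k = 2), which reports R square 0.982 and adjusted R square 0.976.

```python
def r_squared(ssr, sst):
    """Coefficient of multiple determination r^2 = SSR / SST."""
    return ssr / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# ANOVA values from Problem 18 (n = 10, k = 2)
r2 = r_squared(33.4163, 34.0440)
print(round(r2, 3), round(adjusted_r_squared(r2, 10, 2), 3))  # → 0.982 0.976
```

The adjustment penalises each extra independent variable through the factor (n-1)/(n-k-1), so the adjusted value is never larger than r2.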
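The interval estimate of βj is
$$ b_j - t_{\alpha,\,n-k-1}\, S_{b_j} \le \beta_j \le b_j + t_{\alpha,\,n-k-1}\, S_{b_j} $$
By putting j = 1, 2, 3, …, k in the above relation, we obtain the interval estimates of β1, β2, β3, …, βk respectively.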
Where
bj = Estimated value of βj, i.e. the regression coefficient of the dependent variable (y) on the independent variable (xj) holding the effect of the other independent variables constant.
Sbj = Standard error of the regression coefficient bj.
n = number of observations.
k = number of independent variables.
tα, n-k-1 = Tabulated value of 't' obtained from the two-tailed Student's t-table at (n-k-1) degrees of freedom and α level of significance.
ŷ = Estimated value of the dependent variable for a given set of values of the independent variables.
The other notations have their usual meaning.
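The interval for βj only needs bj, its standard error Sbj and the tabulated two-tailed t value. In this sketch (a hypothetical helper, not a library function) the t value is passed in by hand, as if read from a t-table. The numbers are the X2 row of the coefficients table in the exercises (n = 10, k = 3, so 6 degrees of freedom); 2.447 is the two-tailed 5% t value for 6 degrees of freedom.

```python
def coefficient_interval(b_j, s_bj, t_tab):
    """Interval estimate b_j -/+ t_tab * S_bj for the population coefficient beta_j.

    t_tab is the two-tailed table value of t at (n - k - 1) degrees of
    freedom and the chosen level of significance.
    """
    margin = t_tab * s_bj
    return b_j - margin, b_j + margin

# X2 row of the coefficients table: b = 16.97, Sb = 7.86, 6 d.f., 5% level
low, high = coefficient_interval(16.97, 7.86, 2.447)
print(f"{low:.3f} <= beta_2 <= {high:.3f}")
```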
If the null hypothesis is accepted, we conclude that there is no significant relationship between the dependent and independent variables. If the alternative hypothesis is accepted, we conclude that there is a significant relationship between the dependent and independent variables.
Test statistic:
$$ t_{cal} = \frac{b_j}{S_{b_j}} $$
Decision: If the absolute value of the calculated test statistic (|tcal|) is less than the tabulated value (ttab), the null hypothesis is accepted; otherwise the alternative hypothesis is accepted, i.e.
If |tcal| < tα, n-k-1, the null hypothesis is accepted. Otherwise, the alternative hypothesis is accepted.
Where
tα, n-k-1 = Tabulated value of 't' at (n-k-1) degrees of freedom and α level of significance, obtained from the two-tailed t-table.
n = number of observations.
k = number of independent variables.
α = level of significance.
bj = Regression coefficient of y on xj.
Sbj = Standard error of the regression coefficient bj.
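The decision rule above can be written as a small function. This is a sketch assuming the usual hypotheses H0: βj = 0 against H1: βj ≠ 0, with the tabulated t value supplied by hand; the helper name is hypothetical. The example uses the X1 row of the coefficients table in the exercises (6 degrees of freedom, two-tailed 5% value 2.447).

```python
def t_test_coefficient(b_j, s_bj, t_tab):
    """Return (t_cal, reject_h0) for H0: beta_j = 0 vs H1: beta_j != 0.

    H0 is accepted when |t_cal| < t_tab (two-tailed table value at
    n - k - 1 degrees of freedom); otherwise H1 is accepted.
    """
    t_cal = b_j / s_bj
    return t_cal, abs(t_cal) >= t_tab

# X1 row of the coefficients table: b = 0.62, Sb = 0.39
t_cal, reject = t_test_coefficient(0.62, 0.39, 2.447)
print(round(t_cal, 3), "accept H1" if reject else "accept H0")
```

Using the absolute value makes the same rule work for negative coefficients such as the X3 row.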
12 F-Test:
This test is used to test the significance of the multiple regression model as a whole. Using the F-test, we can determine whether there is a significant relationship between the dependent variable and the set of independent variables, i.e. we can assess the overall fit of the regression model.
The null and alternative hypotheses are set as
Null hypothesis (H0): β1 = β2 = β3 = … = βk = 0 (This means there is no linear relationship between the dependent variable and the independent variables.)
Alternative hypothesis (H1): At least one βj ≠ 0 (This means a linear relationship exists between the dependent variable and at least one of the independent variables.)
Test statistic
$$ F_{cal} = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n-k-1)} $$
Decision
If Fcal < F(k, n-k-1), α, the null hypothesis (H0) is accepted; otherwise the alternative hypothesis (H1) is accepted.
Note: To obtain the value of F(k, n-k-1), α from the F-table, read k degrees of freedom along the horizontal (numerator) and n-k-1 degrees of freedom along the vertical (denominator) at the α level of significance.
When the required degrees of freedom cannot be found in the table, use the closest value on the smaller side.
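The F-test can be sketched as a short function that takes the sums of squares and a tabulated F value. The helper name is hypothetical; the example uses the ANOVA figures from the Excel output in Problem 18 (SSR = 33.4163, SSE = 0.6277, n = 10, k = 2), and 4.74 is the approximate 5% table value of F(2, 7).

```python
def f_test(ssr, sse, n, k, f_tab):
    """Overall F-test: F = (SSR / k) / (SSE / (n - k - 1)).

    f_tab is the table value F(k, n-k-1) at the chosen level of significance.
    Returns (F_cal, reject_h0); H0 is accepted when F_cal < f_tab.
    """
    msr = ssr / k               # mean square due to regression
    mse = sse / (n - k - 1)     # mean square due to error
    f_cal = msr / mse
    return f_cal, f_cal >= f_tab

# Problem 18: Excel reports F = 186 (rounded)
f_cal, reject = f_test(33.4163, 0.6277, 10, 2, 4.74)
print(round(f_cal, 1), "accept H1" if reject else "accept H0")
```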
13 Dummy Variables:
A qualitative (categorical) variable with two categories, coded by assigning 1 to the first category and 0 to the second, is called a dummy variable. For example, gender is a qualitative variable that can be divided into male and female categories by assigning 1 for male and 0 for female, or 0 for male and 1 for female.
For a dummy variable, we cannot assign numbers other than 0 and 1 to separate the two categories. We assign 0 for the absence and 1 for the presence of the category.
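A regression model with one quantitative independent variable x1 and one dummy variable x2 is
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e $$
Where
β0 = y-intercept.
β1 = Regression coefficient of y on x1 holding the effect of the remaining variable x2 constant.
β2 = Regression coefficient of the dummy variable.
e = error term or residual.
y = dependent variable.
x1 = independent variable.
x2 = dummy variable (0 or 1).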
Regression model if x2 = 0:
$$ \hat{y} = b_0 + b_1 x_1 $$
Regression model if x2 = 1:
$$ \hat{y} = (b_0 + b_2) + b_1 x_1 $$
If b2 = -3, this means that, holding the effect of x1 constant, the value of the dependent variable is on average 3 units lower in the presence of the attribute represented by 1 than in its absence (represented by 0).
Note: The assignment of 0 and 1 to the two categories is entirely arbitrary. Interchanging the 0 and 1 codes changes only the sign of the regression coefficient of the dummy variable, not its magnitude. However, the value of the y-intercept does change on interchanging the 0 and 1 codes: the new intercept equals b0 + b2.
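The effect of interchanging the 0/1 coding can be checked numerically. The sketch below assumes NumPy, uses a hypothetical helper `fit`, and uses a small made-up data set (not from these notes) purely for illustration: after recoding, the dummy coefficient flips sign, the intercept becomes b0 + b2, and the slope of x1 is unchanged.

```python
import numpy as np

def fit(y, *columns):
    """Least-squares estimates [b0, b1, ...] of y on the given columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    b, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return b

# Hypothetical illustration data: x1 is quantitative, d is a 0/1 dummy variable
x1 = [1, 2, 3, 4, 5, 6]
d = [0, 1, 0, 1, 0, 1]
y = [3, 4, 7, 6, 10, 9]

b0, b1, b2 = fit(y, x1, d)                    # category coded 1 = "present"
c0, c1, c2 = fit(y, x1, [1 - v for v in d])   # 0 and 1 interchanged

print(b2, c2)        # same magnitude, opposite sign
print(b0 + b2, c0)   # new intercept equals b0 + b2
print(b1, c1)        # slope of x1 is unchanged
```

The result follows from the algebra b0 + b1x1 + b2x2 = (b0 + b2) + b1x1 - b2(1 - x2), so both codings describe the same two parallel lines.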
5. Sam Spade, owner and general manager of the Campus Stationery Store, is
concerned about the sales behavior of a compact cassette tape recorder sold at the
store. He realizes that there are many factors that might help explain sales, but
believes that advertising and price are major determinants. Sam has collected the
following data:
Sales (units sold)    Advertising (no. of ads)    Price ($)
33                    3                           125
61                    6                           115
70                    10                          140
82                    13                          130
17                    9                           145
24                    6                           140
a. Calculate the least squares equation to predict sales from advertising and price.
b. If advertising is 7 and price is $132, what sales would you predict?
7. A developer of food for pigs would like to determine what relationship exists among the age of a pig when it starts receiving a newly developed food supplement, the initial weight of the pig, and the amount of weight it gains in a 1-week period with the food supplement. The following information is the result of a study of eight piglets.
Piglet number: 1 2 3 4 5 6 7 8
Initial weight (pounds): 39 52 49 46 61 35 25 55
Initial age (weeks): 8 6 7 12 9 6 7 4
Weight gain: 7 6 8 10 9 5 3 4
a. Calculate the least squares equation that best describes these three variables.
b. How much might we expect a pig to gain in a week with the food supplement if it were 9 weeks old and weighed 48 pounds?
c. Compute the residual for pig number 5.
8. Mr. Joshi, supervisor of the Carven Manufacturing Facility, is examining the relationship among an employee's score on an aptitude test, prior work experience, and success on the job. An employee's prior work experience is studied and weighted, yielding a rating between 2 and 12. The measure of on-the-job success is based on a point system involving total output and efficiency, with a maximum possible value of 50. Grant sampled six first-year employees and obtained the following information.
Aptitude test score (X1): 75 87 69 93 81 97
Prior experience (X2): 5 11 4 9 7 10
Performance evaluation (Y): 28 33 21 20 38 46
a. Develop the estimating equation that best describes these data.
b. If an employee scored 83 on the aptitude test and has a prior work experience of
7, what performance evaluation would be expected?
10. For this problem use the following multiple regression equation:
Ŷ = 50 - 2X1 + 7X2 and r2 = 0.40
a. Interpret the meaning of the slopes.
b. Interpret the meaning of the y-intercept.
c. Interpret the meaning of the coefficient of multiple determination.
d. Predict the estimated value of Y when X1 = 4 and X2 = 8.
ANOVA
Source        Sum of squares   d.f.   Mean square   F
Regression    ?                3      ?             ?
Residual      88.66            ?      ?
Total         476.9            9
Coefficients Table
              Unstandardized coefficients        t
              bi        Std. error (Sbi)
Constant      23.6      8.3                      ?
X1            0.62      0.39                     ?
X2            16.97     7.86                     ?
X3            -0.31     0.33                     ?
a. Complete the above ANOVA table and coefficients table.
b. Fit a multiple regression model and predict the value of Y when X1 = 15, X2 = 22 and X3 = 25.
c. Is there any significant relationship between the dependent variable and the three independent variables? (Test at the 5% significance level.)
d. Test the significance of the estimated regression coefficient of X2 at the 5% significance level.
e. What proportion of the variation in the health-care facility utilization index (Y) is explained by the three independent variables?
f. Compute the standard error of estimate and interpret its meaning.
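Completing a partially filled ANOVA table uses only the identities SSR = SST - SSE, degrees of freedom k and n-k-1, MS = SS/d.f. and F = MSR/MSE. The sketch below uses a hypothetical helper, applied to the ANOVA table above (total d.f. 9, so n = 10; regression d.f. 3, so k = 3).

```python
def complete_anova(sst, sse, n, k):
    """Fill in an ANOVA table for a regression with k independent variables.

    Uses SSR = SST - SSE, d.f. = (k, n - k - 1), MS = SS / d.f. and F = MSR / MSE.
    """
    ssr = sst - sse
    df_reg, df_res = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_res
    return {"SSR": ssr, "df_reg": df_reg, "df_res": df_res,
            "MSR": msr, "MSE": mse, "F": msr / mse}

# ANOVA table above: SST = 476.9, SSE = 88.66, n = 10, k = 3
table = complete_anova(476.9, 88.66, 10, 3)
for key, value in table.items():
    print(key, round(value, 3))
```

The same helper applies to the anesthesia ANOVA table in the last problem, where the total d.f. of 12 gives n = 13.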
Output of different software
18. An economist wants to see how consumption for an economy (in $ billions) is influenced by gross domestic product (GDP, in $ billions) and aggregate price (consumer price index). The Microsoft Excel output of this regression is partially reproduced below:
SUMMARY OUTPUT
Regression statistics
Multiple R           0.991
R square             0.982
Adjusted R square    0.976
Standard error       0.299
Observations         10

ANOVA
Source        d.f.   SS        MS        F     Sig. F
Regression    2      33.4163   16.7082   186   0.0001
Residual      7      0.6277    0.0897
Total         9      34.0440
              Unstandardized coefficients        t
              bi        Sbi
ANOVA
Source        Sum of squares   d.f.   Mean square   F
Regression    ?                ?      ?             ?
Residual      29312            9      ?
Total         620192           12
a. Complete the above table. What is the predicting equation for the amount of anesthesia?
b. Give an approximate 95% confidence interval for the amount of anesthesia to be used in a 90-minute operation on a 25-pound dog.
c. Compute the coefficient of determination and interpret its meaning.
d. At a significance level of 10%, is the amount of anesthesia needed significantly different for dogs and cats?
e. At the 5% significance level, is this regression significant as a whole? (2009 Spring)