Multiple Regression Analysis

Madhab Bhatta

1 Introduction
In simple regression analysis, a regression model is fitted between one dependent variable and one independent variable, either to estimate the dependent variable or to measure the effect of the independent variable on the dependent variable. When the regression model is fitted between one dependent variable and more than one independent variable, it is called a multiple regression model. In multiple regression analysis, the effects of several independent variables on the dependent variable can be measured simultaneously.
Let y be the dependent variable and x1, x2, x3, …, xk be the k independent variables. Then the multiple regression model is defined as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + e \qquad (1)$$

where
y = dependent variable.
x1, x2, x3, …, xk = independent variables.
β0 = y-intercept.
β1 = slope of y with respect to x1, holding the remaining variables x2, x3, …, xk constant (the regression coefficient of y on x1, holding x2, x3, …, xk constant).
β2 = slope of y with respect to x2, holding the remaining variables x1, x3, …, xk constant (the regression coefficient of y on x2, holding x1, x3, …, xk constant).
β3 = slope of y with respect to x3, holding the remaining variables x1, x2, x4, …, xk constant (the regression coefficient of y on x3, holding x1, x2, x4, …, xk constant).
And so on. Similarly,
βk = slope of y with respect to xk, holding the remaining variables x1, x2, x3, …, xk-1 constant (the regression coefficient of y on xk, holding x1, x2, x3, …, xk-1 constant).
e = error term or residual.
2 Multiple Regression Model with Two Independent Variables:
Let y be the dependent variable and x1 and x2 be two independent variables. The multiple regression model with two independent variables is defined as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e \qquad (2)$$

where
y = dependent variable.
x1 and x2 = independent variables.
β0 = y-intercept.
β1 = slope of y with respect to x1, holding the remaining variable x2 constant (the regression coefficient of y on x1, holding x2 constant).
β2 = slope of y with respect to x2, holding the remaining variable x1 constant (the regression coefficient of y on x2, holding x1 constant).
e = error term.
To fit the regression model (2), we have to estimate the values of β0, β1 and β2. To estimate these parameters we use the principle of least squares, which chooses the estimates b0, b1 and b2 that minimize the sum of squared errors Σ(y - ŷ)². Applying the method of least squares to equation (2) gives the following three normal equations:

$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$
$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$

Solving these three normal equations simultaneously gives the values of b0, b1 and b2. Hence the fitted multiple regression model is

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

where
ŷ = estimated value of the dependent variable for given values of the independent variables.
b0 = y-intercept (the estimate of β0).
b1 = regression coefficient of y on x1, holding the effect of x2 constant (the estimate of β1).
b2 = regression coefficient of y on x2, holding the effect of x1 constant (the estimate of β2).
3 Interpreting the multiple regression coefficients:

Suppose we have the fitted multiple regression model

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

a. The y-intercept b0 represents the average value of the dependent variable when the values of all the independent variables are zero, i.e. x1 = x2 = 0. For example, in the fitted model

$$\hat{y} = 20 + 2.5 x_1 - 3.8 x_2$$

b0 = 20, which means that the average value of the dependent variable is 20 when x1 = x2 = 0.
b. The multiple regression coefficient b1 measures the average change (increase or decrease) in the value of the dependent variable (y) for a one-unit increase in the independent variable x1, keeping the effect of the other independent variable x2 constant. In the same example, b1 = 2.5, which means that the value of the dependent variable (y) increases by 2.5, on average, when the independent variable x1 increases by 1, keeping the effect of x2 constant.
c. The multiple regression coefficient b2 measures the average change in the value of the dependent variable (y) for a one-unit increase in the independent variable x2, keeping the effect of the other independent variable x1 constant. In the same example, b2 = -3.8, which means that the value of the dependent variable (y) decreases by 3.8, on average, when the independent variable x2 increases by 1, keeping the effect of x1 constant.
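A short Python sketch of these interpretations, using the example fitted model ŷ = 20 + 2.5x1 - 3.8x2 (the helper `y_hat` is hypothetical). Raising one variable by one unit while the other is held fixed changes the prediction by exactly that variable's coefficient, whatever level the other variable is held at.

```python
def y_hat(x1, x2, b0=20.0, b1=2.5, b2=-3.8):
    """Prediction from the example fitted model y-hat = 20 + 2.5 x1 - 3.8 x2."""
    return b0 + b1 * x1 + b2 * x2


# (a) Intercept: the prediction when x1 = x2 = 0
print(y_hat(0, 0))

# (b) Raise x1 by one unit while x2 stays fixed at 4: the change equals b1
print(y_hat(6, 4) - y_hat(5, 4))

# (c) Raise x2 by one unit while x1 stays fixed at 5: the change equals b2
print(y_hat(5, 5) - y_hat(5, 4))
```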
4 Error term or Residual:

The difference between the observed and the estimated value of the dependent variable (y) is called the error or residual, and it is denoted by e:

$$e = y - \hat{y}$$

where
e = error term (residual).
y = observed value of the dependent variable.
ŷ = estimated value of the dependent variable for a given set of values of the independent variables.
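For instance, with the example fitted model of Section 3 (ŷ = 20 + 2.5x1 - 3.8x2) and a hypothetical observation y = 25, x1 = 4, x2 = 1, the estimated value is ŷ = 20 + 2.5(4) - 3.8(1) = 26.2, so e = 25 - 26.2 = -1.2. A minimal Python version of the same calculation (the function name `residual` is illustrative):

```python
def residual(y, x1, x2, b0=20.0, b1=2.5, b2=-3.8):
    """e = y - y_hat for one observation under the example fitted model."""
    y_hat = b0 + b1 * x1 + b2 * x2  # estimated value
    return y - y_hat                # observed minus estimated


e = residual(25, 4, 1)  # hypothetical observation
print(round(e, 2))  # -1.2: the model over-predicts this observation by 1.2
```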
5 Measures of variation in the multiple regression model:
To examine the ability of the set of independent variables to predict the dependent variable (y) in multiple regression analysis, several measures of variation are needed. In multiple regression analysis, the total variation or total sum of squares (SST) is subdivided into the explained variation or regression sum of squares (SSR) and the unexplained variation or error sum of squares (SSE).
Mathematically,
Total Sum of Squares (SST) = Regression Sum of Squares (SSR) + Error Sum of Squares (SSE), i.e.

$$SST = SSR + SSE$$

where

$$SST = \sum (y - \bar{y})^2, \qquad SSR = \sum (\hat{y} - \bar{y})^2, \qquad SSE = \sum (y - \hat{y})^2$$
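The decomposition holds for a least-squares fit that includes the intercept b0. As an illustrative check (Python, standard library only; the function names are hypothetical), the sketch below fits the Problem 1 data from Section 15 by solving the normal equations and compares SST with SSR + SSE.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares (b0, b1, b2) via the normal equations in deviation form."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def sums_of_squares(y, x1, x2):
    """Return (SST, SSR, SSE) for the fitted two-predictor model."""
    b0, b1, b2 = fit_two_predictors(y, x1, x2)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    sst = sum((c - y_bar) ** 2 for c in y)             # total variation
    ssr = sum((h - y_bar) ** 2 for h in y_hat)         # explained variation
    sse = sum((c - h) ** 2 for c, h in zip(y, y_hat))  # unexplained variation
    return sst, ssr, sse


# Problem 1 data (Section 15)
y = [25, 30, 11, 22, 27, 19]
x1 = [3.5, 6.7, 1.5, 0.3, 4.6, 2.0]
x2 = [5.0, 4.2, 8.5, 1.4, 3.6, 1.3]

sst, ssr, sse = sums_of_squares(y, x1, x2)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
```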
6 Standard Error of the Estimate for the multiple regression model (Se):
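We have already discussed the meaning of the standard error of the estimate for the simple regression model. In the multiple regression model, the standard error of the estimate is calculated using the following relation:

$$S_e = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-k-1}}$$

where k is the number of independent variables and n is the number of observations. The other notations have their usual meanings.
The interpretation of the standard error of the estimate is the same as in the simple regression model: it measures the typical scatter of the observed values of y around the fitted regression plane, in the units of y, so a smaller Se indicates a closer fit.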
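As a check on the formula, the Microsoft Excel output reproduced in Problem 18 (Section 15) reports SSE = 0.6277 with n = 10 observations and k = 2 independent variables, and a standard error of 0.299. A minimal Python sketch (the function name is illustrative):

```python
import math


def standard_error_of_estimate(sse, n, k):
    """S_e = sqrt(SSE / (n - k - 1)), where k is the number of independent variables."""
    df = n - k - 1
    if df <= 0:
        raise ValueError("need more than k + 1 observations")
    return math.sqrt(sse / df)


se = standard_error_of_estimate(0.6277, 10, 2)  # Problem 18 output
print(round(se, 3))  # 0.299, matching "Standard error" in that output
```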
7 Coefficient of Multiple Determination (r²):

The coefficient of multiple determination measures the proportion of the variation in the dependent variable (y) that is explained by the set of independent variables. The following relation is used to obtain the coefficient of multiple determination in multiple regression analysis:

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Adjusted Coefficient of Multiple Determination (r²adj):

Because r² never decreases when another independent variable is added to the model, the adjusted coefficient of multiple determination corrects r² for the number of independent variables and the sample size. It is obtained using the following relation:

$$r^2_{adj} = 1 - (1 - r^2)\,\frac{n-1}{n-k-1} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$$

where
r² = coefficient of multiple determination.
r²adj = adjusted coefficient of multiple determination.
k = number of independent variables.
n = number of observations (sample size).
SST = total sum of squares.
SSE = sum of squares due to error.
SSR = sum of squares due to regression.
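Using the same Problem 18 output (SSR = 33.4163, SST = 34.0440, n = 10, k = 2), a short Python sketch reproduces the printed R square (0.982) and adjusted R square (0.976). The function names are illustrative.

```python
def r_squared(ssr, sst):
    """Coefficient of multiple determination, r^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """r^2_adj = 1 - (1 - r^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = r_squared(33.4163, 34.0440)
r2_adj = adjusted_r_squared(r2, n=10, k=2)
print(round(r2, 3), round(r2_adj, 3))  # 0.982 0.976
```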
8 Confidence Interval Estimation of Multiple Regression Coefficients:
In multiple regression analysis, a confidence interval estimate for a population slope (multiple regression coefficient) βj is obtained using the following relation:

$$b_j - t_{\alpha,\,n-k-1}\, S_{b_j} \;\le\; \beta_j \;\le\; b_j + t_{\alpha,\,n-k-1}\, S_{b_j}$$

Putting j = 1, 2, 3, …, k in the above relation gives the interval estimates of β1, β2, β3, …, βk respectively.
where
bj = estimated value of βj, i.e. the regression coefficient of the dependent variable (y) on the independent variable xj, holding the effects of the other independent variables constant.
Sbj = standard error of the regression coefficient bj.
k = number of independent variables.
tα,n-k-1 = tabulated value of t obtained from the two-tailed Student's t-table at (n-k-1) degrees of freedom and α level of significance.
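A minimal Python sketch of the interval, applied to the X1 row of the SPSS output in Problem 16 (b1 = 0.56038, Sb1 = 0.15811, n = 24, k = 5, so n-k-1 = 18). The tabulated value t = 2.101 (two-tailed, α = 0.05, 18 d.f.) is entered by hand because the Python standard library has no t-distribution; if SciPy is available, `scipy.stats.t.ppf(0.975, 18)` returns the same critical value. The function name is illustrative.

```python
def coefficient_ci(b, s_b, t_crit):
    """Interval b - t*S_b to b + t*S_b; t_crit comes from a two-tailed t-table at n-k-1 d.f."""
    margin = t_crit * s_b
    return b - margin, b + margin


# X1 row of the Problem 16 output; t at alpha = 0.05 (two-tailed), 18 d.f. is 2.101
low, high = coefficient_ci(0.56038, 0.15811, 2.101)
print(f"{low:.4f} <= beta1 <= {high:.4f}")
```

Since the whole interval lies above zero, it agrees with a two-tailed t-test rejecting β1 = 0 at the same level.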
9 Approximate prediction interval:

This interval gives the two limits within which the actual value of the dependent variable is expected to lie for a given set of values of the independent variables. The following relation is used to obtain the approximate prediction interval:

$$\hat{y} \pm t_{\alpha,\,n-k-1}\, S_e$$

where
ŷ = estimated value of the dependent variable for the given values of the set of independent variables.
The other notations have their usual meaning.
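The interval is called approximate because it uses only Se and ignores the sampling error in the estimated coefficients b0, b1, …, bk, so the exact prediction interval is somewhat wider. A minimal Python sketch; the predicted value ŷ = 2.5 is hypothetical, while Se = 0.299 and n-k-1 = 7 are taken from the Problem 18 output, with t = 2.365 read from the two-tailed t-table at α = 0.05. The function name is illustrative.

```python
def approx_prediction_interval(y_hat, se, t_crit):
    """Approximate interval y_hat - t*S_e to y_hat + t*S_e."""
    return y_hat - t_crit * se, y_hat + t_crit * se


# Hypothetical y_hat = 2.5; S_e = 0.299 and 7 d.f. as in the Problem 18 output
low, high = approx_prediction_interval(2.5, 0.299, 2.365)
print(f"{low:.3f} to {high:.3f}")
```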
10 Test of significance for the regression coefficients:

To determine whether there is a significant linear relationship between the dependent variable (y) and an independent variable xj, a hypothesis test concerning the population slope (regression coefficient) βj is carried out, with the null and alternative hypotheses stated as follows.
Null hypothesis (H0): βj = 0. This means there is no linear relationship between the dependent variable (y) and the independent variable xj, holding the other independent variables constant.

Alternative hypothesis (H1): βj ≠ 0. This means there is a significant linear relationship between the dependent variable (y) and the independent variable xj (two-tailed test).

If the null hypothesis is accepted, we conclude that there is no significant linear relationship between the dependent variable and xj. If the alternative hypothesis is accepted, we conclude that there is a significant linear relationship between the dependent variable and xj.
Test Statistics:

$$t = \frac{b_j}{S_{b_j}}$$

This test statistic follows the t-distribution with (n-k-1) degrees of freedom.
Decision: if the absolute value of the calculated test statistic, |tcal|, is less than the tabulated value (ttab), the null hypothesis is accepted; otherwise the alternative hypothesis is accepted, i.e.
if |tcal| < tα,n-k-1, H0 is accepted; otherwise H1 is accepted.
where
tα,n-k-1 = tabulated value of t at (n-k-1) degrees of freedom and α level of significance, obtained from the two-tailed t-table.
α = level of significance.
bj = regression coefficient of y on xj.
Sbj = standard error of the regression coefficient bj.
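A minimal Python sketch of the test (the function name is illustrative), applied to the X2 and X4 rows of the SPSS output in Problem 16, with 24 - 5 - 1 = 18 degrees of freedom and the tabulated value t = 2.101 at α = 0.05. The computed ratios reproduce the t values printed in that output (-3.48 and -7.00).

```python
def t_test(b, s_b, t_crit):
    """Return t = b / S_b and whether H0: beta_j = 0 is rejected (|t| >= t_crit)."""
    t = b / s_b
    return t, abs(t) >= t_crit


# X2 and X4 rows of the Problem 16 output; two-tailed t at 0.05 with 18 d.f. is 2.101
for name, b, s_b in [("X2", -31.2077, 8.95905), ("X4", -113.895, 16.2604)]:
    t, reject = t_test(b, s_b, 2.101)
    print(name, round(t, 2), "H1 accepted" if reject else "H0 accepted")
```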
11 Analysis of Variance Table (ANOVA Table):

ANOVA summary table for testing the significance of a set of regression coefficients in a multiple regression model with k independent variables:

Source       d.f.     Sum of Squares   Mean Square            F-statistic
Regression   k        SSR              MSR = SSR / k          F = MSR / MSE
Error        n-k-1    SSE              MSE = SSE / (n-k-1)
Total        n-1      SST
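Filling in the Mean Square and F columns is straightforward arithmetic. As a check, the short Python sketch below (the function name is illustrative) reproduces the MS and F entries of the ANOVA table in the Problem 18 Excel output (MSR ≈ 16.7082, MSE ≈ 0.0897, F ≈ 186) from its SS and d.f. columns.

```python
def anova(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with k independent variables and n observations."""
    msr = ssr / k            # mean square due to regression
    mse = sse / (n - k - 1)  # mean square due to error
    return msr, mse, msr / mse


msr, mse, f = anova(ssr=33.4163, sse=0.6277, n=10, k=2)  # Problem 18 output
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f:.1f}")
```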
12 F-Test:
This test is used to test the significance of the multiple regression model as a whole. Using this F-test, we can determine whether there is a significant relationship between the dependent variable and the set of independent variables, i.e. we can assess the overall fit of the regression model.
The null and alternative hypotheses are set as
Null hypothesis (H0): β1 = β2 = β3 = … = βk = 0 (This means there is no linear relationship between the dependent variable and the independent variables.)
Alternative hypothesis (H1): At least one βj ≠ 0 (This means a linear relationship exists between the dependent variable and at least one of the independent variables.)
Test statistics

$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$

It follows the F-distribution with k and n-k-1 degrees of freedom.
Decision
If Fcal < F(k, n-k-1),α, the null hypothesis (H0) is accepted; otherwise the alternative hypothesis (H1) is accepted.
Note: To obtain the value of F(k, n-k-1),α from the F-table, locate k degrees of freedom (numerator) along the horizontal axis and n-k-1 degrees of freedom (denominator) along the vertical axis at the α level of significance.
When the required degrees of freedom cannot be found in the table, the closest value on the smaller side should be used.
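Since SSR = r²·SST and SSE = (1 - r²)·SST, the F statistic can equivalently be written as F = (r²/k) / ((1 - r²)/(n-k-1)). A minimal Python sketch of the test using the Problem 18 output (n = 10, k = 2); the critical value F(2, 7),0.05 = 4.74 is read from the F-table, and the function names are illustrative.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic in terms of r^2: (r^2 / k) / ((1 - r^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def overall_f_test(f_cal, f_crit):
    """H0 (all slopes zero) is accepted when F_cal < F_crit, otherwise H1 is accepted."""
    return "accept H0" if f_cal < f_crit else "reject H0"


r2 = 33.4163 / 34.0440            # SSR / SST from the Problem 18 output
f_cal = f_from_r2(r2, n=10, k=2)  # same value as MSR / MSE
print(round(f_cal, 1), overall_f_test(f_cal, 4.74))  # F-table value at (2, 7) d.f., alpha = 0.05
```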
13 Dummy Variables:
A dummy variable represents a qualitative (categorical) variable with two categories in a regression model, by assigning the value 1 to one category and 0 to the other. For example, gender is a qualitative variable that can be divided into male and female categories by assigning 1 for male and 0 for female, or 0 for male and 1 for female.
A dummy variable takes only the values 0 and 1; no other numbers are used to separate the two categories. The value 0 denotes the absence and 1 the presence of the attribute that defines the category.
14 Multiple Regression Model Including a Dummy Variable:
Consider the following regression model, where x2 is a dummy variable representing marital status:
x2 = 0 for unmarried,
x2 = 1 for married.
The model is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$$

where
β0 = y-intercept.
β1 = regression coefficient of y on x1, holding the effect of the remaining variable x2 constant.
β2 = regression coefficient of the dummy variable.
y = dependent variable.
x1 = independent variable.
x2 = independent dummy variable.

The fitted model is

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

Regression model if x2 = 0 (unmarried):

$$\hat{y} = b_0 + b_1 x_1$$

Regression model if x2 = 1 (married):

$$\hat{y} = (b_0 + b_2) + b_1 x_1$$
Interpretation of regression coefficients of dummy variable:

The regression coefficient of the dummy variable measures the average difference in the dependent variable between the two categories, holding the other variables constant; the two fitted lines above are parallel (same slope b1) and differ vertically by b2.
If b2 = 3, this means that, keeping the effect of x1 constant, the value of the dependent variable is on average 3 units higher for the category coded 1 than for the category coded 0.
If b2 = -3, this means that, keeping the effect of x1 constant, the value of the dependent variable is on average 3 units lower for the category coded 1 than for the category coded 0.
Note: The assignment of 0 and 1 to the two categories is entirely arbitrary. Interchanging the 0 and 1 codes changes the sign of the dummy-variable coefficient (its magnitude stays the same), and it also changes the y-intercept, from b0 to b0 + b2.
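A minimal Python sketch of the note above, with hypothetical coefficients b0 = 10, b1 = 1.5, b2 = 3 and x2 = 1 for married. Recoding the dummy as x2' = 1 - x2 gives ŷ = (b0 + b2) + b1x1 - b2x2', so the intercept becomes b0 + b2 and the dummy coefficient becomes -b2, while every prediction is unchanged.

```python
def predict(x1, x2, b0, b1, b2):
    """y-hat = b0 + b1*x1 + b2*x2 with x2 a 0/1 dummy."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical fitted coefficients, x2 = 1 for married and 0 for unmarried
b0, b1, b2 = 10.0, 1.5, 3.0

# Reversed coding (x2' = 1 - x2): intercept b0 + b2, dummy coefficient -b2
b0_swapped, b2_swapped = b0 + b2, -b2

for x1 in (0.0, 4.0, 7.5):
    for married in (0, 1):
        original = predict(x1, married, b0, b1, b2)
        swapped = predict(x1, 1 - married, b0_swapped, b1, b2_swapped)
        print(x1, married, original, swapped)  # the two codings give identical predictions
```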
15 Problems on Multiple Regression Analysis:

1. Given the following set of data
a. Calculate the multiple regression equation.
b. Predict Y when X1 = 3.0 and X2 = 2.7
Y X1 X2
25 3.5 5.0
30 6.7 4.2
11 1.5 8.5
22 0.3 1.4
27 4.6 3.6
19 2.0 1.3
2. The following information has been gathered from a random sample of apartment renters in a city. We are trying to predict rent (in dollars per month) based on the size of the apartment (number of rooms) and the distance from downtown (in miles):
Rent ($): 360 1000 450 525 350 300
Number of rooms: 2 6 3 4 2 1
Distance from downtown: 1 1 2 3 10 4
a. Obtain the multiple regression model that best relates these three variables.
b. If someone is looking for a two-bedroom apartment 2 miles from downtown, what rent should he expect to pay?

3. Given the following set of data
Y: 11.4 16.6 20.5 29.4 7.6 13.8 28.5
X1: 4.5 8.7 12.6 19.7 2.9 6.7 17.4
X2: 13.2 18.7 19.8 25.4 22.8 17.8 14.6
a. Calculate the multiple regression plane.
b. Predict Y when X1 = 10.5 and X2 = 13.6.
c. Obtain the residual when X1 = 12.6 and X2 = 19.8.
4. Given the following set of data
Y: 6 10 9 14 7 5
X1: 1 3 2 -2 3 6
X2: 3 -1 4 7 2 -4
a. Calculate the multiple regression equation.
b. Predict Y when X1 = -1 and X2 = 4.
c. Compute the residual when X1 = 2 and X2 = 4.
d. Compute the standard error of the estimate and interpret its value.
e. Compute the coefficient of multiple determination and interpret its value.
f. Obtain the 95% approximate prediction interval when X1 = 3 and X2 = 2.
5. Sam Spade, owner and general manager of the Campus Stationery Store, is
concerned about the sales behavior of a compact cassette tape recorder sold at the
store. He realizes that there are many factors that might help explain sales, but
believes that advertising and price are major determinants. Sam has collected the
following data
Sales (units sold) advertising (no. of ads) price ($)
33 3 125
61 6 115
70 10 140
82 13 130
17 9 145
24 6 140
a. Calculate the least squares equation to predict sales from advertising and price.
b. If advertising is 7 and price is $132, what sales would you predict?
6. The Internal Revenue Department is trying to estimate the monthly amount of unpaid taxes discovered by its auditing division, using field-audit labour hours and computer hours. Data for the last 6 months are:
Unpaid tax ('000 Rs.): 10 17 18 26 35 8
Labour hours (field audit): 8 21 14 17 36 9
Computer hours: 4 9 11 20 13 28
a. Obtain the regression equation of best fit.
b. If field-audit labour hours are 30 and computer hours are also 30, what amount of unpaid tax is expected to be discovered?
c. Compute the standard error of the estimate of the unpaid tax and interpret its value.
d. Obtain the 95% approximate confidence interval when labour hours are 14 and computer hours are 11.
e. Compute the coefficient of multiple determination and interpret its value.
f. Compute the error term (residual) when labour hours are 21 and computer hours are 9.
7. A developer of food for pigs would like to determine what relationship exists
among the age of a pig when it starts receiving a newly developed food
supplement, the initial weight of the pig and the amount of weight it gains in a 1
week period with the food supplement. The following information is the result of a
study of eight piglets.
Piglet number: 1 2 3 4 5 6 7 8
Initial weight (pounds): 39 52 49 46 61 35 25 55
Initial age (weeks): 8 6 7 12 9 6 7 4
Weight gain: 7 6 8 10 9 5 3 4
a. Calculate the least squares equation that best describes these three variables.
b. How much might we expect a pig to gain in a week with the food supplement if it were 9 weeks old and weighed 48 pounds?
c. Compute the residual for piglet number 5.
8. Mr. Joshi, supervisor of the Carven Manufacturing Facility, is examining the relationship among employees' scores on an aptitude test, prior work experience and success on the job. An employee's prior work experience is studied and weighted, yielding a rating between 2 and 12. The measure of on-the-job success is based on a point system involving total output and efficiency, with a maximum possible value of 50. He sampled six first-year employees and obtained the following information:
Aptitude test score (X1): 75 87 69 93 81 97
Prior experience (X2): 5 11 4 9 7 10
Performance evaluation (Y): 28 33 21 20 38 46
a. Develop the estimating equation that best describes these data.
b. If an employee scored 83 on the aptitude test and has a prior work experience rating of 7, what performance evaluation would be expected?
9. For this problem, use the following multiple regression equation:
Ŷ = 10 + 5X1 + 3X2 and r² = 0.60
a. Interpret the meaning of the slopes.
b. Interpret the meaning of the y-intercept.
c. Interpret the meaning of the coefficient of multiple determination.
d. Predict the estimated value of Y when X1 = 6 and X2 = 10.
10. For this problem, use the following multiple regression equation:
Ŷ = 50 - 2X1 + 7X2 and r² = 0.40
a. Interpret the meaning of the slopes.
b. Interpret the meaning of the y-intercept.
c. Interpret the meaning of the coefficient of multiple determination.
d. Predict the estimated value of Y when X1 = 4 and X2 = 8.
11. Given the following information from a multiple regression analysis:

n = 25, b1 = 5, b2 = 10, Sb1 = 2, Sb2 = 8
a. Set up a 95% confidence interval estimate of the population slope β1.

b. Set up a 95% confidence interval estimate of the population slope β2.
c. At the 0.05 level of significance, determine whether each explanatory variable
makes a significant contribution to the regression model. On the basis of these
results, indicate the independent variable that should be included in this model.
12. Given the following information from a multiple regression analysis:
n = 20, b1 = 4, b2 = 3, Sb1 = 1.2, Sb2 = 0.8
a. Set up a 95% confidence interval estimate of the population slope β1.
b. Set up a 95% confidence interval estimate of the population slope β2.
c. At the 0.05 level of significance, determine whether each explanatory variable
makes a significant contribution to the regression model. On the basis of these
results, indicate the independent variable that should be included in this model.
13. Suppose that X1 is a numerical variable and X2 is a dummy variable, and that the regression equation for a sample of n = 20 is
Ŷ = 6 + 4X1 + 2X2
a. Interpret the meaning of the slope for variable X1.
b. Interpret the meaning of the slope for variable X2.
c. Suppose that the t-statistic for testing the contribution of variable X2 is 3.27. At the 0.05 level of significance, is there evidence that variable X2 makes a significant contribution to the model?
14. The following ANOVA summary table was obtained from a multiple regression model with two independent variables:

Source       d.f.   Sum of squares (SS)   Mean square (MS)   F
Regression   2      60
Error        18     120
Total        20     180

a. Determine the mean square that is due to regression and the mean square that is due to error.
b. Determine the computed value of the F-statistic.
c. Determine whether there is a significant relationship between Y and the two explanatory variables at the 0.05 level of significance.
15. The following ANOVA summary table was obtained from a multiple regression model with two independent variables:

Source       d.f.   Sum of squares (SS)   Mean square (MS)   F
Regression   2      30
Error        10     120
Total        12     150

a. Determine the mean square that is due to regression and the mean square that is due to error.
b. Determine the computed value of the F-statistic.
c. Test the overall fit of the model at the 0.05 level of significance (i.e. determine whether there is a significant relationship between Y and the two explanatory variables at the 0.05 level of significance).
Output of SPSS:
16. A manager selected a representative sample of 24 monthly customer bills taken from several recent heating seasons. The manager considers kilowatt-hours per month (Y) as a linear function of square feet of heated space (X1), an index of roof insulation quality (X2), presence/absence of insulated windows (X3), mean temperature (X4) and heat pump/electric forced air (X5). The SPSS output is as follows:

             Unstandardized coefficient (b)   Std. error   t       Sig.
Intercept    6356.07                          838.701      4.58    0.0000
X1           0.56038                          0.15811      3.54    0.0023
X2           -31.2077                         8.95905      -3.48   0.0027
X3           -327.503                         149.169
X4           -113.895                         16.2604      -7.00   0.0000
X5           -621.485                         147.828      -4.20   0.0005

ANOVA TABLE
Source       d.f.   Sum of squares (SS)   Mean square (MS)
Regression   5
Residual            2166000
Total        23     14370000

a. Fit a multiple regression equation.
b. Test the significance of the estimated regression coefficient of X3 at the 5% significance level.
c. Compute the standard error of the estimate Se (or Syx).
d. Compute r² and interpret its meaning.
e. Given that X1 = 129, X2 = 18, X3 = 5, X4 = 3, X5 = 1129, compute the 95% approximate prediction interval.
f. Set up the null and alternative hypotheses, carry out the F-test and interpret your result.
17. A health research team collects data on ten communities. Measurements are obtained on the following variables:
Y = health-care facility utilization index
X1 = median family income
X2 = proportion of workers with health insurance
X3 = doctor-population ratio
The ANOVA and coefficients tables obtained from SPSS are as follows:

ANOVA
Source       Sum of squares   d.f.   Mean square   F
Regression   ?                3      ?             ?
Residual     88.66            ?      ?
Total        476.9            9

Coefficients Table
             Unstandardized coefficients
             bi      Std. error (Sbi)   t
Constant     23.6    8.3                ?
X1           0.62    0.39               ?
X2           16.97   7.86               ?
X3           -0.31   0.33               ?

a. Complete the above ANOVA table and coefficients table.
b. Fit a multiple regression model and predict the value of Y when X1 = 15, X2 = 22 and X3 = 25.
c. Is there any significant relationship between the dependent variable and the three independent variables? (Test at the 5% significance level.)
d. Test the significance of the estimated regression coefficient of X2 at the 5% significance level.
e. What proportion of the variation in the health-care facility utilization index (Y) is explained by the three independent variables?
f. Compute the standard error of the estimate and interpret its meaning.
Output of different software:
18. An economist is interested in how consumption for an economy (in $ billions) is influenced by gross domestic product (GDP, in $ billions) and the aggregate price level (consumer price index). The Microsoft Excel output of this regression is partially reproduced below:

SUMMARY OUTPUT
Regression statistics
Multiple R          0.991
R square            0.982
Adjusted R square   0.976
Standard error      0.299
Observations        10

ANOVA
             d.f.   SS        MS        F     Sig. F
Regression   2      33.4163   16.7082   186   0.0001
Residual     7      0.6277    0.0897
Total        9      34.0440

             Coef.     Std. error   t-stat   p-value
Intercept    -0.0861   0.5674       -0.152   0.8837
GDP          0.7654    0.0574       13.340   0.0001
Price        -0.0006   0.0028       -0.219   0.8330

a. Fit the multiple regression model.
b. At the 0.01 level of significance, which of the given independent variables is the better explanatory variable for consumption? Are both significant at the 0.01 level of significance?
c. One economy in the sample has an aggregate consumption level of $3 billion, a GDP of $3.5 billion and an aggregate price level of 125. What is the residual for this data point?
d. Test the overall fit of the model at the 0.05 level of significance. When the economist used a simple linear regression model with consumption as the dependent variable and GDP as the independent variable, he obtained an R² value of 0.971. What additional percentage of the total variation of consumption has been explained by including aggregate prices in the multiple regression?
19. A professor of statistics was interested in finding out the factors affecting students' performance on exams. He collected information related to his hypothesis to estimate the Grade Point Average (GPA) of students. Later he ran the computer package SAS, which produced the following output:

Root MSE = 11.657308     R square = 0.7672

Variable    d.f.   Parameter estimate   Standard error   T for H0: Parameter = 0   Prob > |T|
Intercept   1      -49.947              41.549           -1.202                    0.2684
Hours       1      1.0693               0.9816           1.089                     0.3123
IQ          1      1.3645               0.3762           3.627                     0.0084
Books       1      2.0398               1.5079           1.353                     0.2182
Age         1      -1.7989              0.6733           -2.672                    0.0319

a. What is the best fitting regression equation for these data?
b. What percentage of the variation in grades is explained by this equation?
c. What GPA would you expect for a 21-year-old student with an IQ of 113 who studied 5 hours and used 3 different books?
d. Based on this output, which set of variables can the professor consider to have a significant effect on GPA at the 0.05 level of significance?
20. Use the following Minitab output to determine the best fitting regression equation for these data:

Predictor   Coef.     St. dev.   t-ratio   P
Constant    -49.948   41.55      -1.20     0.268
X1          1.06931   0.98163    1.09      0.312
X2          1.36460   0.37627    3.63      0.008
X3          2.03982   1.50799    1.35      0.218
X4          -1.7989   0.67332    -2.67     0.032

S = 11.657   R-sq = 76.7%

a. What is the best fitting regression equation for these data?
b. What percentage of the variation in Y is explained by this equation?
c. What value of Y would you expect for X1 = 5, X2 = 113, X3 = 3 and X4 = 21?
d. Which of the four variables explain the dependent variable significantly?
22. Dr. Ram Sharma is a veterinarian. Recently, he has been trying to develop a predicting equation for the amount of anesthesia (measured in milliliters) to be used in operations. He feels that the amount used will depend on the weight of the animal (in pounds), the length of the operation (in hours) and whether the animal is a cat (coded as 0) or a dog (coded as 1). He used SPSS to run a regression on his data and got these results:

             Unstandardized coefficients
             bi        Sbi      t
Constant     90.032    56.482   ?
Type         99.486    42.374   ?
Weight       21.536    2.668    ?
Hours        -34.461   28.607   ?

ANOVA
Source       Sum of squares   d.f.   Mean square   F
Regression   ?                ?      ?             ?
Residual     29312            9      ?
Total        620192           12

a. Complete the above tables. What is the predicting equation for the amount of anesthesia?
b. Give an approximate 95% confidence interval for the amount of anesthesia to be used in a 90-minute operation on a 25-pound dog.
c. Compute the coefficient of determination and interpret its meaning.
d. At a significance level of 10%, is the amount of anesthesia needed significantly different for dogs and cats?
e. At a level of 5%, is this regression significant as a whole? (2009 Spring)