Chapter 5
5.1 Major Assumptions of Linear Regression Model
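In the fitting of a linear regression model,
SLR: $y = \beta_0 + \beta_1 x + \varepsilon$
MLR: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
the estimation of parameters, the testing of hypotheses, and the properties of the estimators are based on the following major
assumptions: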
(a) The relationship between the study variable and explanatory variables is linear, at least approximately.
(b) The error term has zero mean.
(c) The error term has constant variance.
(d) The errors are uncorrelated.
(e) The errors are normally distributed.
These assumptions must hold for the results to be meaningful. If they are violated,
the results can be incorrect and may have serious consequences.
5.2 Checking of Linear Relationship Between Study and Explanatory Variables
Case of One Explanatory Variable
If there is only one explanatory variable in the model, then it is easy to check the existence of a linear
relationship between $y$ and $x$ with a scatter diagram of the available data.
The residual is defined as $e_i = y_i - \hat{y}_i$, $i = 1, 2, \ldots, n$, where
- $y_i$ is an observation
- $\hat{y}_i$ is the corresponding fitted value
Residuals can be thought of as the observed values of the model errors. So it can be expected that if there is
any departure from the assumptions on the random errors, then it should show up in the residuals.
Analysis of residuals helps in finding model inadequacies.
The error term has zero mean.
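The approximate average variance of the residuals is estimated by
$\frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_{res}}{n - p} = MS_{res}$,
where $n$ is the number of observations and $p$ is the number of estimated regression parameters.
The residuals are not independent, as the $n$ residuals have only $n - p$ degrees of freedom. The non-independence
of the residuals has little effect on their use for model adequacy checking as long as $n$ is not small relative
to $p$.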
Plotting the residuals is an effective way to investigate how well the regression model fits the data and to
check the model assumptions.
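The residual properties described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the toy data and the variable names (`x`, `y`, `ms_res`) are hypothetical and chosen only for illustration. It fits a simple linear regression with an intercept, so the number of estimated parameters is 2.

```python
import numpy as np

# Hypothetical toy data: n = 8 observations, one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Design matrix with an intercept column, so p = 2 estimated parameters.
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Least-squares fit, fitted values, and residuals e_i = y_i - yhat_i.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e = y - y_hat

# Residual mean square: SS_res / (n - p).
ms_res = float(e @ e) / (n - p)

# The residuals satisfy p linear constraints (X'e = 0), which is why
# only n - p of them are free to vary.
constraints = X.T @ e

print("mean of residuals:", e.mean())
print("X'e:", constraints)
print("MS_res:", ms_res)
```

The printed mean and the entries of `X'e` should be zero up to floating-point error, which illustrates why the residuals are not independent.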
5.4 The Hat Matrix and the Various Types of Residuals
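Multiple linear regression model: $y = X\beta + \varepsilon$
Solution: $\hat{\beta} = (X'X)^{-1}X'y$, assuming $X'X$ is non-singular.
Fitted model: $\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy$
where $H = X(X'X)^{-1}X'$ is the hat matrix.
$H$ is symmetric.
$H$ is idempotent, i.e., $HH = H$.
Since $e = y - \hat{y} = (I - H)y$, we have $Var(e) = \sigma^2 (I - H)$ and $Var(e_i) = \sigma^2 (1 - h_{ii})$,
where $h_{ii} = x_i'(X'X)^{-1}x_i$ is the $i$th diagonal element of $H$ and $x_i'$ is the $i$th row of the $X$ matrix. The quantity $h_{ii}$ measures the distance of the $i$th observation from the center of the x-space.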
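The stated properties of the hat matrix are easy to verify numerically. The sketch below assumes NumPy and uses a randomly generated design matrix (hypothetical data, with an intercept column); it builds $H = X(X'X)^{-1}X'$ and checks symmetry, idempotency, and that the diagonal elements equal $x_i'(X'X)^{-1}x_i$.

```python
import numpy as np

# Hypothetical design matrix: intercept plus two random regressors.
rng = np.random.default_rng(0)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
p = X.shape[1]

# Hat matrix H = X (X'X)^{-1} X'; solve() avoids forming the inverse explicitly.
H = X @ np.linalg.solve(X.T @ X, X.T)

# Diagonal elements h_ii, computed two ways.
h = np.diag(H)
XtX_inv = np.linalg.inv(X.T @ X)
h_rowwise = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # x_i' (X'X)^{-1} x_i

is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)
trace_H = float(np.trace(H))

print("symmetric:", is_symmetric)
print("idempotent:", is_idempotent)
print("trace(H):", trace_H, "p:", p)
```

The trace of $H$ equals the number of columns of $X$, so the $h_{ii}$ average to $p/n$; observations with $h_{ii}$ well above that average lie far from the center of the x-space.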
5.5 Methods for scaling residuals
Sometimes it is easier to work with scaled residuals. We discuss three methods for scaling the residuals.
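(a) Standardized Residuals
The standardized residual is $d_i = e_i / \sqrt{MS_{res}}$, $i = 1, 2, \ldots, n$, where $MS_{res}$ is the residual mean square.
(b) Studentized Residuals
The studentized residual is $r_i = e_i / \sqrt{MS_{res}(1 - h_{ii})}$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix.
Studentized residuals have constant variance regardless of the location in the x-space when the form of the
model is correct.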
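As an illustration of the two scalings, the sketch below computes standardized residuals (each residual divided by the square root of the residual mean square) and studentized residuals (additionally divided by $\sqrt{1 - h_{ii}}$, with $h_{ii}$ the diagonal of the hat matrix). It assumes NumPy; the function name `scaled_residuals` and the data, which include one deliberately remote x-value, are hypothetical.

```python
import numpy as np

def scaled_residuals(X, y):
    """Return (standardized, studentized, leverages) for a least-squares fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    e = y - H @ y                           # ordinary residuals
    ms_res = float(e @ e) / (n - p)         # residual mean square
    d = e / np.sqrt(ms_res)                 # standardized residuals
    r = e / np.sqrt(ms_res * (1.0 - h))     # studentized residuals
    return d, r, h

# Hypothetical data; the last point is far from the other x-values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 17.0])
X = np.column_stack([np.ones_like(x), x])

d, r, h = scaled_residuals(X, y)
for i in range(len(y)):
    print(f"i={i + 1:2d}  h_ii={h[i]:.3f}  d_i={d[i]: .3f}  r_i={r[i]: .3f}")
```

Because $0 < 1 - h_{ii} \le 1$, each studentized residual is at least as large in magnitude as the corresponding standardized residual, and the difference is greatest at high-leverage points.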
(c) PRESS Residuals
The $i$th PRESS residual is defined as
$e_{(i)} = y_i - \hat{y}_{(i)}$
where $\hat{y}_{(i)}$ is the fitted value of the $i$th response based on all observations except the $i$th one.
Reason: If $y_i$ is really unusual, then the regression model based on all the observations may be overly
influenced by this observation. This could produce a $\hat{y}_i$ that is very similar to $y_i$, and consequently $e_i$ will be small.
So it will be difficult to detect any outlier.
If $y_i$ is deleted, then $\hat{y}_{(i)}$ cannot be influenced by that observation, so the resulting residual is more likely to
indicate the presence of the outlier.
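Procedure
• Delete the $i$th observation.
• Fit the regression model to the remaining $n - 1$ observations.
• Calculate the predicted value $\hat{y}_{(i)}$ of $y_i$ corresponding to the deleted observation.
• The corresponding prediction error is $e_{(i)} = y_i - \hat{y}_{(i)}$.
• Calculate $e_{(i)}$ for each $i = 1, 2, \ldots, n$.
These prediction errors are called PRESS residuals because they are used in computing the prediction error
sum of squares. They are also called deleted residuals.
It is possible to calculate the PRESS residuals from the results of a single fit to all $n$ observations, since
$e_{(i)} = e_i / (1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix.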
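The delete-and-refit procedure and the single-fit calculation can be compared directly. The sketch below assumes NumPy and uses simulated data; the function names are hypothetical. The first function follows the procedure literally (one refit per deleted observation), and the second uses the standard identity that the PRESS residual equals the ordinary residual divided by one minus the corresponding hat-matrix diagonal.

```python
import numpy as np

def press_residuals_loo(X, y):
    """PRESS residuals by literally deleting each observation and refitting."""
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # drop observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ beta_i                  # prediction error for y_i
    return out

def press_residuals_single_fit(X, y):
    """PRESS residuals from one fit: e_i / (1 - h_ii)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return e / (1.0 - np.diag(H))

# Hypothetical simulated data with an intercept and two regressors.
rng = np.random.default_rng(42)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

e_loo = press_residuals_loo(X, y)
e_fast = press_residuals_single_fit(X, y)

# Prediction error sum of squares versus the ordinary residual sum of squares.
press = float(e_loo @ e_loo)
H = X @ np.linalg.solve(X.T @ X, X.T)
ss_res = float(((y - H @ y) ** 2).sum())

print("max difference between methods:", np.abs(e_loo - e_fast).max())
print("PRESS:", press, "SS_res:", ss_res)
```

The two methods agree up to rounding error, and PRESS is never smaller than the residual sum of squares because every ordinary residual is divided by a number no larger than 1.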
5.6 Residual plots
The graphical analysis of residuals is a very effective way to investigate the adequacy of the fit of a
regression model and to check the underlying assumptions.
5.6.1 Normal probability plot
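The normal probability plot is a plot of the ordered standardized residuals versus the so-called normal scores.
The normal scores are the cumulative probabilities $P_i = (i - 1/2)/n$, $i = 1, 2, \ldots, n$.
Let $e_1, e_2, \ldots, e_n$ be the residuals.
Let $e_{[1]} < e_{[2]} < \cdots < e_{[n]}$ be the residuals ranked in increasing order.
Let $P_i = (i - 1/2)/n$.
Plot $e_{[i]}$ against $P_i$; the resulting plot is called the normal probability plot.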
a. Figure 1 shows an ideal normal probability plot. The points lie approximately on a straight line, indicating that the
underlying distribution is normal.
b. Figure 2 shows sharp upward and downward curves at both extremes. This indicates that the underlying distribution
is heavy tailed, i.e., the tails of the underlying distribution are thicker than the tails of the normal distribution.
c. Figure 3 shows flattening of the curve at the extremes. This indicates that the underlying distribution is light tailed,
i.e., the tails of the underlying distribution are thinner than the tails of the normal distribution.
d. Figure 4 shows a sharp upward change in the direction of the trend from the middle. This indicates that the
underlying distribution is positively skewed.
e. Figure 5 shows a sharp downward change in the direction of the trend from the middle. This indicates that the
underlying distribution is negatively skewed.
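The coordinates of a normal probability plot can be computed as in the sketch below, which assumes NumPy and the standard-library `statistics.NormalDist`. The cumulative probabilities are $(i - 1/2)/n$ for the $i$th smallest residual. One assumption beyond the text: on ordinary (linear) axes, the straight-line pattern appears when the ordered residuals are plotted against the standard normal quantiles of those probabilities, so the sketch returns both. The function names and the simulated samples are hypothetical, and the correlation used as a "straightness" summary is only a numerical stand-in for inspecting the plot.

```python
import numpy as np
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return ordered residuals, cumulative probabilities P_i, and normal quantiles."""
    e_sorted = np.sort(np.asarray(residuals, dtype=float))
    n = e_sorted.size
    P = (np.arange(1, n + 1) - 0.5) / n                      # P_i = (i - 1/2) / n
    z = np.array([NormalDist().inv_cdf(p) for p in P])       # standard normal quantiles
    return e_sorted, P, z

def straightness(sample):
    """Correlation between ordered values and normal quantiles (near 1 if linear)."""
    e_sorted, _, z = normal_plot_points(sample)
    return float(np.corrcoef(e_sorted, z)[0, 1])

# Hypothetical samples standing in for residuals.
rng = np.random.default_rng(1)
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)   # positively skewed

corr_normal = straightness(normal_sample)
corr_skewed = straightness(skewed_sample)

print("normal sample:", corr_normal)
print("skewed sample:", corr_skewed)
```

A sample from a normal distribution gives points close to a straight line, while the positively skewed sample bends away from it, which lowers the correlation.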
5.6.2 Plots of Residuals against the Fitted Value
A plot of the residuals, or of any of the scaled residuals, versus the corresponding fitted values is helpful in
detecting several common types of model inadequacies. The following types of plots of $e_i$ versus $\hat{y}_i$ have particular
interpretations:
(a) If the plot is such that the residuals can be contained in a horizontal band (and the residuals fluctuate
more or less randomly inside the band), then there are no obvious model defects.
(b) If the plot is such that the residuals can be contained in an outward-opening funnel, then this pattern indicates that the
variance of the errors is not constant but is an increasing function of y.
(c) If the plot is such that the residuals can be contained in an inward-opening funnel, then this pattern indicates that
the variance of the errors is not constant but is a decreasing function of y.
(d) If the plot is such that the residuals can be contained inside a double bow, then this pattern
indicates that the variance of the errors is not constant and that y is a proportion between 0 and 1. The y then
may have a Binomial distribution. The variance of a Binomial proportion near 0.5 is greater than
near 0 or 1. So the assumed relationship between y and the X's is nonlinear.
(e) If the plot is such that the residuals follow a curved pattern, then it indicates nonlinearity:
the assumed relationship between y and the X's is nonlinear. This could also mean that some other
explanatory variables are needed in the model. For example, a squared term in an explanatory variable may be necessary.
Transformations of the explanatory variables and/or the study variable may also be helpful in these cases.
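The funnel patterns can be illustrated with simulated data. The sketch below assumes NumPy; the data-generating models and the function name are hypothetical. As a rough numerical companion to the plot (not a method given in the text), it computes the correlation between the absolute residuals and the fitted values: a value near zero is consistent with a horizontal band, while a clearly positive value is consistent with an outward-opening funnel.

```python
import numpy as np

def abs_resid_vs_fitted_corr(x, y):
    """Correlation between |e_i| and yhat_i for a simple linear fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    e = y - y_hat
    return float(np.corrcoef(np.abs(e), y_hat)[0, 1])

# Hypothetical simulated data.
rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1.0, 10.0, size=n)
noise = rng.normal(size=n)

# Constant error variance: residuals should fill a horizontal band.
y_const = 1.0 + 2.0 * x + 2.0 * noise
# Error standard deviation proportional to x: outward-opening funnel.
y_funnel = 1.0 + 2.0 * x + 0.5 * x * noise

corr_const = abs_resid_vs_fitted_corr(x, y_const)
corr_funnel = abs_resid_vs_fitted_corr(x, y_funnel)

print("constant variance:", corr_const)
print("increasing variance:", corr_funnel)
```

This summary only detects a monotone change in spread; a double bow or a curved pattern still has to be judged from the plot itself.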
5.6.3 Plots of Residuals in Time Sequence
If the time sequence in which the data were collected is known, then the residuals can be plotted against
the time order. We proceed as follows:
• Plot the residuals on the Y-axis and the time order on the X-axis.
• The interpretation of these plots is the same as for the plots of residuals versus the fitted values $\hat{y}_i$.
5.6.5 Partial Residual Plots
The partial residual plot considers the marginal role of a regressor given the other regressors that are already in
the model. In this plot, the response variable and the regressor of interest are both regressed against the other
regressors in the model, and the residuals are obtained for each regression.
The plot of these residuals against each other shows the marginal role of the regressor on the response variable Y in
the presence of the other regressors in the model.
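Suppose we are interested in the marginal role of $X_1$ on the response variable Y in the presence of $X_2$ in the model.
Regress Y on $X_2$ and obtain the residuals $e(Y \mid X_2)$.
Regress $X_1$ on $X_2$ and obtain the residuals $e(X_1 \mid X_2)$.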
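The two regressions described above can be sketched as follows, assuming NumPy. The simulated data, the names `x1` and `x2` for the regressor of interest and the regressor already in the model, and the helper `ols_residuals` are hypothetical. The sketch also checks a known property of this construction (the Frisch-Waugh-Lovell result): the least-squares slope of the Y-residuals on the regressor-residuals equals the coefficient of that regressor in the full model.

```python
import numpy as np

def ols_residuals(Z, v):
    """Residuals from the least-squares regression of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Hypothetical simulated data with two correlated regressors.
rng = np.random.default_rng(3)
n = 50
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

# Regress y on x2 and x1 on x2 (both with an intercept); keep the residuals.
Z = np.column_stack([np.ones(n), x2])
e_y = ols_residuals(Z, y)      # part of y not explained by x2
e_x1 = ols_residuals(Z, x1)    # part of x1 not explained by x2

# These two residual vectors are the coordinates of the plot.
# Slope of the line through the origin fitted to (e_x1, e_y):
slope = float(e_x1 @ e_y) / float(e_x1 @ e_x1)

# Coefficient of x1 in the full model, for comparison.
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("slope in residual plot:", slope)
print("coefficient of x1 in full model:", beta_full[1])
```

A roughly linear scatter of `e_y` against `e_x1` suggests that the regressor enters the model linearly, while curvature suggests that a transformation or an additional term may be needed.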