Chapter 2: The Simple Regression Model

Regression analysis is concerned with the study of the dependence of one variable (the
dependent variable) on one or more other variables (the explanatory variables) with a view to
estimating and/or predicting the (population) mean or average value of the former in terms of the
known or fixed (in repeated sampling) values of the latter. For example: an economist may be
interested in studying the dependence of personal consumption expenditure on after tax or
disposable real personal income. Such an analysis may be helpful in estimating the marginal
propensity to consume (MPC), that is, the average change in consumption expenditure for, say, a
dollar's worth of change in real income. So, regression analysis helps to estimate or predict the
average value of one variable on the basis of the fixed values of other variables. In other words,
when a single variable is used to estimate the value of an unknown variable, the method is
referred to as simple regression analysis. In summary, the key idea behind regression analysis is
the statistical dependence of one variable, the dependent variable, on one or more other
variables, the explanatory variables. The objective of such analysis is to estimate and/or predict
the mean or average value of the dependent variable on the basis of the known or fixed values of
the explanatory variables.
Terminology

If we are studying the dependence of a variable on only a single explanatory variable, such as
that of consumption expenditure on real income, the study is known as simple or two-variable
regression analysis. However, if we are studying the dependence of one variable on more than
one explanatory variable, as in the dependence of crop yield on rainfall, temperature, soil fertility, and
fertilizer, it is known as multiple regression analysis. In other words, in two-variable regression
there is only one explanatory variable, whereas in multiple regression there is more than one
explanatory variable. In both cases, the workhorse estimator is the Ordinary Least Squares (OLS)
estimator.
Simple linear regression
Simple linear regression, or two-variable regression analysis, is rarely adequate in practice, but it
presents the fundamental ideas of regression analysis as simply as possible, and some of these
ideas can be illustrated with the aid of two-dimensional graphs. On the other hand, the more
general multiple regression analysis, in which the regressand is related to more than one regressor,
is in many ways a logical extension of the two-variable case.
Yi = β0 + β1Xi + ui (1)
where Y = dependent (regressand, endogenous, response, predicted, or explained) variable;
β0 = constant or intercept term; β1 = slope coefficient that tells how Y changes for a
one-unit increase in X; X = independent (regressor, exogenous, predictor, explanatory,
control, or covariate) variable; u = error or disturbance term (it contains all factors other than X
that affect Y). This model is called the simple or two-variable regression model; it
shows how Y varies with X. Regression analysis addresses the question "how do we estimate β0 and
β1?", which OLS answers by minimizing the sum of squared residuals (ûi²).
Example: Suppose the relationship between expenditure (Y) and income (X) of households is
expressed as: Y = 0.6X + 120. Here, on the basis of income, we can predict expenditure. For
instance, if the income of a certain household is 1500 Birr, then the estimated expenditure will
be: expenditure = 0.6(1500) + 120 = 1020 Birr. Note that since expenditure is estimated on the
basis of income, expenditure is the dependent variable and income is the independent variable.
The error term
Consider the above model: Y = 0.6X + 120. This functional relationship is deterministic or exact,
that is, given income we can determine the exact expenditure of a household. But in reality this
rarely happens: different households with the same income are not expected to spend equal
amounts, due to habit persistence, geographical and time variation, etc. Thus, we should express
the regression model as Yi = α + βXi + ei, where ei is the random error term (also called the
disturbance term). Generally, the reasons for including the error term include:
Omitted variables: a model is a simplification of reality. It is not always possible to include
all relevant variables in a functional form. For instance, we may construct a model relating
demand and price of a commodity. But demand is influenced not only by its own price: consumers'
income, prices of substitutes, and several other variables also influence it. The omission
of these variables from the model introduces an error.
Measurement error: inaccuracy in collection and measurement of sample data.
Sampling error: Consider a model relating consumption (Y) to income (X) of
households. The sample we randomly choose to examine the relationship may turn out to
consist predominantly of poor households, or predominantly of women or of men, or it may leave out
households living in remote areas. In such cases, our estimates of α and β
from this sample may not be as good as those from a more balanced sample.
Note that the size of the error (ei) is not fixed: it is non-deterministic, stochastic, or
probabilistic in nature. This in turn implies that Yi is also probabilistic in nature. Thus, the
probability distribution of Yi and its characteristics are determined by the values of Xi and by
the probability distribution of ei. Thus, a full specification of a regression model should include
a specification of the probability distribution of the disturbance (error) term. This information is
given by what we call basic assumptions or assumptions of the classical linear regression model
(CLRM).
Consider the model:
Yi = α + βXi + ei , i = 1, 2, . . ., n
Here, the subscript i refers to the ith observation. In the CLRM, Yi and Xi are observable while ei
is unobservable. If i refers to a point or period of time, we speak of time series data; if i refers to
the ith individual, object, geographical region, etc., we speak of cross-sectional data.

Assumptions of the classical linear regression model

If our objective is to estimate β1 and β2 only, the method of OLS (presented later in this chapter)
will suffice. But in regression analysis our objective is not only to obtain the estimated
values of β1 and β2 but also to draw inferences about the true β1 and β2. For example, we would
like to know how close β̂1 and β̂2 are to their counterparts in the population, or how close Ŷi is
to the true E(Y|Xi). Notice that E(Y|Xi) is the same as β1 + β2Xi. To that end, we must not only
specify the functional form of the model, but also make certain assumptions about the manner in
which the Yi are generated. To see why this requirement is needed, look at the PRF (Population
Regression Function): Yi = β1 + β2Xi + ui. It shows that Yi depends on both Xi and ui. Therefore,
unless we are specific about how Xi and ui are created or generated, there is no way we can make
any statistical inference about the Yi and also, as we shall see, about β1 and β2. Thus, the
assumptions made about the Xi variable(s) and the error terms are extremely critical to the valid
interpretation of the regression estimates.
The Classical Linear Regression Model (CLRM), which is the cornerstone of most econometric
theory, makes 10 assumptions. We first discuss these assumptions in the context of the two-variable
regression model, and later extend them to multiple regression models, that is, models in
which there is more than one regressor.
Assumption 1: Linear in parameters. The regression model is linear in the parameters. Linear-
in-parameters means that no parameter appears as an exponent and no parameter is multiplied or
divided by another parameter, regardless of whether the relationship between Y and X itself is
linear or non-linear. Since linear-in-parameters regression models are the starting point of the
CLRM, we will maintain this assumption throughout this teaching material. Keep in mind that the
regressand Y and the regressor X themselves may be non-linear; the basic model is Yi = β1 + β2Xi + ui.
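For instance, Yi = β1 + β2Xi² + ui and Yi = β1 + β2 ln(Xi) + ui are both linear in the parameters (and hence admissible under this assumption) even though they are non-linear in X, whereas a model such as Yi = β1 + β2²Xi + ui is non-linear in the parameters and falls outside the CLRM.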
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be non-stochastic. What
this means is that our regression analysis is conditional regression analysis, that is, conditional on
the given values of the regressor(s) X. In experiments, where researchers have a chance to
control the explanatory variables / regressors, the explanatory variables are non-stochastic. For
instance, to see the response of maize yield to fertilizer level, the researcher may purposively set the
regressor's values at 0 kg/ha, 25 kg/ha, 50 kg/ha, and 100 kg/ha. Here the variable,
fertilizer level, is non-stochastic.
Assumption 3: Zero mean value of disturbance ui. Given the value of X, the mean, or expected
value of the random disturbance term ui is zero. Technically, the conditional mean value of ui is
zero. Symbolically, we have E(ui|Xi) =0. It states that the mean value of ui, conditional upon the
given Xi, is zero. Geometrically, this assumption can be pictured as in Figure 1, which shows a
few values of the variable X and the Y populations associated with each of them. As shown, each
Y population corresponding to a given X is distributed around its mean value (shown by the
circled points on the PRF) with some Y values above the mean and some below it. The distances
above and below the mean values are nothing but the ui and what this assumption requires is that
the average or mean value of these deviations corresponding to any given X should be zero.
This assumption says that the factors not explicitly included in the model, and therefore
subsumed in ui, do not systematically affect the mean value of Y; so to speak, the positive ui
values cancel out the negative ui values so that their average or mean effect on Y is zero. Note
that the assumption E(ui|Xi) = 0 implies that E(Yi|Xi) = β1 + β2Xi. Therefore, the two
assumptions are equivalent.
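The implication follows in one line: taking conditional expectations of the PRF, E(Yi|Xi) = E(β1 + β2Xi + ui | Xi) = β1 + β2Xi + E(ui|Xi), which reduces to β1 + β2Xi exactly when E(ui|Xi) = 0.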

Figure 1: Conditional distributions of the disturbance ui (Gujarati, 2004)


Assumption 4: Homoskedasticity or equal variance of ui. Given the value of X, the variance of
ui is the same for all observations. That is, the conditional variances of ui are identical.
Symbolically, we have var(ui|Xi) = E[ui − E(ui|Xi)]² = E(ui²|Xi) = σ², because of Assumption 3.
This equation states that the variance of ui for each Xi (i.e., the conditional variance of ui) is
some positive constant number equal to σ². Technically, this represents the assumption of
homoskedasticity, or equal (homo) spread (scedasticity), that is, equal variance. The word comes
from the Greek verb skedanime, which means to disperse or scatter. Stated differently, it means
that the Y populations corresponding to the various X values have the same variance. Put simply, the
variation around the regression line (which is the line of the average relationship between Y and X)
is the same across the X values; it neither increases nor decreases as X varies. Diagrammatically,
the situation is as depicted in Figure 2.

Figure 2: Homoskedasticity
In contrast, consider Figure 3, where the conditional variance of the Y population varies with X.
This situation is known appropriately as heteroskedasticity, or unequal spread or variance.
Symbolically, it can be written as var(ui|Xi) = σi².

Figure 3: Heteroskedasticity
Notice the subscript i on σ² in var(ui|Xi) = σi², which indicates that the variance of the Y population
is no longer constant.

To make the difference between the two situations clear, let Y represent weekly consumption
expenditure and X weekly income. Figures 2 and 3 show that as income increases the average
consumption expenditure also increases. But in Figure 2 the variance of consumption
expenditure remains the same at all levels of income, whereas in Figure 3 it increases with
income. In other words, richer families on average consume more than poorer
families, but there is also more variability in the consumption expenditure of the former.

To understand the rationale behind this assumption, refer to Figure 3, which shows that
var(u|X1) < var(u|X2) < ... < var(u|Xi). Therefore, the likelihood is that the Y observations
coming from the population with X = X1 would be closer to the PRF than those coming from
populations corresponding to X = X2, X = X3, and so on. In short, not all Y values corresponding
to the various X's will be equally reliable, reliability being judged by how closely or distantly the
Y values are distributed around their means, that is, the points on the PRF. If this is in fact the
case, would we not prefer to sample from those Y populations that are closer to their mean than
from those that are widely spread? But doing so might restrict the variation we obtain across X values.

By invoking Assumption 4, we are saying that at this stage all Y values corresponding to the
various X's are equally important. In Unit Six we shall see what happens if this is not the case,
that is, when there is heteroskedasticity. Note that Assumption 4 implies that the conditional
variances of Yi are also homoskedastic, that is, var(Yi|Xi) = σ². (The unconditional variance of Y
is σY².)
Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj
(i ≠ j), the correlation between any two disturbances ui and uj (i ≠ j) is zero. Symbolically,

cov(ui, uj | Xi, Xj) = E(ui uj | Xi, Xj) = 0 (using Assumption 3),

where i and j are two different observations and cov means covariance.


In words, it postulates that the disturbances ui and uj are uncorrelated. Technically, this is the
assumption of no serial correlation, or no autocorrelation. This means that, given Xi, the
deviations of any two Y values from their mean value do not exhibit patterns such as those
shown in Figure 4(a) and 4(b). In Figure 4(a), the u's are positively correlated: a
positive u is followed by a positive u, or a negative u is followed by a negative u. In Figure 4(b), the u's
are negatively correlated: a positive u is followed by a negative u, and vice versa. If the disturbances
(deviations) follow such systematic patterns [Figure 4(a) and Figure 4(b)], there is auto- or serial
correlation, and what Assumption 5 requires is that such correlations be absent. Figure 4(c)
shows that there is no systematic pattern to the u's, thus indicating zero correlation.

Figure 4: Patterns of correlation among the disturbances. (a) positive serial correlation; (b)
negative serial correlation; (c) zero correlation.
One can explain this assumption as follows. Suppose in our PRF (Yt = β1 + β2Xt + ut) that ut and
ut−1 are positively correlated. Then Yt depends not only on Xt but also on ut−1, for ut−1 to some
extent determines ut. By invoking Assumption 5, we are saying that we will consider only the
systematic effect, if any, of Xt on Yt and not worry about the other influences that might act on
Y as a result of the possible intercorrelations among the u's.

Assumption 6: Zero covariance between ui and Xi, that is, cov(ui, Xi) = E(uiXi) = 0.

Assumption 6 states that the disturbance ui and the explanatory variable Xi are uncorrelated. The
rationale for this assumption is as follows: when we expressed the PRF, we assumed that X and u
(which may represent the influence of all the omitted variables) have separate (and additive)
influences on Y. But if X and u are correlated, it is not possible to assess their individual effects
on Y. Thus, if X and u are positively correlated, X increases when u increases and decreases
when u decreases; in either case it is difficult to isolate the influence of X and of u on Y.
Assumption 6 is automatically fulfilled if the X variable is non-random or non-stochastic and
Assumption 3 holds, for in that case cov(ui, Xi) = E{[Xi − E(Xi)][ui − E(ui)]} = 0, since the X's are
non-stochastic. But since we have assumed that our X variable is not only non-stochastic but also
takes fixed values in repeated samples, Assumption 6 is not very critical for us; it is stated here
merely to point out that the regression theory presented in the model holds true even if the X's
are stochastic or random, provided they are independent of, or at least uncorrelated with, the
disturbances ui.
Assumption 7: The number of observations n must be greater than the number of parameters to
be estimated. Alternatively, the number of observations n must be greater than the number of
explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not all be the same.
Technically, var(X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there is no
specification bias or error in the model used in empirical analysis.
The classical econometric methodology assumes this implicitly, if not explicitly. The assumption
can be explained informally as follows. An econometric investigation begins with the specification
of the econometric model underlying the phenomenon of interest. Some important questions that
arise in the specification of the model include the following. (1) What variables should be
included in the model? (2) What is the functional form of the model? Is it linear in the
parameters, the variables, or both? (3) What probabilistic assumptions are made about the Yi,
the Xi, and the ui entering the model? These are extremely important questions; for example, by
omitting important variables from the model, by choosing the wrong functional form, or by
making wrong stochastic assumptions about the variables of the model, the validity of
interpreting the estimated regression will be highly questionable.

Our discussion of the assumptions underlying the classical linear regression model is now
complete. It is important to note that all these assumptions pertain to the PRF only and not to the
SRF (Sample Regression Function). But it is interesting to observe that the method of least
squares (discussed in the next section) has some properties that are similar to the assumptions we
have made about the PRF. For example, the finding that Σûi = 0 is similar to the assumption that
E(ui|Xi) = 0. Likewise, the finding that ΣûiXi = 0 is similar to the assumption that cov(ui, Xi) = 0.
It is comforting to note that the method of least squares thus tries to "duplicate" some of the
assumptions we have imposed on the PRF. Of course, the SRF does not duplicate all the
assumptions of the CLRM. As we will show later, although cov(ui, uj) = 0 (i ≠ j) by assumption, it
is not true that the sample cov(ûi, ûj) = 0 (i ≠ j). As a matter of fact, we will show later that the
residuals are not only autocorrelated but also heteroskedastic. When we go beyond the two-variable
model and consider multiple regression models, that is, models containing several regressors, we shall
add the following assumption.
Assumption 10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.

Method of estimation
Specifying the model and stating its underlying assumptions are the first stage of any
econometric application. The next step is estimation of the numerical values of the parameters of
economic relationships. The parameters of the simple linear regression model can be estimated
by various methods. The most commonly used methods are:
1. The least square method (OLS)
2. The maximum likelihood method (MLM)
3. The method of moments (MM)
4. Bayesian estimation technique.
5. The free hand method
6. The semi-average method
But, here we deal with the OLS method of estimation.
In the regression model Yi = β1 + β2Xi + ui, the values of the parameters β1 and β2 are not
known. When they are estimated from a sample of size n, we obtain the sample regression line
given by Ŷi = β̂1 + β̂2Xi, where β1 and β2 are estimated by β̂1 and β̂2, respectively, and Ŷi is
the estimated value of Y. The dominant and most widely used method for estimating the parameters
(regression coefficients β1 and β2) is the method of least squares.
Consider the two-variable Population Regression Function (PRF):

Yi = β1 + β2Xi + ui ----------------------------- (1)

where ui is the disturbance term. The disturbance term ui is a proxy for all those variables that
are omitted from the model but that collectively affect Y. However, the PRF is not directly
observable. So we estimate it from the Sample Regression Function (SRF):

Ŷi = β̂1 + β̂2Xi ------------------------- (2)

Yi = β̂1 + β̂2Xi + ûi = Ŷi + ûi ----------------------------- (3)

But how is the SRF itself determined? To see this, let us proceed as follows. First, express
equation (3) as:

ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi ------------------------------- (4)

Equation (4) shows that the estimated residuals are simply the differences between the actual and
estimated Y values. Now, given n pairs of observations on Y and X, we would like to determine
the SRF in such a manner that it is as close as possible to the actual Y. To this end, we may adopt
the following criterion: choose the SRF in such a way that the sum of the residuals,
Σûi = Σ(Yi − Ŷi), is as small as possible. Although intuitively appealing, this is not a very good
criterion, as can be seen in the hypothetical scattergram shown in Figure 5.

If we adopt the criterion of minimizing Σûi, Figure 5 shows that the residuals û2 and û3 as well
as the residuals û1 and û4 receive the same weight in the sum (û1 + û2 + û3 + û4), although the first
two residuals are much closer to the SRF than the latter two. In other words, all the residuals
receive equal importance no matter how close to or how widely scattered from the SRF the
individual observations are.

Figure 5: Least-squares criterion

A consequence of this is that it is quite possible for the algebraic sum of the ûi to be small (even
zero) while the ûi are widely scattered about the SRF. To see this, let û1, û2, û3, and û4 in
Figure 5 take the values 10, −2, +2, and −10, respectively. The algebraic sum of these
residuals is zero, although û1 and û4 are scattered more widely around the SRF than û2 and û3.

We can avoid this problem if we adopt the least-squares criterion, which states that the SRF should
be fixed in such a way that the sum of squared errors (SSE),

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)² ------------------- (5)

is as small as possible, where ûi² are the squared residuals.
Why do we square the residuals? Squaring gives greater weight to large residuals, so observations
lying far from the line (potential outliers) stand out clearly; once such observations are identified,
remedial measures can be considered. In other words, by squaring ûi, this method gives more weight to
residuals such as û1 and û4 in Figure 5 than to residuals such as û2 and û3. Our aim is then to determine
the equation of the estimating line in such a way that the error in estimation is minimized.
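Where do the normal equations referred to below come from? A brief sketch of the standard derivation (the labels (6) and (7) follow the references in the next paragraph): minimizing the SSE in equation (5) with respect to β̂1 and β̂2 means setting the partial derivatives to zero,

∂Σûi²/∂β̂1 = −2Σ(Yi − β̂1 − β̂2Xi) = 0  →  ΣYi = nβ̂1 + β̂2ΣXi -------------- (6)

∂Σûi²/∂β̂2 = −2ΣXi(Yi − β̂1 − β̂2Xi) = 0  →  ΣXiYi = β̂1ΣXi + β̂2ΣXi² -------------- (7)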

Equation (6) is the first normal equation and equation (7) is the second normal equation (the full
derivation was also worked on the white board in class). Solving the normal equations
simultaneously, or using matrix algebra, we obtain the least-squares estimators:

β̂2 = Σxiyi / Σxi² = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² ----------------------------------------------------------------------------- (8)

β̂1 = Ȳ − β̂2X̄ -------------- (9)

Example 1: Suppose we have the following data on Teff yield (Yi), measured in quintals per hectare
(Qt/ha), and the amount of labor hours worked (Xi). Calculate the values of β1 and β2 and interpret
the results.
N 1 2 3 4 5 6 7 8 9 10
Yi 11 10 12 6 10 7 9 10 11 10
Xi 10 7 10 5 8 8 6 7 9 10

We obtain β1 = 3.6 and β2 = 0.75. So the estimated regression function between Teff yield and
labor hours is: Ŷ = 3.6 + 0.75X


Interpretation:
 When the value of labor hours is zero, the quantity of Teff yield is 3.6 Qt/ha.
 The constant term or Y-intercept (β1) may lie outside the range of the observed data (labor hours).
In that sense the constant term acts as a "garbage collector" for the regression model; that is, β1
may partly pick up the effect of omitted predictors/explanatory variables.
 When the amount of labor hours increases by one hour, the quantity of Teff yield increases by
0.75 Qt/ha, other factors remaining constant.
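The calculation in Example 1 can be checked with a short script; this is a minimal sketch of the least-squares formulas (8) and (9) applied to the data above (variable names are illustrative):

```python
import numpy as np

# Data from Example 1: Teff yield (Qt/ha) and labor hours
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

x, y = X - X.mean(), Y - Y.mean()        # deviations from the sample means
beta2 = np.sum(x * y) / np.sum(x ** 2)   # slope, equation (8)
beta1 = Y.mean() - beta2 * X.mean()      # intercept, equation (9)

print(beta1, beta2)                      # 3.6 and 0.75, as reported in the text
```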

Example 2: Consider hypothetical data on weekly family consumption expenditure (Y) and weekly
income (X). i) Calculate the constant term and the marginal propensity to consume; ii) interpret
the calculated coefficients.

Interpretation:
 β̂2 = 0.51: the family consumption expenditure increases by 0.51 units if the family weekly
income increases by one unit.
 β̂1 = 24.45: the family consumption expenditure will be 24.45 units if the family weekly income
is zero; that is, at zero income the family's consumption expenditure does not depend on weekly
income but is covered from other sources or funds, such as remittances or pensions.
Therefore, the estimated regression function between weekly family consumption expenditure and
weekly income is: Ŷi = 24.453 + 0.5091Xi

Gauss-Markov Theorem

The ideal or optimum properties that the OLS estimators possess may be summarized by a well-known
theorem known as the Gauss-Markov theorem. According to this theorem, under the basic
assumptions of the CLRM, the least-squares estimators are linear, unbiased, and have minimum
variance. In other words, the Gauss-Markov theorem states that "Given the assumptions of the
classical linear regression model, the least-squares estimators, in the class of unbiased linear
estimators, have minimum variance; that is, they are the Best Linear Unbiased Estimators (BLUE)."
Recall the two equations:
Yi = β0 + β1Xi + ui, where i = 1, 2, …, N → PRF (i)
Yi = β̂0 + β̂1Xi + ûi → SRF (ii)

In general, based on the Gauss-Markov theorem, the key statistical properties of the OLS
estimator are:
 linearity,
 unbiasedness, and
 minimum variance.
That is, the OLS estimator β̂1 is said to be the best linear unbiased estimator (BLUE) of β1 if
these three properties hold.

If an estimator is unbiased and has minimum variance, then it is efficient. But if one or more of
the CLRM assumptions fail, the OLS estimators are no longer best; other estimators may then
perform better.
Equations (8) and (9) show that the least-squares estimates are a function of the sample data. But
since the data are likely to change from sample to sample, the estimates will change as well.
Therefore, what is needed is some measure of the "reliability" or precision of the estimators
β̂1 and β̂2. In statistics, the precision of an estimate is measured by its standard error (se).


1) Linearity: The OLS coefficient estimators (β̂0 and β̂1) can be written as linear functions of
the sample values of Y, the Yi (i = 1, …, N).
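A brief sketch of why this holds for the slope (standard algebra, with xi = Xi − X̄ and using Σxi = 0): β̂1 = Σxiyi/Σxi² = Σxi(Yi − Ȳ)/Σxi² = ΣxiYi/Σxi² = ΣkiYi, where ki = xi/Σxi². So β̂1 is a weighted (linear) combination of the Yi with fixed weights ki, the weights being fixed because the Xi are non-stochastic (Assumption 2).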

Note: The standard error is nothing but the standard deviation of the sampling distribution of the estimator,
and the sampling distribution of an estimator is simply a probability or frequency distribution of the
estimator, that is, a distribution of the set of values of the estimator obtained from all possible samples of
the same size from a given population. Sampling distributions are used to draw inferences about the
values of the population parameters on the basis of the values of the estimators calculated from one or
more samples (Gujarati, 2004, p. 81).

Confidence Intervals and Hypothesis Testing


Estimation results from the sample regression function are not sufficient on their own unless they
are used to draw inferences about the population regression function (PRF). This requires finding out
how close the β̂i's are to the true βi's, or how close σ̂² is to the true σ². Since the β̂i's and σ̂² are
random variables, we need to find out their probability distributions. From equation (14) and the
relation β̂1 = β1 + Σkiui (derived earlier in this material), the estimators can be expressed in terms
of constant values and the random term ui. This implies that the probability distribution of the
estimators will follow the probability distribution of ui. Since ui is assumed to be normally
distributed (invoking the central limit theorem), the estimators will also be normally distributed.
Before we proceed to hypothesis testing, we first estimate the variance of the random disturbance ui
and then continue to hypothesis testing using different approaches.
Variance of the random variable
The variance of the disturbance (error) term for the population is σ². But unless we have data on
the entire population, we cannot determine σ² directly. Hence, σ² is estimated from the sample
using the estimator:
σ̂u² = Σûi² / (n − 2) ----------------------------- (17)
How can we derive this estimator? We start from the PRF.

Population Regression Function: Yi = β0 + β1Xi + ui .................. (a)

Averaging over the sample: Ȳ = β0 + β1X̄ + ū

Taking deviations from the means: Yi − Ȳ = β0 − β0 + β1(Xi − X̄) + (ui − ū), that is,

yi = β1xi + (ui − ū), where yi = Yi − Ȳ and xi = Xi − X̄ ------------------------ (18)

Sample regression function: Yi = β̂0 + β̂1Xi + ûi ------------------ (b)

Estimated sample regression function: Ŷi = β̂0 + β̂1Xi -------------- (c)

From (c), the mean of the fitted values is Ŷ̄ = β̂0 + β̂1X̄. Taking deviations:

Ŷi − Ŷ̄ = β̂1(Xi − X̄), that is,

ŷi = β̂1xi -------------- (19)

Substituting Ŷi from equation (c) in place of β̂0 + β̂1Xi in equation (b), we get:

Yi = Ŷi + ûi -------------- (20)

Taking the summation of equation (20): ΣYi = ΣŶi + Σûi = ΣŶi, since Σûi = 0.

Taking the mean of both sides: (1/n)ΣYi = (1/n)ΣŶi, so Ȳ = Ŷ̄.

Since Ȳ = Ŷ̄, subtract Ȳ from the left-hand side and Ŷ̄ from the right-hand side of equation (20):

Yi − Ȳ = Ŷi − Ŷ̄ + ûi, that is, yi = ŷi + ûi. From this we obtain:

ûi = yi − ŷi -------------- (21)

Substituting equations (18) and (19) into equation (21) in place of yi and ŷi, respectively:

ûi = β1xi + (ui − ū) − β̂1xi

ûi = (ui − ū) − (β̂1 − β1)xi -------------- (22)

Taking the summation of the square of both sides of equation (22):

Σûi² = Σ[(ui − ū) − (β̂1 − β1)xi]²
     = Σ(ui − ū)² − 2(β̂1 − β1)Σxi(ui − ū) + (β̂1 − β1)²Σxi²

Taking the expected value:

E(Σûi²) = E[Σ(ui − ū)²] + E[(β̂1 − β1)²]Σxi² − 2E[(β̂1 − β1)Σxi(ui − ū)] ------------ (23)

Let A = E[Σ(ui − ū)²], B = E[(β̂1 − β1)²]Σxi², and C = −2E[(β̂1 − β1)Σxi(ui − ū)]. We deal with each
term in turn.

A. E[Σ(ui − ū)²] = E[Σui² − 2ūΣui + nū²]
   = E[Σui² − 2nū² + nū²]            (since Σui = nū)
   = E(Σui²) − nE(ū²)
   = nσu² − (1/n)E[(Σui)²]           (since ū = Σui/n)
   = nσu² − (1/n)[ΣE(ui²) + Σ E(uiuj) for i ≠ j]
   = nσu² − (1/n)(nσu²)              (since E(uiuj) = 0 for i ≠ j, by the no-autocorrelation assumption)
   = (n − 1)σu² --------------------------------------------------------- (24)

B. E[(β̂1 − β1)²]Σxi²: from equation (12) we know that var(β̂1) = E[(β̂1 − β1)²] = σ²/Σxi².
   Substituting this in place of E[(β̂1 − β1)²]:
   Σxi² · E[(β̂1 − β1)²] = Σxi² · (σ²/Σxi²) = σ² --------------------------------------------------------- (25)

C. −2E[(β̂1 − β1)Σxi(ui − ū)] = −2E[(β̂1 − β1)(Σxiui − ūΣxi)]
   = −2E[(β̂1 − β1)Σxiui]             (since Σxi = 0)
   Recall that β̂1 = β1 + Σkiui, where ki = xi/Σxi², so that β̂1 − β1 = Σkiui. Hence:
   = −2E[(Σkiui)(Σxiui)]
   = −(2/Σxi²) · E[(Σxiui)²]
   = −(2/Σxi²) · σ²Σxi²              (the cross-product terms vanish because E(uiuj) = 0 for i ≠ j)
   = −2σ² ------------------------------------ (26)

Substituting equations (24), (25), and (26) in place of A, B, and C in equation (23), we get:

E(Σûi²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ²

Therefore, E[Σûi²/(n − 2)] = σ², that is, E(σ̂²) = σ² for σ̂² = Σûi²/(n − 2) ------------------------------------ (27)

Hence σ̂² = Σûi²/(n − 2) is an unbiased estimator of the true variance of the error term σ².
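The unbiasedness of σ̂² can also be checked numerically. Below is a minimal simulation sketch; the sample size, the true parameter values, and the number of replications are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 30, 2.0, 0.5, 1.5   # illustrative "true" values
X = np.linspace(1, 10, n)                    # fixed regressor values (Assumption 2)

sigma2_hats = []
for _ in range(20000):
    u = rng.normal(0.0, sigma, n)            # homoskedastic, uncorrelated errors
    Y = beta0 + beta1 * X + u
    # OLS slope and intercept
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - (b0 + b1 * X)
    sigma2_hats.append(np.sum(resid ** 2) / (n - 2))   # divide by n - 2, not n

print("true sigma^2:", sigma ** 2)                     # 2.25
print("mean of sigma^2-hat:", np.mean(sigma2_hats))    # close to 2.25 on average
```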

Hypothesis testing

Now we have all the estimates required for hypothesis testing. It is obvious that the estimates of
the parameters are obtained from samples. The problem, however, is that estimates made from
samples are prone to error. As a result, we need to test the significance of the estimates and
determine the degree of confidence we can place in them. What are the null and alternative
hypotheses? The null hypothesis concerns the true (population) value of the coefficient; it assumes
that the independent variable X has no statistically significant effect on the dependent variable Y,
so H0: βi = 0. To verify whether the null hypothesis is true or false, we set it against the alternative
hypothesis, which is judged using the sample estimator β̂i; hence H1: βi ≠ 0. Three approaches to
hypothesis testing are discussed in this reading material.

1. The standard error test

This approach uses standard errors to test whether the estimators β̂1 and β̂0 are significantly
different from zero, given the hypothesis that the corresponding population parameters are zero. It is
an approximate test (approximated from the z-test and the t-test at the 5% level of significance).

Step i: Compute the standard error: SE(β̂i) = √var(β̂i)

The hypotheses:
 The Null Hypothesis: H0: βi = 0
 The Alternative Hypothesis: H1: βi ≠ 0

Step ii: Compare the value of the estimator β̂i with its standard error SE(β̂i).
Decision rule: Reject the null hypothesis if ½|β̂i| > SE(β̂i).
Interpretation: if the null hypothesis βi = 0 is rejected, it means that the explanatory variable (X)
associated with βi affects the dependent variable (Y) significantly.
This approach is also known as the "Zero" Null Hypothesis and the "2-t" Rule of Thumb. The "2-t"
rule of thumb: if the number of degrees of freedom is 20 or more and if α, the level of significance,
is set at 0.05, then the null hypothesis βi = 0 can be rejected if the t-value, β̂i/SE(β̂i),
exceeds 2 in absolute value.
Example 3: Using the regression results for the consumption-income data of Example 2, with the
standard errors given below, test the null hypothesis H0: βi = 0 against H1: βi ≠ 0. Does the
independent variable (X) have a statistically significant effect on the dependent variable (Y)?
We have the following information:
Ŷi = 24.453 + 0.5091Xi
     (18.1410) (0.0357)   (the values in parentheses are the standard errors of the intercept and
slope estimates, respectively)
 For the intercept β1, the hypotheses are:
 H0: β1 = 0
 H1: β1 ≠ 0
 Decision: Since ½(24.453) = 12.23 is not greater than 18.1410, we do not reject the null hypothesis.
 Interpretation: at the α = 5% level of significance, the constant term is not statistically
significant in the estimated model.
 For the slope β2, the hypotheses are:
 H0: β2 = 0
 H1: β2 ≠ 0
 Decision: Since ½(0.5091) = 0.2546 > 0.0357, we reject the null hypothesis.
 Interpretation: at the α = 5% level of significance, the variable X is statistically significant in
affecting the dependent variable in the estimated model.
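A minimal sketch of this decision rule in code (the estimates and standard errors are those quoted in Example 3; the function name is illustrative):

```python
def standard_error_test(beta_hat: float, se: float, label: str) -> None:
    """Approximate standard error test: reject H0 (coefficient = 0) when half the
    absolute estimate exceeds its standard error, i.e. |beta_hat| > 2 * SE."""
    reject = 0.5 * abs(beta_hat) > se
    verdict = ("reject H0: statistically significant" if reject
               else "do not reject H0: not statistically significant")
    print(f"{label}: estimate = {beta_hat}, SE = {se} -> {verdict}")

# Values from Example 3: Y-hat = 24.453 + 0.5091 X
standard_error_test(24.453, 18.1410, "intercept")  # not significant
standard_error_test(0.5091, 0.0357, "slope")       # significant
```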
2. The test-of-significance approach (the Student's t-test)
In this test, sample results are used to verify the truth or falsity of a null hypothesis (H0) on the
basis of the value of a test statistic obtained from the data at hand. This test is used when we
hypothesize some value for βi and try to see whether the computed β̂i lies within reasonable
(confidence) limits around the hypothesized value of βi. In other words, it asks whether the
estimator is compatible with the hypothesized value of the true parameter. We usually use the
t-test when the population variance is unknown. Under the normality assumption, the estimator β̂i
can be transformed into a t statistic using the formula:

t = (β̂i − βi) / SE(β̂i), which follows the t distribution with n − 2 degrees of freedom.
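A minimal sketch of this computation (the slope estimate and its standard error are those of Example 3; the degrees of freedom assume n = 10 observations purely for illustration, since the underlying data table is not reproduced in this extract):

```python
from scipy import stats

def t_test(beta_hat: float, se: float, df: int, beta_null: float = 0.0):
    """Student's t-test of H0: beta = beta_null against a two-sided alternative."""
    t_stat = (beta_hat - beta_null) / se
    p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
    return t_stat, p_value

t_stat, p = t_test(0.5091, 0.0357, df=8)        # df = n - 2, with n = 10 assumed
print(f"t = {t_stat:.2f}, p-value = {p:.6f}")   # |t| is far above 2, so H0 is rejected
```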

3. Confidence interval approach
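In brief, this approach constructs a 100(1 − α)% confidence interval for the true parameter, β̂i ± t(α/2, n−2) · SE(β̂i). If the hypothesized value (for example, zero) falls outside this interval, the null hypothesis is rejected at the α level of significance; if it falls inside, the null hypothesis is not rejected.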

The Coefficient of Determination r²: A Measure of "Goodness of Fit"

In the regression context, r² is a more meaningful measure than r, for the former tells us the
proportion of variation in the dependent variable explained by the explanatory variable(s) and
therefore provides an overall measure of the extent to which the variation in one variable
determines the variation in the other. The latter does not have such value. Moreover, as we shall
see, the interpretation of r (= R) in a multiple regression model, which we shall study in detail in
Unit Four, is less straightforward. The correlation coefficient measures the strength of (linear)
association between two variables. For example, we may be interested in finding the
correlation (coefficient) between smoking and lung cancer, between scores on statistics and
mathematics examinations, between high school grades and college grades, and so on. In
regression analysis, as already noted, we are not primarily interested in such a measure. Instead,
we try to estimate or predict the average value of one variable on the basis of the fixed values of
other variables. Thus, we may want to know whether we can predict the average score on a
statistics examination by knowing a student's score on a mathematics examination.

So far we were concerned with the problem of estimating regression coefficients, their standard
errors, and some of their properties. We now consider the goodness of fit of the fitted regression
line to a set of data; that is, we shall find out how “well” the sample regression line fits the data.


If all the observations were to lie on the regression line, we would obtain a "perfect" fit, but this
is rarely the case: there will generally be some positive ûi and some negative ûi. What we hope for
is that these residuals around the regression line are as small as possible. The coefficient of
determination r² (two-variable case) or R² (multiple regression) is a summary measure that tells
how well the sample regression line fits the data.


Figure 7: Breakdown of the variation of Yi into two components.
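The decomposition shown in Figure 7 can be written compactly using the standard identity: the total variation of Yi about its sample mean splits into an explained part and a residual part,

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σûi², that is, TSS = ESS + RSS,

and the coefficient of determination is defined as r² = ESS/TSS = 1 − RSS/TSS, a number lying between 0 and 1.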


Example: To measure the goodness of fit, consider our hypothetical data on weekly family
consumption expenditure (Y) and weekly income (X) from Example 2.
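Because the consumption-income data table is not reproduced in this extract, the following minimal sketch illustrates the same computation using the Teff data of Example 1 and its fitted line (the function name is illustrative):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: r^2 = 1 - RSS/TSS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rss = np.sum((y - y_hat) ** 2)        # residual (unexplained) sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1 - rss / tss

Y = [11, 10, 12, 6, 10, 7, 9, 10, 11, 10]
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y_hat = 3.6 + 0.75 * X                    # fitted line from Example 1
print(round(r_squared(Y, Y_hat), 3))      # about 0.52: 52% of the variation explained
```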
