UNIT I

Introduction to Econometrics

Econometrics deals with the measurement of economic relationships. It is an integration of economics, mathematical economics and statistics, with the objective of providing numerical values for the parameters of economic relationships. The relationships of economic theories are usually expressed in mathematical form and combined with empirical economics. Econometric methods are used to obtain the values of the parameters, which are essentially the coefficients of the mathematical form of the economic relationships. The statistical methods which help in explaining economic phenomena are adapted as econometric methods. Econometric relationships depict the random behaviour of economic relationships, which is generally not considered in economic and mathematical formulations.

It may be pointed out that econometric methods can also be used in other areas like the engineering sciences, biological sciences, medical sciences, geosciences, agricultural sciences, etc. In simple words, whenever there is a need to express a stochastic relationship in mathematical form, econometric methods and tools help. Econometric tools are helpful in explaining the relationships among variables.

Econometric Models:
A model is a simplified representation of a real-world process. It should be representative in the sense that it should contain the salient features of the phenomenon under study. In general, one of the objectives in modelling is to have a simple model to explain a complex phenomenon. Such an objective may sometimes lead to an oversimplified model, and sometimes the assumptions made are unrealistic. In practice, generally, all the variables which the experimenter thinks are relevant to explain the phenomenon are included in the model. The rest of the variables are dumped in a basket called "disturbances", where the disturbances are random variables. This is the main difference between economic modelling and econometric modelling. It is also the main difference between mathematical modelling and statistical modelling: mathematical modelling is exact in nature, whereas statistical modelling also contains a stochastic term.

An economic model is a set of assumptions that describes the behaviour of an economy, or more generally, a
phenomenon.
An econometric model consists of
- a set of equations describing the behaviour. These equations are derived from the economic model
and have two parts – observed variables and disturbances.
- a statement about the errors in the observed values of variables.
- a specification of the probability distribution of disturbances.

Aims of econometrics:
The three main aims of econometrics are as follows:

1. Formulation and specification of econometric models:
Economic models are formulated in an empirically testable form. Several econometric models can be derived from a single economic model. Such models differ in their choice of functional form, the specification of the stochastic structure of the variables, etc.

2. Estimation and testing of models:
The models are estimated on the basis of the observed set of data and are tested for their suitability. This is the statistical-inference part of the modelling. Various estimation procedures are used to obtain the numerical values of the unknown parameters of the model. Based on the various formulations of statistical models, a suitable and appropriate model is selected.

3. Use of models:
The obtained models are used for forecasting and policy formulation, which is an essential part of any policy decision. Such forecasts help policymakers to judge the goodness of the fitted model and to take the necessary measures to re-adjust the relevant economic variables.

Econometrics and statistics:
Econometrics differs both from mathematical statistics and from economic statistics. In economic statistics, empirical data are collected, recorded, tabulated and used to describe patterns in their development over time. Economic statistics is the descriptive aspect of economics. It provides neither explanations of the development of the various variables nor measurement of the parameters of the relationships.
Statistical methods describe methods of measurement which are developed on the basis of controlled experiments. Such methods may not be suitable for economic phenomena, as these do not fit the framework of controlled experiments. For example, in the real world the variables usually change continuously and simultaneously, so the set-up of a controlled experiment is not suitable.

Econometrics uses statistical methods after adapting them to the problems of economic life. These adapted statistical methods are usually termed econometric methods. Such methods are adjusted so that they become appropriate for the measurement of stochastic relationships. These adjustments essentially attempt to specify the stochastic element which operates in real-world data and enters into the determination of the observed data. This enables the data to be treated as a random sample, which is needed for the application of statistical tools.

Theoretical econometrics includes the development of appropriate methods for the measurement of economic relationships which are not amenable to controlled laboratory experiments. Econometric methods are generally developed for the analysis of non-experimental data.

Applied econometrics includes the application of econometric methods to specific branches of economic theory and to problems like demand, supply, production, investment, consumption, etc. It involves applying the tools of econometric theory to the analysis of economic phenomena and the forecasting of economic behaviour.

Types of data
Various types of data are used in the estimation of the model.
1. Time series data
Time series data give information about the numerical values of variables from period to period and are collected over time. For example, monthly income data for the years 1990-2010 constitute a time series.

2. Cross-section data
Cross-section data give information on the variables concerning individual agents (e.g., consumers or producers) at a given point of time. For example, a cross-section sample of consumers is a sample of family budgets showing the expenditures on various commodities by each family, as well as information on family income, family composition and other demographic, social or financial characteristics.
3. Panel data
Panel data are the data from a repeated survey of a single (cross-section) sample in different periods of time.

4. Dummy variable data
When the variables are qualitative in nature, the data are recorded in the form of an indicator function. The values of the variables do not reflect the magnitude of the data; they reflect only the presence or absence of a characteristic. For example, variables like religion, sex, taste, etc. are qualitative variables. The variable 'sex' takes two values, male or female; the variable 'taste' takes values like or dislike, etc. Such values are denoted by a dummy variable. For example, '1' may represent male and '0' female; similarly, '1' may represent liking of a taste and '0' disliking.
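To make the encoding concrete, here is a minimal Python sketch (the survey values and column names are hypothetical, chosen only for illustration) showing how qualitative variables can be recorded as 0/1 dummy variables using pandas:

```python
import pandas as pd

# Hypothetical survey data with qualitative variables (illustrative only).
df = pd.DataFrame({
    "income": [42000, 38000, 51000, 47000],
    "sex":    ["male", "female", "female", "male"],
    "taste":  ["like", "dislike", "like", "like"],
})

# Encode each qualitative variable as a 0/1 indicator (dummy) variable:
# 1 if the characteristic is present, 0 otherwise.
df["sex_male"]   = (df["sex"] == "male").astype(int)
df["taste_like"] = (df["taste"] == "like").astype(int)

print(df[["income", "sex_male", "taste_like"]])
```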

Aggregation problem:
Aggregation problems arise when aggregative variables are used in functions. Such aggregative variables may involve:
1. Aggregation over individuals:
For example, the total income may comprise the sum of individual incomes.

2. Aggregation over commodities:
The quantities or prices of various commodities may be aggregated over a group of commodities. This is done by using a suitable index.

3. Aggregation over time periods:
Sometimes the data are available for shorter or longer time periods than those required in the functional form of the economic relationship. In such cases, the data need to be aggregated over the time period. For example, the production of most manufacturing commodities is completed in a period shorter than a year. If annual figures are used in the model, there may be some error in the production function.

4. Spatial aggregation:
Sometimes the aggregation is related to spatial issues, e.g., the population of towns or countries, or the production in a city or region.

Such sources of aggregation introduce “aggregation bias” in the estimates of the coefficients. It is important
to examine the possibility of such errors before estimating the model.
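As an illustration of aggregation over commodities, the following is a minimal Python sketch (all prices and quantities are hypothetical) that aggregates a group of commodities into a single Laspeyres-type price index:

```python
import numpy as np

# Hypothetical base-period and current-period prices, and base-period
# quantities, for three commodities (illustrative numbers only).
p0 = np.array([10.0, 25.0, 4.0])   # base-period prices
p1 = np.array([12.0, 27.0, 5.0])   # current-period prices
q0 = np.array([100, 40, 500])      # base-period quantities

# Laspeyres price index: aggregates the commodity group into one number.
laspeyres = (p1 @ q0) / (p0 @ q0) * 100
print(f"Laspeyres price index: {laspeyres:.1f}")   # 119.5
```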
Econometrics and regression analysis:
One of the very important roles of econometrics is to provide tools for modelling on the basis of given data. The regression modelling technique helps considerably in this task. Regression models can be either linear or non-linear, giving rise to linear regression analysis and non-linear regression analysis, respectively. We will consider only the tools of linear regression analysis, and our main interest will be fitting a linear regression model to a given set of data.

Linear regression model

Suppose the outcome of a process is denoted by a random variable y, called the dependent (or study) variable, which depends on k independent (or explanatory) variables denoted by X1, X2, ..., Xk. Suppose the behaviour of y can be explained by a relationship given by

y = f(X1, X2, ..., Xk, β1, β2, ..., βk) + ε

where f is some well-defined function and β1, β2, ..., βk are the parameters which characterize the role and contribution of X1, X2, ..., Xk, respectively. The term ε reflects the stochastic nature of the relationship between y and X1, X2, ..., Xk and indicates that such a relationship is not exact in nature. When ε = 0, the relationship is called a mathematical model; otherwise, it is called a statistical model. The term "model" is broadly used to represent any phenomenon in a mathematical framework.

A model or relationship is termed linear if it is linear in the parameters, and non-linear if it is not linear in the parameters. In other words, if all the partial derivatives of y with respect to each of the parameters β1, β2, ..., βk are independent of the parameters, then the model is called a linear model. If any of the partial derivatives of y with respect to any of β1, β2, ..., βk is not independent of the parameters, the model is called non-linear. Note that the linearity or non-linearity of the model is not determined by the linearity or non-linearity of the explanatory variables in the model.

For example,

y = β1X1 + β2X2 + β3 log X3 + ε

is a linear model because ∂y/∂βi (i = 1, 2, 3) are independent of the parameters βi (i = 1, 2, 3). On the other hand,

y = β1²X1 + β2X2 + β3 log X3 + ε

is a non-linear model because ∂y/∂β1 = 2β1X1 depends on β1, although ∂y/∂β2 and ∂y/∂β3 are independent of any of the β1, β2 or β3.
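The partial-derivative criterion above can be checked mechanically. The following is a small sketch using the sympy library (an illustrative aid, not part of the text): a model is linear in the parameters exactly when every ∂y/∂βi is free of the parameters.

```python
import sympy as sp

X1, X2, X3 = sp.symbols("X1 X2 X3", positive=True)
b1, b2, b3 = sp.symbols("beta1 beta2 beta3")
params = {b1, b2, b3}

linear    = b1*X1 + b2*X2 + b3*sp.log(X3)     # linear in the parameters
nonlinear = b1**2*X1 + b2*X2 + b3*sp.log(X3)  # beta1 enters squared

def is_linear_in_params(expr, params):
    # Linear iff every partial derivative with respect to a parameter
    # contains no parameters itself.
    return all(sp.diff(expr, b).free_symbols.isdisjoint(params) for b in params)

print(is_linear_in_params(linear, params))     # True
print(is_linear_in_params(nonlinear, params))  # False
```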

When the function f is linear in the parameters, then y = f(X1, X2, ..., Xk, β1, β2, ..., βk) + ε is called a linear model, and when the function f is non-linear in the parameters, it is called a non-linear model. In general, the function f is chosen as

f(X1, X2, ..., Xk, β1, β2, ..., βk) = β1X1 + β2X2 + ... + βkXk

to describe a linear model. Since X1, X2, ..., Xk are pre-determined variables and y is the outcome, both are known. Thus the knowledge of the model depends on the knowledge of the parameters β1, β2, ..., βk.

Statistical linear modelling essentially consists of developing approaches and tools to determine β1, β2, ..., βk in the linear model

y = β1X1 + β2X2 + ... + βkXk + ε

given the observations on y and X1, X2, ..., Xk.

Different statistical estimation procedures, e.g., the method of maximum likelihood, the principle of least squares, the method of moments, etc., can be employed to estimate the parameters of the model. The method of maximum likelihood needs further knowledge of the distribution of y, whereas the method of moments and the principle of least squares do not need any knowledge about the distribution of y.

Regression analysis is a tool to determine the values of the parameters given the data on y and X1, X2, ..., Xk. The literal meaning of regression is "to move in the backward direction". Before discussing and understanding the meaning of "backward direction", let us find which of the following statements is correct:
S1: the model generates the data, or
S2: the data generate the model.
Obviously, S1 is correct. It can be broadly thought that the model exists in nature but is unknown to the experimenter. When some values of the explanatory variables are provided, the values of the output or study variable are generated accordingly, depending on the form of the function f and the nature of the phenomenon. So, ideally, the pre-existing model gives rise to the data. Our objective is to determine the functional form of this model. Now we move in the backward direction. We propose to first collect the data on the study and explanatory variables. Then we employ some statistical techniques and use these data to learn the form of the function f. Equivalently, the data from the model are recorded first and then used to determine the parameters of the model. Regression analysis is a technique which helps in determining the statistical model by using the data on the study and explanatory variables. The classification into linear and non-linear regression analysis is based on the determination of linear and non-linear models, respectively.

Consider a simple example to understand the meaning of "regression". Suppose the yield of a crop (y) depends linearly on two explanatory variables, viz., the quantity of fertilizer (X1) and the level of irrigation (X2), as

y = β1X1 + β2X2 + ε.

The true values of β1 and β2 exist in nature but are unknown to the experimenter. Some values of y are recorded by providing different values of X1 and X2. There exists some relationship between y and X1, X2 which gives rise to systematically behaved data on y, X1 and X2. Such a relationship is unknown to the experimenter. To determine the model, we move in the backward direction, in the sense that the collected data are used to determine the unknown parameters β1 and β2 of the model. In this sense, such an approach is termed regression analysis.
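The "backward direction" can be illustrated with a short simulation. The sketch below (Python with numpy; the true parameter values and sample size are hypothetical) first lets a known model generate data, and then recovers β1 and β2 from the data by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Nature's true parameters, unknown to the experimenter (hypothetical values).
beta1_true, beta2_true = 0.8, 1.5

X1 = rng.uniform(0, 100, n)           # quantity of fertilizer
X2 = rng.uniform(0, 20, n)            # level of irrigation
eps = rng.normal(0, 5, n)             # stochastic disturbance
y = beta1_true*X1 + beta2_true*X2 + eps

# Move "backward": use the recorded data to estimate beta1 and beta2.
X = np.column_stack([X1, X2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                       # close to [0.8, 1.5]
```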

The theory and fundamentals of linear models lay the foundation for developing the tools for regression
analysis that are based on valid statistical theory and concepts.

Steps in regression analysis


Regression analysis includes the following steps:
- Statement of the problem under consideration
- Choice of relevant variables
- Collection of data on relevant variables
- Specification of model
- Choice of method for fitting the data
- Fitting of model
- Model validation and criticism
- Using the chosen model(s) for the solution of the posed problem
These steps are examined below.

1. Statement of the problem under consideration:
The first important step in conducting any regression analysis is to specify the problem and the objectives to be addressed by the regression analysis. A wrong formulation or wrong understanding of the problem will lead to wrong statistical inferences. The choice of variables depends upon the objectives of the study and the understanding of the problem. For example, the height and weight of children are related. Now there can be two issues to be addressed:
(i) determination of height for a given weight, or
(ii) determination of weight for a given height.
In case (i), height is the response variable, whereas weight is the response variable in case (ii). The roles of the explanatory variables are also interchanged in the two cases.

2. Choice of relevant variables:
Once the problem is carefully formulated and the objectives have been decided, the next step is to choose the relevant variables. It has to be kept in mind that the correctness of the statistical inferences depends on the correct choice of variables. For example, in any agricultural experiment, the yield depends on explanatory variables like the quantity of fertilizer, rainfall, irrigation, temperature, etc. These variables are denoted by X1, X2, ..., Xk as a set of k explanatory variables.

3. Collection of data on relevant variables:
Once the objective of the study is clearly stated and the variables are chosen, the next question is how to collect data on these variables. The data are essentially the measurements of these variables. For example, suppose we want to collect data on age. For this, it is important to know how to record the data on age: either the date of birth can be recorded, which will provide the exact age on any specific date, or the age in terms of completed years as on a specific date can be recorded. Moreover, it is also important to decide whether the data have to be collected on variables as quantitative variables or as qualitative variables. For example, if the ages (in years) are 15, 17, 19, 21, 23, then these are quantitative values. If the ages are instead defined by a variable that takes the value 1 if the age is less than 18 years and 0 otherwise, then the earlier recorded data are converted to 1, 1, 0, 0, 0. Note that there is a loss of information in converting quantitative data into qualitative data. The methods and approaches for qualitative and quantitative data are also different. If the study variable is binary, then logistic and probit regressions, etc., are used. If all the explanatory variables are qualitative, then the analysis-of-variance technique is used. If some explanatory variables are qualitative and others are quantitative, then the analysis-of-covariance technique is used. The techniques of analysis of variance and analysis of covariance are special cases of regression analysis.

Generally, the data are collected on n subjects. Then y denotes the response or study variable and y1, y2, ..., yn are its n values. If there are k explanatory variables X1, X2, ..., Xk, then xij denotes the ith value of the jth variable, i = 1, 2, ..., n; j = 1, 2, ..., k. The observations can be presented in the following table:

Notation for the data used in regression analysis

Observation number | Response y | X1  | X2  | ... | Xk
1                  | y1         | x11 | x12 | ... | x1k
2                  | y2         | x21 | x22 | ... | x2k
...                | ...        | ... | ... | ... | ...
n                  | yn         | xn1 | xn2 | ... | xnk

4. Specification of model:
The experimenter or the person working in the subject usually helps in determining the form of the model. Only the form of the tentative model can be ascertained, and it will depend on some unknown parameters. For example, a general form will be

y = f(X1, X2, ..., Xk; β1, β2, ..., βk) + ε

where ε is the random error reflecting mainly the difference between the observed value of y and the value of y obtained through the model. The form of f(X1, X2, ..., Xk; β1, β2, ..., βk) can be linear as well as non-linear, depending on the form of the parameters β1, β2, ..., βk. A model is said to be linear if it is linear in the parameters. For example,

y = β1X1 + β2X1² + β3X2 + ε
y = β1 + β2 ln X2 + ε

are linear models, whereas

y = β1X1 + β2²X2 + β3X2 + ε
y = ln(β1X1) + β2X2 + ε

are non-linear models. Many times, non-linear models can be converted into linear models through suitable transformations. So the class of linear models is wider than it initially appears.
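As an example of such a transformation, consider the multiplicative model y = β1·X^β2·ε, which is non-linear in β2 but becomes linear in the parameters after taking logarithms: ln y = ln β1 + β2 ln X + ln ε. A minimal Python sketch (the parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 10, 100)
eps = rng.lognormal(0, 0.1, 100)
y = 2.0 * X**1.3 * eps                 # non-linear in beta2

# After taking logs the model is linear in (ln beta1, beta2).
A = np.column_stack([np.ones_like(X), np.log(X)])
(ln_b1, b2), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
print(np.exp(ln_b1), b2)               # close to 2.0 and 1.3
```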
If a model contains only one explanatory variable, it is called a simple regression model. When there is more than one independent variable, it is called a multiple regression model. When there is only one study variable, the regression is termed univariate regression; when there is more than one study variable, it is termed multivariate regression. Note that simple and multiple regressions are not the same as univariate and multivariate regressions: simple and multiple regression are determined by the number of explanatory variables, whereas univariate and multivariate regressions are determined by the number of study variables.

5. Choice of method for fitting the data:
After the model has been defined and the data have been collected, the next task is to estimate the parameters of the model based on the collected data. This is also referred to as parameter estimation or model fitting. The most commonly used method of estimation is the least-squares method. Under certain assumptions, the least-squares method produces estimators with desirable properties. Other estimation methods are the maximum likelihood method, the ridge method, the principal components method, etc.

6. Fitting of model:
The estimation of the unknown parameters using an appropriate method provides the values of the parameters. Substituting these values into the equation gives us a usable model. This is termed model fitting. The estimates of the parameters β1, β2, ..., βk in the model

y = f(X1, X2, ..., Xk, β1, β2, ..., βk) + ε

are denoted by β̂1, β̂2, ..., β̂k, which gives the fitted model as

y = f(X1, X2, ..., Xk, β̂1, β̂2, ..., β̂k).

When the value of y is obtained for given values of X1, X2, ..., Xk, it is denoted ŷ and called the fitted value.

The fitted equation is used for prediction, in which case ŷ is termed the predicted value. Note that the fitted value is the one where the values used for the explanatory variables correspond to one of the n observations in the data, whereas the predicted value is obtained for any set of values of the explanatory variables. It is generally not recommended to predict y-values for sets of values of the explanatory variables which lie outside the range of the data. When the values of the explanatory variables are future values of the explanatory variables, the predicted values are called forecasted values.
7. Model criticism and selection:
The validity of the statistical methods used for regression analysis depends on various assumptions, which essentially become assumptions about the model and the data. The quality of the statistical inferences heavily depends on whether these assumptions are satisfied or not. For these assumptions to be valid and satisfied, care is needed from the beginning of the experiment. One has to be careful in choosing the required assumptions and in deciding whether the assumptions are valid for the given experimental conditions. It is also important to identify the situations in which the assumptions may not be met.

The validation of the assumptions must be made before drawing any statistical conclusion. Any departure from the validity of the assumptions will be reflected in the statistical inferences. In fact, regression analysis is an iterative process in which the outputs are used to diagnose, validate, criticize and modify the inputs. The iterative process is illustrated in the following figure.

Inputs: theories, model, assumptions, data, statistical methods.
Outputs: estimation of parameters, confidence regions, tests of hypotheses, graphical displays.
The outputs are fed back for diagnosis, validation and criticism of the inputs.
8. Objectives of regression analysis:
The determination of the explicit form of the regression equation is the ultimate objective of regression analysis. The end result is a good and valid relationship between the study variable and the explanatory variables. The regression equation helps in understanding the interrelationships among the variables. Such a regression equation can be used for several purposes: for example, to determine the role of any explanatory variable in the joint relationship for policy formulation, or to forecast the values of the response variable for a given set of values of the explanatory variables.
Methodology of Econometrics / Hypothesis Testing

A hypothesis is a statement or assumption that is yet to be proved.

Simple Hypothesis: When a hypothesis specifies all the parameters of a probability distribution, it is known as a simple hypothesis.
Composite Hypothesis: When a hypothesis specifies only some of the parameters of a probability distribution, it is known as a composite hypothesis.
1. Statement of the Null Hypothesis:
Null Hypothesis, H0:
For applying a test of significance, we first set up a hypothesis: a definite statement about the population parameters. Such a hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis and is denoted by H0.
Example: H0: There is no significant difference between the two sample means.
Alternative Hypothesis, H1:
Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis, usually denoted by H1.
Example: There is a significant difference between the two sample means.


Symbolically,
H1: μ1 ≠ μ2 (two-sided or directionless alternative)
If the statement is that A gives significantly less yield than B, or that A gives significantly more yield than B, then symbolically:
H1: μ1 < μ2 (one-sided alternative, left-tailed)
H1: μ1 > μ2 (one-sided alternative, right-tailed)
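As an illustration, the following minimal Python sketch (using scipy; the yield figures are simulated, hypothetical values) tests H0: μ1 = μ2 against the two-sided alternative H1: μ1 ≠ μ2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical yields from treatments A and B.
yield_A = rng.normal(50, 5, 30)
yield_B = rng.normal(53, 5, 30)

# Two-sided two-sample t-test of H0: mu1 = mu2 vs H1: mu1 != mu2.
t_stat, p_value = stats.ttest_ind(yield_A, yield_B)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the sample means differ significantly.")
else:
    print("Fail to reject H0.")
```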
2. Specification of the mathematical model:
This is where the algebra enters. We use mathematical skills to produce an equation. Assume a theory predicting that more schooling increases the wage; in economic terms, we say that the return to schooling is positive. The equation is:
Y = β1 + β2X
3. Specification of the econometric model:
Here, we assume that the mathematical model is correct, but we need to account for the fact that it may not be exact. We add an error term, u, to the equation above; it is also called a random (stochastic) variable. The econometric equation is:
Y = β1 + β2X + u

4. Data collection:
Data: the information collected through censuses and surveys, in a routine manner, or from other sources is called raw data. There are two types of data:
1. Primary data
2. Secondary data
5. Estimation of the econometric model:
Here, we quantify β1 and β2, i.e., we obtain numerical estimates. This is done by the statistical technique called regression analysis.
Example:
Y = 12.50 + 0.6X + u
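To show how numbers such as 12.50 and 0.6 might be obtained in practice, here is a minimal sketch using the statsmodels library (the schooling/wage data are simulated for illustration, not real observations):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
schooling = rng.uniform(8, 16, 200)                   # years of schooling (X)
wage = 12.50 + 0.6*schooling + rng.normal(0, 1, 200)  # wage (Y) with noise

X = sm.add_constant(schooling)        # adds the intercept column for beta1
results = sm.OLS(wage, X).fit()
print(results.params)                 # approximately [12.50, 0.6]
```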
6. Testing of hypothesis:
Once the hypothesis is formulated, we have to make a decision on it. A statistical procedure by which we decide to accept or reject a statistical hypothesis is called testing of hypothesis.
7. Forecasting and prediction:
If the hypothesis test was positive, i.e., the theory was concluded to be correct, we forecast the values of the wage by plugging in values of education.
Example:
Y = 12.50 + 0.6X
If X = 10, then Y = 18.50: this is forecasting.
If Y = 20, then X = 12.5: this is prediction.

8. Use for policy recommendation:
Lastly, if the theory seems to make sense and the econometric model was not refuted on the basis of the hypothesis test, we can go on to use the theory for policy recommendation.

Example: using the model for agricultural policies.


UNIT II

STOCHASTIC TERM:
The stochastic term refers to situations or models containing a random element, hence unpredictable and without a stable pattern of order. Almost all natural events are stochastic phenomena.
Reasons to incorporate the stochastic term
(1) Omission or ignorance of variables in the model:
In reality, it is not only the nitrogen (N) application that determines the crop yield. The yield of the crop on a farm is determined by many other factors, such as the level of other nutrients, soil, moisture, weather, insects and pests, and agronomic practices. All these factors are ignored in the model, and the disturbance term Ui is used to account for these variables.
(2) Non-availability of data in statistical form:
In reality, we sometimes may not have quantitative or statistical information about these variables, so we add a disturbance term to the model.
Example: It is difficult to have a composite measure of the temperature and humidity that existed at each stage of crop growth, from sowing till harvesting.
(3) Joint influence:
Some factors, when taken individually, have a very small influence on the dependent variable, so their influence cannot be measured in a reliable way. But it is quite possible that the joint influence of all such variables, taken together, affects the dependent variable. So a disturbance term is used.

(4) Randomness in human behaviour:
Even if we succeed in including all the relevant variables in the model, there are bound to be some deviations from the normal behaviour pattern depicted by the model. This randomness in behaviour or nature is unexplainable, so the Ui term takes care of such randomness.
(5) Errors of measurement:
Such errors are inevitable during the collection and processing of the data on the variables. For all the above reasons, the Ui term assumes an extremely important role in the population regression.
(6) Imperfect specification of the mathematical form of the model:
Many times, prior knowledge does not indicate the correct form of the mathematical model. In such cases, the researcher may assume a particular functional form, and the error arising from this misspecification is absorbed by the stochastic term.
Assumptions about the disturbance term
(1) The regression model is linear in parameters. An example of a model equation that is linear in parameters is
Y = a + β1X1 + β2X2²
Though X2 is raised to the power 2, the equation is still linear in the beta parameters, so the assumption is satisfied in this case.
(2) The error term is normally distributed. The random variable Ui is assumed to follow a normal distribution, which means that Ui is normally distributed with zero mean and constant variance. If the maximum likelihood method (not OLS) is used to compute the estimates, this also implies that Y and the Xs are normally distributed.
(3) The mean of the residuals is zero (zero mean of Ui). This assumption tells us that each Ui corresponding to Xi may take various values whose mean is zero:
E(Ui) = 0

(4) The error term has a constant variance (homoscedasticity):
Var(Ui) = E(Ui²) = σ²
This assumption states that the variance of each Ui is some positive constant value equal to σ². The variance of the errors should be consistent for all observations; in other words, the variance does not change for each observation or for a range of observations. This preferred condition is known as homoscedasticity (same scatter). If the variance changes, we refer to that as heteroscedasticity (different scatter).

(5) No autocorrelation of residuals:
Cov(Ui, Uj) = 0 for i ≠ j
This assumption states that any two values of the error term, Ui and Uj, are not linearly correlated. This is especially applicable for time series data. Autocorrelation is the correlation of a time series with lags of itself. When the residuals are autocorrelated, it means that the current value depends on the previous (historic) values and that there is a definite unexplained pattern in the Y variable that shows up in the disturbances.
(6) All independent variables are uncorrelated with the error term (no correlation between Xi and Ui):
Cov(Xi, Ui) = 0, i = 1, 2, ..., n
If an independent variable is correlated with the error term, we can use the independent variable to predict the error term, which violates the notion that the error term represents unpredictable random error.
(7) No independent variable is a perfect linear function of the other explanatory variables: there is no perfect linear relationship between the explanatory variables (no perfect multicollinearity).
(8) The number of observations must be greater than the number of Xs.
(9) The variability in the X values is positive: the X values in a given sample must not all be the same.
(10) The regression model is correctly specified: if Y and an X variable have an inverse relationship, the model equation should be specified appropriately: Y = β1 + β2(1/X)
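Some of these assumptions can be checked informally from the residuals of a fitted model. Below is a minimal Python sketch (with simulated data; an illustrative check, not a complete diagnostic suite) that examines the zero-mean assumption and uses the Durbin-Watson statistic from statsmodels, where values near 2 suggest no first-order autocorrelation:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, 100)
u = rng.normal(0, 2, 100)                   # well-behaved disturbances
y = 1.0 + 2.0*X + u

# Fit by least squares and compute the residuals.
A = np.column_stack([np.ones_like(X), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta_hat

print(residuals.mean())                     # near 0 (zero-mean assumption)
print(durbin_watson(residuals))             # near 2 (no autocorrelation)
```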

Properties of the OLS method

1. Unbiasedness: the bias of an estimator is defined as the difference between the expected value of the estimator and the true value of the parameter:
Bias(b) = E(b) − β
An estimator is unbiased when its bias is zero, i.e., E(b) = β, so that the sampling distribution of the estimator is centred on the true value of the parameter.

2. Least or minimum variance:
An estimator is best when it has the smallest variance compared with any other estimator:
Var(b̂) = E[b̂ − E(b̂)]² ≤ Var(b̃) for any other estimator b̃
Low variance is most desirable when combined with small bias; otherwise minimum variance alone, without unbiasedness, has little meaning.

3. Efficient estimator: an estimator is efficient when it has both of the above properties: unbiasedness and low variance compared with other unbiased estimators.

4. Linear estimator: an estimator is linear if it is a linear function of the sample observations, i.e., if it is determined by a linear combination of the sample data:
b̂ = a1y1 + a2y2 + ... + anyn

5. BLUE (Best Linear Unbiased Estimator):
An estimator is BLUE when it is linear, unbiased, and has the smallest variance compared with other linear unbiased estimators of the true parameter.

6. Minimum mean square error:
The minimum mean square error criterion is a combination of the unbiasedness and minimum-variance properties. The mean square error is the expected value of the squared difference of the estimator around the true population parameter:
MSE(b̂) = Var(b̂) + Bias²(b̂)

7. Sufficient estimator:
A sufficient estimator is an estimator which utilizes all the information a sample contains about the true parameter. It must use all the observations of the sample, in such a way that no other estimator can add any further information about the population parameter being estimated.
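Unbiasedness can be illustrated by a small Monte Carlo experiment: over many repeated samples, the average of the OLS slope estimates should be close to the true β. A minimal sketch (the true parameter and sample sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
beta_true = 2.0
estimates = []

for _ in range(2000):                        # repeated samples
    X = rng.uniform(0, 10, 30)
    y = beta_true*X + rng.normal(0, 1, 30)
    b_hat = (X @ y) / (X @ X)                # OLS slope (no-intercept model)
    estimates.append(b_hat)

# E(b) is close to beta: the estimator is (approximately) unbiased,
# and the spread of the estimates is its sampling variance.
print(np.mean(estimates))                    # near 2.0
print(np.var(estimates))
```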

Simple Linear Regression Analysis

The simple linear regression model

We consider the modelling between the dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model. When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.

The linear model
Consider the simple linear regression model

y = β0 + β1X + ε

where y is termed the dependent or study variable and X is termed the independent or explanatory variable. The terms β0 and β1 are the parameters of the model. The parameter β0 is termed the intercept term, and the parameter β1 is termed the slope parameter. These parameters are usually called regression coefficients. The unobservable error component ε accounts for the failure of the data to lie on a straight line and represents the difference between the true and observed realizations of y. There can be several reasons for such a difference, e.g., the effect of all the variables deleted from the model, variables that may be qualitative, inherent randomness in the observations, etc. We assume that ε is an independent and identically distributed random variable with mean zero and constant variance σ². Later, we will additionally assume that ε is normally distributed.

The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with
E(y) = β0 + β1X
and
Var(y) = σ².

Ordinary Least Squares Estimation

Assume that a set of n paired observations (xi, yi), i = 1, 2, ..., n, is available which satisfies the linear regression model y = β0 + β1X + ε. So we can write the model for each observation as yi = β0 + β1xi + εi (i = 1, 2, ..., n).

The direct regression approach minimizes the sum of squares

S(β0, β1) = Σ εi² = Σ (yi − β0 − β1xi)²

with respect to β0 and β1. Differentiating S partially with respect to β0 and β1 and setting the derivatives to zero yields two normal equations. The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of β0 and β1. This gives the ordinary least squares estimates b0 of β0 and b1 of β1 as

b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
b0 = ȳ − b1x̄

where x̄ and ȳ are the sample means of x and y.
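The closed-form estimates above translate directly into code. A minimal Python sketch (with simulated data; the true values β0 = 3.0 and β1 = 1.5 are hypothetical) computes b0 and b1 exactly as in the formulas:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 40)
y = 3.0 + 1.5*x + rng.normal(0, 1, 40)   # hypothetical beta0=3.0, beta1=1.5

# OLS estimates from the closed-form solutions of the normal equations.
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar)*(y - ybar)) / np.sum((x - xbar)**2)
b0 = ybar - b1*xbar
print(b0, b1)                            # close to 3.0 and 1.5
```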
Multiple Linear Regression Model
Linear regression is a linear approach to modelling the relationship between a scalar response (or dependent
variable) and one or more explanatory variables (or independent variables). The case of one explanatory
variable is called simple linear regression. For more than one explanatory variable, the process is called
multiple linear regression. This term is distinct from multivariate linear regression, where multiple
correlated dependent variables are predicted, rather than a single scalar variable.
Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that
uses several explanatory variables to predict the outcome of a response variable. The goal of multiple linear
regression (MLR) is to model the linear relationship between the explanatory (independent) variables and
response (dependent) variable. In essence, multiple regression is the extension of ordinary least-squares
(OLS) regression that involves more than one explanatory variable.

The multiple regression model is based on the following assumptions:

- There is a linear relationship between the dependent variable and the independent variables.
- The independent variables are not too highly correlated with each other.
- The yi observations are selected independently and randomly from the population.
- Residuals should be normally distributed with a mean of 0 and variance σ².

The coefficient of determination (R²) is a statistical metric used to measure how much of the variation in the outcome can be explained by the variation in the independent variables. R² always increases as more predictors are added to the MLR model, even though the predictors may not be related to the outcome variable.

R² by itself thus can't be used to identify which predictors should be included in a model and which should be excluded. R² can only be between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of the independent variables and 1 indicates that the outcome can be predicted without error from the independent variables.

When interpreting the results of a multiple regression, beta coefficients are valid while holding all other
variables constant ("all else equal"). The output from a multiple regression can be displayed horizontally as
an equation, or vertically in table form.
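A minimal multiple-regression sketch (using statsmodels; all coefficients and data are simulated for illustration) showing the estimated coefficients and the coefficient of determination R²:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
X1 = rng.normal(0, 1, n)
X2 = rng.normal(0, 1, n)
y = 1.0 + 2.0*X1 - 0.5*X2 + rng.normal(0, 1, n)   # hypothetical true model

# Two explanatory variables plus an intercept column.
X = sm.add_constant(np.column_stack([X1, X2]))
results = sm.OLS(y, X).fit()
print(results.params)      # approximately [1.0, 2.0, -0.5]
print(results.rsquared)    # share of variation in y explained by X1, X2
```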
