Unitb - II - Linear Probability, Logit and Probit


YANGON INSTITUTE OF ECONOMICS

DEPARTMENT OF STATISTICS

LINEAR PROBABILITY, LOGIT AND PROBIT


MODELS IN QUALITATIVE DATA ANALYSIS

THIDA THAN
M. Econ (Statistics)
(Roll No. 1)

MARCH 2010
CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
ABBREVIATIONS
Chapter I INTRODUCTION
Chapter II MODEL SPECIFICATION AND ESTIMATION
2.1 Linear Probability Model (LPM)
2.1.1 Functional Form
2.1.2 Examination of the Assumption of ui
2.1.3 Estimation
2.2 Logit Model
2.2.1 Functional Form
2.2.2 Features
2.2.3 Estimation
2.3 Probit Model
2.3.1 Functional Form
2.3.2 Estimation
2.4 Comparison of Models
Chapter III DIAGNOSTIC STATISTICS FOR QUALITATIVE RESPONSE MODELS
3.1 Z Statistic
3.2 Likelihood Ratio (LR) Statistic
3.3 R2 Statistic
3.4 Predictive Quality
3.5 Analysis of Residuals
3.5.1 Standardized Residuals and Consequences of Heteroskedasticity
3.5.2 Likelihood Ratio Test for Heteroskedasticity
3.5.3 Lagrange Multiplier Test for Heteroskedasticity
Chapter IV APPLICATION OF LINEAR PROBABILITY, LOGIT AND PROBIT MODELS
4.1 Introduction
4.2 Models for Child's Weight Colour
4.3 Results
Chapter V CONCLUSION
REFERENCES
CHAPTER I
INTRODUCTION
There are several methods for measuring the relationship among economic
variables. The simplest methods are correlation analysis and regression analysis.
Regression analysis was first developed by Sir Francis Galton, a well-known
British anthropologist and meteorologist, in the latter part of the 19th century. It is a
statistical methodology that utilizes the relation between two or more variables so that
one variable can be predicted from the other, or others. This methodology is widely
used in business, the social and behavioral sciences, the biological sciences, and many
other disciplines.
In many regression models the regressand (the dependent variable, or the
response variable), say Y, is quantitative, whereas the explanatory variables are either
quantitative, qualitative (dummy), or a mixture thereof. In much research work, however,
researchers face situations where the dependent variable of interest is qualitative in nature.
The regressand Y may have two, three, or more possible qualitative outcomes. The models
in which the regressand Y is a qualitative variable are called qualitative response models.
These models are valuable in the analysis of survey data. The simplest possible qualitative
response regression model is the binary model, in which the regressand has only two
possible qualitative outcomes and can therefore be represented by a binary indicator
variable taking on the values 0 and 1. The regressand is then said to be a binary or
dichotomous variable, and the models developed for such situations are called binary
response models.
Both theoretical and empirical considerations suggest that when the response
variable is binary, the shape of the response function will frequently be curvilinear.
The response function is shaped either as a tilted S or as a reverse tilted S, and it is
approximately linear except at the ends. Such response functions are often referred to
as sigmoidal.
In a model where Y is quantitative, the objective is to estimate its expected, or
mean, value given the values of the regressors, that is, E(Yi | X1i, X2i, X3i, …, Xki),
where the X's are regressors, which may be quantitative, qualitative, or both. In models
where Y is qualitative, the objective is to find the probability of something happening.
Hence, qualitative response regression models are often known as probability
models. Qualitative response models have been used extensively in biometric
applications for a much longer time than they have been used in economic applications.
Among the qualitative response models, the linear probability, logit and probit
(also known as normit) models are studied in this paper. The objectives of this paper
are to study:
(1) how to develop the qualitative response models;
(2) how to estimate the qualitative response models; and
(3) how to evaluate the qualitative response models.
Firstly, the nature of qualitative response models is introduced in Chapter I.
The specification and estimation procedures of the qualitative response models are
discussed in Chapter II. Then, in Chapter III, diagnostic statistics for qualitative
response models are discussed, and applications of the models are studied in
Chapter IV. Finally, the important characteristics of the models and the findings are
summarized in Chapter V.

CHAPTER II
MODEL SPECIFICATION AND ESTIMATION
In this Chapter some of the qualitative response models are considered for a
binary response variable. Among the binary response models, linear probability, logit,
and probit (normit) models are discussed in the following sub-sections.

2.1 Linear Probability Model (LPM)


2.1.1 Functional Form
The functional form of a linear probability model can be expressed as
Yi = β1 + β2 Xi + ui (2.1.1)
where Yi = 1 if the event occurs and
= 0 if the event does not occur.
β1 and β2 are regression coefficients, ui is a random error term, and Xi is the
predictor variable.
The model can be extended to more than one predictor variable. That is,

Yi = β1 + β2 Xi2 + β3 Xi3 + … + βk Xik + ui (2.1.2)

or, in matrix notation,

Y = Xβ + u (2.1.3)

Assume that the model contains a constant term, that is, Xi1 = 1 for all
individuals. Each regression coefficient is interpreted in terms of the probability of
being in the category of interest of Y. Hence, β2 represents the change in the probability
for each unit increase in Xi2, net of the other covariates, and so on.

2.1.2 Examination of the Assumption of ui


Assuming E(ui) = 0, the conditional expectation of Yi given Xi is obtained as:

E(Yi | Xi) = β1 + β2 Xi2 + β3 Xi3 + … + βk Xik = ∑ βk Xik = xi'β (2.1.4)

If πi is the probability that Yi = 1 (that is, the event occurs), and (1 − πi) is the
probability that Yi = 0 (that is, the event does not occur), then the variable Yi follows the
Bernoulli probability distribution. The expectation of Yi is obtained as

E(Yi) = 1·πi + 0·(1 − πi) = πi = Pr(Yi = 1) (2.1.5)

Comparing Equation (2.1.4) with Equation (2.1.5), the conditional expectation
of the model (2.1.2) can be interpreted as the conditional probability of Yi. That is,

E(Yi | Xi) = β1 + β2 Xi2 + β3 Xi3 + … + βk Xik = πi = Pr(Yi = 1)

Since the probability πi must lie between 0 and 1, the model is subject to the
restriction 0 ≤ E(Yi | Xi) ≤ 1.
Then the disturbances ui also take only two values; that is, they follow the
Bernoulli distribution:

Yi      ui            Probability
1       1 − xi'β      πi
0       −xi'β         1 − πi
Total                 1

Obviously, ui cannot be assumed to be normally distributed; it follows the
Bernoulli distribution. The OLS point estimators nevertheless remain unbiased. Besides, as the
sample size increases indefinitely, statistical theory shows that the OLS estimators
tend to be normally distributed. As a result, in large samples the statistical
inference of the LPM will follow the usual OLS procedure under the normality
assumption.
Even if E(ui) = 0 and Cov(ui, uj) = 0 for i ≠ j (i.e., no serial correlation), it can
no longer be maintained that in the LPM the disturbances are homoscedastic.
Statistical theory shows that for a Bernoulli distribution the theoretical mean
and variance are, respectively, πi and πi(1 − πi), where πi is the probability of success
(i.e., something happening), so that the variance is a function of the mean. The variance
of the error term is therefore

Var(ui) = πi(1 − πi).

Since πi = E(Yi | Xi) = ∑ βk Xik, the variance of ui ultimately depends on the values
of X; that is, the error term in the LPM is heteroscedastic, not homoscedastic.
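The dependence of the error variance on X can be illustrated numerically. The following is a minimal sketch, assuming hypothetical coefficient values β1 = 0.1 and β2 = 0.08 (chosen only so that the probabilities stay inside the 0-1 range; they are not estimates from this paper).

```python
# Hypothetical LPM coefficients (illustrative only, not estimates from the paper).
beta1, beta2 = 0.1, 0.08

def lpm_probability(x):
    """pi_i = E(Y_i | X_i) = beta1 + beta2 * X_i under the LPM."""
    return beta1 + beta2 * x

def error_variance(x):
    """Var(u_i) = pi_i * (1 - pi_i), the variance of a Bernoulli variable."""
    p = lpm_probability(x)
    return p * (1.0 - p)

# The variance changes with X, so the disturbances are heteroscedastic.
for x in (1, 5, 10):
    print(x, round(lpm_probability(x), 3), round(error_variance(x), 4))
```

The variance is largest where πi is close to 0.5 and shrinks as πi approaches 0 or 1.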

2.1.3 Estimation
For a model with heteroscedastic error disturbances it can be assumed that each
error term ui is normally distributed with variance σi², where the variance
Var(ui) = E(ui²) = σi² is not constant over observations. When heteroscedasticity is present,
ordinary least squares estimation places more weight on the observations with large
error variances than on those with small error variances. In the presence of
heteroscedasticity, the OLS estimators, although unbiased, are not efficient; that is,
they do not have minimum variance. If heteroscedasticity is present, the
appropriate estimation technique is the weighted least-squares (WLS) estimation procedure,
which can be derived from the maximum likelihood function.
Consider the simple linear probability model

Yi = β1 + β2 Xi + ui, where Var(ui) = σi². (2.1.1)

By minimizing the weighted sum of squared errors, where the original variables are
written in deviation form (xi and yi), the appropriate estimator of the slope is obtained as

β̂2 = ∑(xi yi / σi²) / ∑(xi² / σi²)
    = ∑(xi/σi)(yi/σi) / ∑(xi/σi)²
    = ∑ xi* yi* / ∑(xi*)², where xi* = xi/σi and yi* = yi/σi.

To use weighted least squares, the variables in the original regression model of
Equation (2.1.1) are redefined as

Yi* = Yi/σi, Xi* = Xi/σi, ui* = ui/σi

where Var(ui*) = Var(ui/σi) = (1/σi²) Var(ui) = σi²/σi² = 1.

Now, the new error term is homoscedastic.
Since there are many situations in which the relative magnitude of the error
variances is not known, it is important to consider special cases in which sufficient
sample information is available to make reasonable guesses of the true error variances.
One possibility is the existence of a relationship between the error
variances and the values of an explanatory variable in the regression model. Specifically,
assume that

Var(ui) = C Xi²

where C is a nonzero constant and Xi is an observation of the independent variable in
the linear probability model.
Even though the variances themselves are unknown, the variables in the above equation
can then be transformed as

Yi* = Yi/Xi, Xi* = 1/Xi, ui* = ui/Xi

where Var(ui*) = Var(ui/Xi) = (1/Xi²) Var(ui) = (1/Xi²) C Xi² = C.

Now, the error term ui* is homoscedastic.
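The weighting idea can be sketched in code. The example below is a minimal illustration, not the procedure used in Chapter IV: it assumes simulated data from a hypothetical LPM, estimates σi² = πi(1 − πi) from a first-stage OLS fit (a common two-step approach, assumed here), and clips the fitted probabilities to [0.01, 0.99] so that all weights are finite. The helper name `wls_line` is invented for this sketch.

```python
import random

def wls_line(x, y, w=None):
    """Fit y = b1 + b2*x by weighted least squares; w=None means equal weights (OLS)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2

# Simulated binary data from a hypothetical LPM with pi_i = 0.1 + 0.08 * X_i.
random.seed(0)
x = [random.uniform(0.0, 10.0) for _ in range(2000)]
y = [1 if random.random() < 0.1 + 0.08 * xi else 0 for xi in x]

# Step 1: OLS, which is unbiased but not efficient under heteroscedasticity.
b1_ols, b2_ols = wls_line(x, y)

# Step 2: estimate sigma_i^2 = pi_i (1 - pi_i) from the OLS fit.
# Clipping to [0.01, 0.99] is an assumption of this sketch.
p_hat = [min(max(b1_ols + b2_ols * xi, 0.01), 0.99) for xi in x]
weights = [1.0 / (p * (1.0 - p)) for p in p_hat]

# Step 3: WLS with weights 1/sigma_i^2, which is equivalent to OLS on the
# variables (including the constant) divided by sigma_i.
b1_wls, b2_wls = wls_line(x, y, weights)
print("OLS:", b1_ols, b2_ols)
print("WLS:", b1_wls, b2_wls)
```

Both estimators should land near the true slope of 0.08 in a sample of this size; the gain from WLS is in efficiency and in valid standard errors, not in the point estimate.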
The LPM is plagued by several problems, such as
(1) non-normality of ui,
(2) heteroscedasticity of ui,
(3) the possibility of Ŷi lying outside the 0-1 range, and
(4) the generally lower R2 values.
But these problems are surmountable.
As mentioned above, WLS can be used to resolve the heteroscedasticity
problem, and increasing the sample size minimizes the non-normality problem. By
resorting to restricted least squares or mathematical programming techniques, the
estimated probabilities can be made to lie in the 0-1 interval.
But even then the fundamental problem with the LPM is that it is not logically
a very attractive model, because it assumes that πi = E(Y = 1 | X) increases linearly with
X, that is, the marginal or incremental effect of X remains constant throughout.
Therefore, what is needed is a (probability) model that has these two features:
(1) as Xi increases, πi = E(Y = 1 | Xi) increases but never steps outside the 0-1
interval, and
(2) the relationship between πi and Xi is nonlinear, that is, "one which
approaches zero at slower and slower rates as Xi gets small and approaches one at
slower and slower rates as Xi gets very large."

2.2 Logit Model


Both theoretical and empirical considerations suggest that when the response
variable is binary, the shape of the response function will frequently be curvilinear.
The response functions are shaped either as a tilted S or as a reverse tilted S, and they
are approximately linear except at the ends. These response functions are often
referred to as sigmoidal. They have asymptotes at 0 and 1 and thus automatically meet
the constraints on E(Y).
The commonly used nonlinear probability models are the logit and probit models.
The two distributions most often employed are the standard normal distribution and
the standard logistic distribution. The model based on the standard normal distribution
is called probit, and the model based on the standard logistic distribution is called logit.

2.2.1 Functional Form


The simple logit model is expressed as

πi = exp(∑ βk Xik) / [1 + exp(∑ βk Xik)]

or, equivalently,

πi = exp(xi'β) / [1 + exp(xi'β)] (2.2.1)

Letting Zi = ∑ βk Xik,

πi = e^(Zi) / (1 + e^(Zi))
   = 1 / (1 + e^(−Zi)) (2.2.2)

2.2.2 Features
The features of the logit model are as follows:
(1) Logistic regression effects can be expressed in terms of percent changes in
the odds. Odds ratios are useful in estimating changes in the probability of
event occurrence with changes in predictors once a baseline probability has
been calculated.

πi = e^(Zi) / (1 + e^(Zi))

1 − πi = 1 − e^(Zi) / (1 + e^(Zi))
       = (1 + e^(Zi) − e^(Zi)) / (1 + e^(Zi))
       = 1 / (1 + e^(Zi)) (2.2.3)

The ratio of Equation (2.2.2) to Equation (2.2.3) is

πi / (1 − πi) = [e^(Zi) / (1 + e^(Zi))] / [1 / (1 + e^(Zi))]
              = e^(Zi) (2.2.4)

The ratio πi / (1 − πi) can be called the odds ratio.
Taking the natural log of Equation (2.2.4) gives

Li = ln[πi / (1 − πi)]
   = Zi
   = ∑ βk Xik (2.2.5)

The logit L goes from −∞ to +∞ as π goes from 0 to 1. That is, although the
probabilities (of necessity) lie between 0 and 1, the logits are not so bounded.
(2) Although L is linear in X, the probabilities themselves are not. This property is
in contrast with the LPM, where the probabilities increase linearly with X.
(3) If L, the logit, is positive, it means that when the value of the regressor(s)
increases, the odds that the regressand equals 1 (meaning some event of interest
happens) increase. If L is negative, the odds that the regressand equals 1
decrease as the value of X increases. To put it differently, the logit becomes
negative and increasingly large in magnitude as the odds ratio decreases from 1
to 0, and becomes increasingly large and positive as the odds ratio increases
from 1 to infinity.
(4) More formally, the interpretation of the logit model given in Equation (2.2.5) is
as follows: β2, the slope, measures the change in L for a unit change in X. The
intercept β1 is the value of the log odds in favor of the event occurring when X
is zero.
(5) If we actually want to estimate not the odds in favor of the event but the
probability of the event itself, this can be done directly from Equation (2.2.2) once
the estimates of β1 and β2 are available.
(6) Whereas the LPM assumes that πi is linearly related to Xi, the logit model
assumes that the log of the odds ratio is linearly related to Xi.
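The relations among the probability, the odds and the logit in Equations (2.2.2), (2.2.4) and (2.2.5) can be checked numerically. The sketch below assumes hypothetical coefficients β1 = −2 and β2 = 0.5 for a single regressor; the function names are chosen for this illustration only.

```python
import math

def logistic(z):
    """Equation (2.2.2): pi = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Equation (2.2.4): pi / (1 - pi), which equals e^Z."""
    return p / (1.0 - p)

def logit(p):
    """Equation (2.2.5): L = ln[pi / (1 - pi)], which equals Z."""
    return math.log(odds(p))

# Hypothetical coefficients (illustrative only).
beta1, beta2 = -2.0, 0.5
for x in (0, 4, 8):
    z = beta1 + beta2 * x
    p = logistic(z)
    # L is linear in X (it equals z), while p is bounded in (0, 1) and nonlinear in X.
    print(x, round(p, 4), round(odds(p), 4), round(logit(p), 4))
```

At X = 0 the logit equals the intercept β1, and each unit increase in X changes the logit by β2, in line with feature (4).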

2.2.3 Estimation
A logistic response function is either monotonically increasing or
monotonically decreasing, depending on the sign of the slope coefficients. It can be
linearized easily. Logistic response functions, like the other response functions which
have been considered, are used for describing the nature of the relationship between
the mean response and one (or more) predictor variable(s). They are also used for
making predictions. The weighted least squares and maximum likelihood estimation
procedures can be used to estimate the parameters of the logistic response function.
For estimation purposes, consider Equation (2.2.5), that is,

Li = ln[πi / (1 − πi)]
   = ∑ βk Xik (2.2.6)

The estimation of the logit Li in the above equation depends on which of two types
of data is available:
(1) data at the individual, or micro, level, and
(2) grouped or replicated data.

Individual data
Let πi = 1 if the event occurs and
πi = 0 if the event does not occur.

If these values are put directly into the logit Li, the result is

Li = ln(1/0) if the event occurs
Li = ln(0/1) if the event does not occur.

Obviously, these expressions are meaningless. Therefore, if the data are
available at the micro, or individual, level, the model cannot be estimated by the
standard OLS routine. In this situation, the maximum likelihood method can be used to
estimate the parameters. This method is well suited to deal with the problems
associated with the responses Yi being binary. Instead of the normal distribution,
the Bernoulli distribution is used for the binary random variable Y to develop the
joint probability function of the sample observations.
Since each Yi observation is an ordinary Bernoulli random variable, where

P(Yi = 1) = πi
P(Yi = 0) = 1 − πi

its probability distribution can be represented as follows:

fi(Yi) = πi^(Yi) (1 − πi)^(1 − Yi); Yi = 0, 1; i = 1, …, n (2.2.7)
Here, fi(1) = πi and fi(0) = 1 − πi.
Hence, fi(Yi) simply represents the probability that Yi = 1 or 0.
Since the Yi observations are independent, their joint probability function is

g(Y1, …, Yn) = ∏ fi(Yi)
             = ∏ πi^(Yi) (1 − πi)^(1 − Yi) (2.2.8)

where the products run over i = 1, …, n.
Again, it is easier to find the maximum likelihood estimates by working
with the logarithm of the joint probability function:

loge g(Y1, …, Yn) = loge ∏ πi^(Yi) (1 − πi)^(1 − Yi)
                  = loge ∏ [πi / (1 − πi)]^(Yi) (1 − πi)
                  = ∑ Yi loge[πi / (1 − πi)] + ∑ loge(1 − πi) (2.2.9)

Since E(Yi) = πi for a binary variable, it follows from Equation (2.2.1) that
1 − πi = [1 + exp(∑ βk Xik)]^(−1), and according to Equation (2.2.5),
loge[πi / (1 − πi)] = ∑ βk Xik. Equation (2.2.9) can therefore be expressed as follows:

loge L(β) = ∑ Yi (∑ βk Xik) − ∑ loge[1 + exp(∑ βk Xik)] (2.2.10)

where the outer sums run over i = 1, …, n and L(β) replaces g(Y1, …, Yn) to show
explicitly that the function can be viewed as the likelihood function of the parameters
to be estimated, given the sample observations.
Equation (2.2.10) can be written equivalently as follows:

log L(β) = ∑ Yi log(πi) + ∑ (1 − Yi) log(1 − πi)
         = ∑ Yi log F(xi'β) + ∑ (1 − Yi) log[1 − F(xi'β)]
         = ∑{i: Yi = 0} log[1 − F(xi'β)] + ∑{i: Yi = 1} log F(xi'β) (2.2.11)

where F(xi'β) = πi is the logistic function of Equation (2.2.1).

The maximum likelihood estimates of β in the logistic regression model are
those values of β that maximize the log-likelihood function in Equation (2.2.10). No
closed-form solution exists for the values of β in Equation (2.2.10) that maximize the
log-likelihood function. There are many widely used numerical search procedures; one
of these employs iteratively reweighted least squares.
Once the maximum likelihood estimates are found, these values are
substituted into the response function in Equation (2.2.1) to obtain the fitted response
function.
The fitted logit model is as follows:

π̂i = exp(∑ bk Xik) / [1 + exp(∑ bk Xik)] (2.2.12)

where bk denotes the maximum likelihood estimate of βk. If the logit transformation of
Equation (2.2.5) is utilized, the fitted response function in Equation (2.2.12) can be
expressed as follows:

L̂i = ∑ bk Xik (2.2.13)

where

L̂i = ln[π̂i / (1 − π̂i)] (2.2.14)

Once the fitted logit model has been obtained, the usual next steps are to
examine the appropriateness of the fitted response function and, if the fit is good, to
make a variety of inferences and predictions.
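The numerical search for the maximum of Equation (2.2.10) can be sketched as follows. This is a minimal illustration for the two-parameter case Zi = β1 + β2 Xi, using Newton-Raphson steps (which coincide with iteratively reweighted least squares for the logit model). The data are simulated from hypothetical coefficients, and the function names are assumptions of this sketch rather than part of any statistical package.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b1, b2, x, y):
    """Equation (2.2.10) for the two-parameter case Z_i = b1 + b2 * X_i."""
    return sum(yi * (b1 + b2 * xi) - math.log(1.0 + math.exp(b1 + b2 * xi))
               for xi, yi in zip(x, y))

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood by Newton-Raphson, starting from (0, 0)."""
    b1 = b2 = 0.0
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b1 + b2 * xi)
            w = p * (1.0 - p)          # weight pi_i (1 - pi_i)
            g1 += yi - p               # score with respect to b1
            g2 += (yi - p) * xi        # score with respect to b2
            h11 += w                   # elements of X'WX
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det   # step = (X'WX)^(-1) * score
        d2 = (h11 * g2 - h12 * g1) / det
        b1 += d1
        b2 += d2
        if abs(d1) + abs(d2) < tol:
            break
    return b1, b2

# Simulated individual-level data from hypothetical coefficients (-0.5, 1.2).
random.seed(1)
x = [random.uniform(-2.0, 2.0) for _ in range(1000)]
y = [1 if random.random() < logistic(-0.5 + 1.2 * xi) else 0 for xi in x]

b1_hat, b2_hat = fit_logit(x, y)
print(b1_hat, b2_hat, log_likelihood(b1_hat, b2_hat, x, y))
```

The fitted probabilities of Equation (2.2.12) are then `logistic(b1_hat + b2_hat * xi)`. The sketch has no safeguard against complete separation, in which case the iterations would not converge.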

Grouped or replicated data
Let Ni = total number of observations at Xi, and
ni = number of those observations falling in the category of interest (ni ≤ Ni).
Then πi can be estimated as

π̂i = ni / Ni

that is, the relative frequency can be used as an estimate of the true πi
corresponding to each Xi. If Ni is fairly large, π̂i will be a reasonably good estimate of πi.
Using the estimated π̂i, the estimated logit can be obtained as

L̂i = ln[π̂i / (1 − π̂i)] = β̂1 + β̂2 Xi2 + β̂3 Xi3 + … + β̂k Xik

which will be a fairly good estimate of the true logit Li if the number of observations Ni at
each Xi is reasonably large.
If Ni is fairly large and if each observation at a given Xi is distributed
independently as a binomial variable, then

ui ~ N[0, 1 / (Ni πi (1 − πi))]

that is, ui follows the normal distribution with zero mean and variance equal to
1/[Ni πi (1 − πi)]. Therefore, as in the case of the LPM, the disturbance term in the logit
model is heteroscedastic. Thus, instead of OLS, weighted least squares (WLS) should be used.
For empirical purposes, replace the unknown πi by π̂i and use

σ̂i² = 1 / [Ni π̂i (1 − π̂i)]

as the estimator of σi².
To resolve the problem of heteroscedasticity, Equation (2.2.6) can be
transformed as

√Wi Li = β1 √Wi + β2 √Wi Xi2 + β3 √Wi Xi3 + … + βk √Wi Xik + √Wi ui (2.2.15)

Li* = β1 √Wi + β2 Xi2* + β3 Xi3* + … + βk Xik* + vi (2.2.16)

where the weights are Wi = Ni π̂i (1 − π̂i);
Li* = transformed or weighted Li; Xik* = transformed or weighted Xik; and
vi = transformed error term.
Now, the transformed error term vi is homoscedastic. Estimate Equation (2.2.16) by
OLS; recall that WLS is OLS on the transformed data.
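The grouped-data procedure can be sketched for a single regressor as follows. The counts below are hypothetical and serve only to illustrate the steps; the function name `grouped_logit_wls` is an assumption of this sketch. Minimizing the weighted sum of squares with weights Wi is equivalent to running OLS on the transformed Equation (2.2.16).

```python
import math

def grouped_logit_wls(x, n_success, n_total):
    """WLS estimates (b1, b2) of L_i = b1 + b2 * X_i from grouped data.

    Requires 0 < n_i < N_i in every group so that each estimated logit is finite.
    """
    p_hat = [n / N for n, N in zip(n_success, n_total)]        # relative frequencies
    logits = [math.log(p / (1.0 - p)) for p in p_hat]          # estimated logits
    w = [N * p * (1.0 - p) for N, p in zip(n_total, p_hat)]    # W_i = N_i p_i (1 - p_i)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    lbar = sum(wi * li for wi, li in zip(w, logits)) / sw
    sxl = sum(wi * (xi - xbar) * (li - lbar) for wi, xi, li in zip(w, x, logits))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b2 = sxl / sxx
    return lbar - b2 * xbar, b2

# Hypothetical grouped data: at each X_i, n_i events out of N_i observations.
x = [1, 2, 3, 4, 5]
n_total = [50, 60, 80, 70, 40]
n_success = [8, 15, 36, 44, 32]

b1_hat, b2_hat = grouped_logit_wls(x, n_success, n_total)
print(b1_hat, b2_hat)
```

Groups with larger Ni and with π̂i nearer 0.5 receive more weight, because their estimated logits have smaller variance.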

2.3 Probit Model


The model that emerges from the normal cumulative distribution function
(CDF) is popularly known as the probit model, although sometimes it is also known as
the normit model.

2.3.1 Functional Form


To motivate the probit model, assume that whether an event occurs
or not depends on an unobservable utility index Ii that is determined by one or more
explanatory variables, in such a way that the larger the value of the index Ii, the greater
the probability of occurrence of the event.
The index Ii can be expressed as

Ii = ∑ βk Xik (2.3.1)

Let Yi = 1 if the event occurs and
= 0 if the event does not occur.
Now it is reasonable to assume that there is a critical or threshold level of the
index, call it Ii*, such that if Ii exceeds Ii*, the event will occur; otherwise it will not.
The threshold Ii*, like Ii, is not observable, but if it is assumed to be normally distributed
with the same mean and variance, it is possible not only to estimate the parameters of
the index given in Equation (2.3.1) but also to get some information about the
unobservable index itself.
Under the assumption of normality, the probability that Ii* is less than or equal
to Ii can be computed from the standard normal cumulative distribution function. That
is,

πi = P(Y = 1 | X) = P(Ii* ≤ Ii) = P(Zi ≤ ∑ βk Xik) = F(∑ βk Xik)
   = F(xi'β) (2.3.2)
where P(Y = 1 | X) means the probability that an event occurs given the value(s)
of X, the explanatory variable(s), and Zi is the standard normal variable, i.e.,
Z ~ N(0, 1).
F is the standard normal cumulative distribution function. The functional form
of the probit model is

F(Ii) = (1/√(2π)) ∫ from −∞ to Ii of e^(−z²/2) dz
      = (1/√(2π)) ∫ from −∞ to ∑ βk Xik of e^(−z²/2) dz (2.3.3)

(in the factor 1/√(2π), π denotes the mathematical constant, not the probability πi)

where
Ii = ∑ βk Xik
   = unobservable utility index (latent variable).

To obtain information on Ii, the utility index, as well as on βk, take the inverse of
Equation (2.3.2) to obtain:

Ii = F⁻¹(πi)
   = ∑ βk Xik

where F⁻¹ is the inverse of the standard normal cumulative distribution function.
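The mapping between the index Ii and the probability πi in Equation (2.3.2), and its inverse, can be sketched with the standard normal distribution from Python's standard library. The function names and the index values used are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def probit_probability(index):
    """pi_i = F(I_i), the standard normal CDF evaluated at the index."""
    return std_normal.cdf(index)

def utility_index(p):
    """I_i = F^(-1)(pi_i), the normal equivalent deviate; requires 0 < p < 1."""
    return std_normal.inv_cdf(p)

for index in (-1.5, 0.0, 0.8):
    p = probit_probability(index)
    # Applying the inverse CDF recovers the index from the probability.
    print(index, round(p, 4), round(utility_index(p), 4))
```

An index of zero corresponds to a probability of 0.5, and negative index values correspond to probabilities below 0.5.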



2.3.2 Estimation
Once the estimated Ii has been obtained, estimating βk is relatively straightforward.
Since the normal equivalent deviate (n.e.d.), or Ii, will be negative whenever πi < 0.5, in
practice the number 5 is added to the n.e.d. and the result is called a probit. The probit
model is also constructed by assuming that a particular density underlies the data.
Hence, this model is typically estimated using maximum likelihood rather than least
squares.
Data for the probit model may also be of two types:
(a) grouped data and
(b) ungrouped or individual data.
As in the case of the logit model, a nonlinear estimating procedure based on the
method of maximum likelihood can be used to estimate the probit model.

2.4 Comparison of the Models


In the LPM, the slope coefficients measure directly the change in the
probability of an event occurring as the result of a unit change in the value of a
regressor, with the effect of all other variables held constant. In the logit model the
slope coefficient of a variable gives the change in the log of the odds associated with a
unit change in that variable, again holding all other variables constant. But as noted
previously, for the logit model the rate of change in the probability of an event
happening is given by βj πi (1 − πi), where βj is the (partial regression) coefficient of the
jth regressor. But in evaluating πi, all the variables included in the analysis are
involved.
In the probit model, the rate of change in the probability is somewhat
complicated and is given by βj f(Zi), where f(Zi) is the density function of the standard
normal variable and Zi = ∑ βk Xik, that is, the regression model used in the analysis.
Thus, in both logit and probit models all the regressors are involved in
computing the changes in probability, whereas in the LPM only the jth regressor is
involved. This difference may be one reason for the early popularity of the LPM.
One advantage of the LPM over logit or probit is that estimates of the coefficients
are available under complete or quasi-complete separation.
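The three rates of change can be compared in a short sketch. The function below is illustrative only: it takes a coefficient βj and an index value Zi as given, whereas in practice each model has its own estimated coefficients on its own scale, so the same numerical βj would not be used for all three.

```python
import math
from statistics import NormalDist

def marginal_effects(beta_j, z):
    """Rate of change of Pr(Y = 1) with respect to X_j at index Z = sum_k beta_k X_k."""
    p_logit = 1.0 / (1.0 + math.exp(-z))
    return {
        "lpm": beta_j,                               # constant, independent of Z
        "logit": beta_j * p_logit * (1.0 - p_logit), # beta_j * pi_i * (1 - pi_i)
        "probit": beta_j * NormalDist().pdf(z),      # beta_j * f(Z_i)
    }

# Hypothetical coefficient; the effect shrinks as the index moves away from zero.
for z in (0.0, 1.0, 3.0):
    print(z, marginal_effects(0.5, z))
```

For the logit and probit models the effect is largest at Z = 0 and falls toward zero in the tails, while for the LPM it stays constant.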
The linear probability model also has disadvantages. It places implicit restrictions on
the parameters β, as P(Yi = 1) = E(Yi) = xi'β requires that 0 ≤ xi'β ≤ 1 for all i =
1, …, n. Further, the error terms ui are not normally distributed. This is because the
variable yi can take only the values zero and one, so that ui is a random variable with a
discrete distribution given by
ui = 1 − xi'β with probability xi'β
ui = −xi'β with probability 1 − xi'β.
The distribution of ui depends on xi and has variance equal to Var(ui) = xi'β(1 −
xi'β), so that the error terms are heteroskedastic with variances that depend on β. The
assumption that E(ui) = 0 implies that OLS is an unbiased estimator of β (provided
that the regressors are exogenous), but clearly it is not efficient, and the conventional
OLS formulas for the standard errors do not apply. Further, if the OLS estimates b are
used to compute the estimated probabilities P̂[yi = 1] = xi'b, then this may give
values smaller than zero or larger than one, in which case they are not real
'probabilities'. This may occur because OLS neglects the implicit restrictions
0 ≤ xi'β ≤ 1.
In most applications logit and probit models are quite similar, the main
difference being that the logistic distribution has slightly fatter tails. That is to say, the
conditional probability πi approaches zero or one at a slower rate in logit than in probit.
Therefore, there is no compelling reason to choose one over the other. In practice
many researchers choose the logit model because of its comparative mathematical
simplicity.
Though the models are similar, one has to be careful in interpreting the
coefficients estimated by the two models. The reason is that, although the standard
logistic (the basis of logit) and the standard normal (the basis of probit) distributions
both have a mean value of zero, their variances are different: 1 for the standard
normal and π²/3 for the logistic distribution, where π ≈ 22/7. Therefore, if the probit
coefficient is multiplied by about 1.81 (which is approximately π/√3), the logit
coefficient is obtained approximately.
Incidentally, Amemiya (1981) has also shown that the coefficients of the LPM and
logit models are related as follows:

βLPM = 0.25 βLogit, except for the intercept

and

βLPM = 0.25 βLogit + 0.5, for the intercept.

Amemiya also suggested multiplying a logit estimate by 0.625 to get a better
estimate of the corresponding probit estimate. Conversely, multiplying a probit
coefficient by 1.6 (= 1/0.625) gives the corresponding logit coefficient.
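These approximate conversion rules can be written out as a small sketch. The function names are chosen for this illustration, and the coefficient values passed in are hypothetical.

```python
import math

# Ratio of the standard deviations of the standard logistic and standard normal.
VARIANCE_BASED_FACTOR = math.pi / math.sqrt(3.0)   # about 1.81
AMEMIYA_LOGIT_TO_PROBIT = 0.625                    # Amemiya's suggested factor

def probit_from_logit(b_logit):
    return AMEMIYA_LOGIT_TO_PROBIT * b_logit

def logit_from_probit(b_probit):
    return b_probit / AMEMIYA_LOGIT_TO_PROBIT      # same as multiplying by 1.6

def lpm_from_logit(b_logit, intercept=False):
    """beta_LPM = 0.25 * beta_Logit, plus 0.5 for the intercept."""
    return 0.25 * b_logit + (0.5 if intercept else 0.0)

print(round(VARIANCE_BASED_FACTOR, 4))
print(logit_from_probit(0.5), probit_from_logit(0.8))
print(lpm_from_logit(0.8), lpm_from_logit(-1.2, intercept=True))
```

The rules are approximations that work best when the fitted probabilities are not too close to 0 or 1.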

CHAPTER III

DIAGNOSTIC STATISTICS FOR QUALITATIVE RESPONSE MODELS

Some diagnostic statistics for qualitative response models are presented in this
Chapter, namely the t-test (Z-test), the likelihood ratio test, goodness-of-fit (R2)
measures, the predictive quality (classification table and hit rate), and the analysis of
the residuals (in particular an LM test for heteroscedasticity).

3.1 Z statistic
The significance of individual explanatory variables can be tested by the usual
t-test. The sample size should be sufficiently large to rely on the asymptotic
expressions for the standard errors, and the t-test statistic then follows approximately
the standard normal distribution. Since the method of maximum likelihood is generally
a large-sample method, the estimated standard errors are asymptotic. As a result,
instead of using the t statistic to evaluate the statistical significance of a coefficient,
the (standard normal) Z statistic has to be used.

3.2 Likelihood Ratio (LR) Statistic


To test the null hypothesis that all the slope coefficients are simultaneously
equal to zero, the equivalent of the F test in the linear regression model is the
likelihood ratio (LR) statistic. Under the null hypothesis, H0: β2 = β3 = … = βk = 0, the
LR statistic follows the χ² distribution with degrees of freedom equal to the number of
explanatory variables. That is,

LR = 2(ln L1 − ln L0) ~ χ²(k − 1)

where L0 is the value of the likelihood function when all parameters except the intercept
are set to zero and L1 is the value of the likelihood function of the model of interest.
This statistic is sometimes treated as a measure similar to the R2 of linear regression
models. Joint parameter restrictions can also be tested by the likelihood ratio test.
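The LR test can be sketched as follows. The log-likelihood values are hypothetical, and `chi2_sf` is a helper written for this sketch: it computes the upper-tail probability of the χ² distribution for integer degrees of freedom from the standard recurrence for the incomplete gamma function, so that no external library is needed.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df > x), for x >= 0."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                   # df = 2: exp(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))        # df = 1: erfc(sqrt(x/2))
    # Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lr_test(loglik_model, loglik_null, n_slopes):
    """LR = 2 (ln L1 - ln L0) and its p-value with n_slopes degrees of freedom."""
    lr = 2.0 * (loglik_model - loglik_null)
    return lr, chi2_sf(lr, n_slopes)

# Hypothetical log-likelihoods for a model with two slope coefficients.
lr, p_value = lr_test(loglik_model=-60.5, loglik_null=-68.0, n_slopes=2)
print(lr, p_value)
```

A small p-value leads to rejection of the null hypothesis that all slope coefficients are zero.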

3.3 R2 Statistic
A goodness-of-fit measure is a summary statistic indicating the accuracy with
which the model approximates the observed data, like the R2 measure in the linear
regression model. In the linear regression model, R2 is the most commonly used measure
for assessing the discriminatory power of the model. R2 possesses three properties.
First, it is standardized to fall in the range from 0 to 1, equaling 0 when the model affords no
predictive efficacy over the marginal mean and equaling 1 when the model perfectly
accounts for, or discriminates among, the responses. Second, it is nondecreasing in X,
meaning that it cannot decrease as regressors are added to the model. Third, it can be
interpreted as the proportion of variation in the response accounted for by the
regression.
In the case in which the dependent variable is qualitative, accuracy can be
judged either in terms of the fit between the calculated probabilities and observed
response frequencies or in terms of the model's ability to forecast observed responses.
Contrary to the linear regression model, there is no single measure of goodness-of-fit
in qualitative response models, and a variety of measures exist for nonlinear
models.
Often, goodness-of-fit measures are implicitly or explicitly based on
comparison with a model that contains only a constant as explanatory variable. A first
goodness-of-fit measure, defined by Amemiya (1981), is known as the pseudo-R2, which is
formulated as

pseudo-R2 = 1 − 1 / [1 + 2(log L1 − log L0)/N]

where N denotes the number of observations.
An alternative measure suggested by McFadden (1974) is

McFadden R2 = 1 − log L1 / log L0

which is sometimes referred to as the likelihood ratio index. Like R2, the McFadden R2
also ranges between 0 and 1.
Another comparatively simple measure of goodness of fit is the count R2,
which is defined as:

Count R2 = number of correct predictions / total number of observations

Since the regressand in the model takes a value of 1 or zero, the number of
correct predictions can be counted. If the predicted probability is greater than 0.5, it is
classified as 1, but if it is less than 0.5, it is classified as 0.
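The three measures can be computed as in the following sketch. The log-likelihoods, outcomes and fitted probabilities are hypothetical. Since the text leaves a predicted probability of exactly 0.5 unclassified, the sketch assigns it to category 0, which is an assumption.

```python
def pseudo_r2(loglik_model, loglik_null, n_obs):
    """Pseudo-R2 = 1 - 1 / [1 + 2 (log L1 - log L0) / N]."""
    return 1.0 - 1.0 / (1.0 + 2.0 * (loglik_model - loglik_null) / n_obs)

def mcfadden_r2(loglik_model, loglik_null):
    """McFadden R2 = 1 - log L1 / log L0 (the likelihood ratio index)."""
    return 1.0 - loglik_model / loglik_null

def count_r2(y, p_hat, cutoff=0.5):
    """Share of observations whose predicted category matches the outcome."""
    correct = sum(1 for yi, pi in zip(y, p_hat) if (1 if pi > cutoff else 0) == yi)
    return correct / len(y)

# Hypothetical values for illustration.
print(pseudo_r2(-50.0, -100.0, 100))               # 0.5
print(mcfadden_r2(-50.0, -100.0))                  # 0.5
print(count_r2([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5
```

In the last call the first two observations are classified correctly and the last two are not, giving a count R2 of 0.5.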

3.4 Predictive Quality


Alternative specifications of the model may be compared by evaluating
whether the model gives a good classification of the data into the two categories yi = 1
and yi=0. The estimated model gives predicted probabilities Si for the choice yi = 1,
and this can be transformed into predicted choices by predicting that Si = 1 if Si ≥ c
and SI = 0 if, Si < c. The choice of c can sometimes be based on the costs of
misclassification. In practice on often takes c = 1a2 , or, if the fraction SI of successes
differs much from 50 per cent, one takes c = Si . This leads to a 2x2 classification table
of the predicted responses Si against the actually observed responses yi. The hit rate is
defined as the fraction of correct predictions in the sample. Formally, let wi be the
random variable indicating a correct prediction – that is, wi= 1 if Yi = SI and wi =0 if

Yi ≠ SI, then the hit rate is defined by h=; ∑;< | .

In the population the fraction of successes is $p$. If the prediction 1 were made
randomly with probability $p$ and the prediction 0 with probability $(1 - p)$, then a
prediction would be correct with probability $q = p^2 + (1 - p)^2$. Using the properties
of the binomial distribution for the number of correct random predictions, it follows
that the 'random' hit rate $h_r$ has expected value $E(h_r) = E(w) = q$ and variance
$Var(h_r) = Var(w)/n = q(1 - q)/n$. The predictive quality of the model can be evaluated
by comparing the hit rate $h$ with the random hit rate $h_r$. Under the null hypothesis
that the predictions of the model are no better than pure random predictions, the hit
rate $h$ is approximately normally distributed with mean $q$ and variance $q(1 - q)/n$.
Therefore, reject the null hypothesis of random predictions in favor of the (one-sided)
alternative of better-than-random predictions if

\[ z = \frac{h - q}{\sqrt{q(1 - q)/n}} = \frac{nh - nq}{\sqrt{nq(1 - q)}} \]

is large enough (larger than 1.645 at the 5 per cent significance level). In practice,
$q = p^2 + (1 - p)^2$ is unknown and is estimated by $\hat{p}^2 + (1 - \hat{p})^2$, where
$\hat{p}$ is the fraction of successes in the sample. In the above expression for the
z-test, $nh$ is the total number of correct predictions in the sample and $nq$ is the
expected number of correct random predictions.
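
The hit rate and its z-test can be illustrated with a short Python sketch. The function
name `hit_rate_test`, the cut-off default c = 1/2 and the ten observations are
hypothetical choices made only for illustration; they are not taken from the survey data.

```python
import math

def hit_rate_test(y, p_hat, c=0.5):
    """Return (h, q_hat, z) for 0/1 outcomes y and fitted probabilities p_hat."""
    n = len(y)
    # Predicted choice: 1 if the fitted probability reaches the cut-off c
    y_pred = [1 if p >= c else 0 for p in p_hat]
    # Hit rate h: fraction of correct predictions
    h = sum(1 for yi, yp in zip(y, y_pred) if yi == yp) / n
    # q is estimated from the sample fraction of successes
    p_bar = sum(y) / n
    q_hat = p_bar ** 2 + (1 - p_bar) ** 2
    z = (h - q_hat) / math.sqrt(q_hat * (1 - q_hat) / n)
    return h, q_hat, z

# Hypothetical outcomes and fitted probabilities (illustration only)
y = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
p_hat = [0.9, 0.8, 0.6, 0.3, 0.2, 0.7, 0.4, 0.4, 0.1, 0.6]
h, q_hat, z = hit_rate_test(y, p_hat)
better_than_random = z > 1.645  # one-sided test at the 5 per cent level
```

With these made-up numbers eight of the ten predictions are correct, so h = 0.8 against
an estimated random hit rate of 0.5.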

3.5 Analysis of Residuals


3.5.1 Standardized Residuals and Consequences of Heteroskedasticity
The residuals $u_i$ of a binary response model are defined as the differences
between the observed outcomes $y_i$ and the fitted probabilities $\hat{p}_i$. As the
variance of $y_i$ (for given values of $x_i$) is $p_i(1 - p_i)$, the standardized residuals
are defined by

\[ u_i^* = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}} \tag{3.5.1} \]

A histogram of the standardized residuals may be used to detect outliers.
Further, scatter diagrams of these residuals against explanatory variables are useful to
investigate the possible presence of heteroskedasticity. Heteroskedasticity can be due
to different kinds of misspecification of the model. It may be, for instance, that a
relevant explanatory variable is missing or that the function $F$ is misspecified. In
contrast with the linear regression model, where OLS remains consistent under
heteroskedasticity, maximum likelihood estimators of binary response models become
inconsistent under this kind of misspecification. For instance, if the data generating
process is a probit model but one estimates a logit model, then the estimated
parameters and marginal effects are inconsistent and the calculated standard errors are
not correct. However, as the differences between the probit function and the logit
function are not so large, the outcomes may still be reasonably reliable.
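
As a minimal sketch of Equation (3.5.1), the standardized residuals can be computed
directly from the observed outcomes and the fitted probabilities. The data below and the
cut-off of 2 in absolute value for flagging possible outliers are assumptions made for
illustration; the text itself does not prescribe a cut-off.

```python
import math

def standardized_residuals(y, p_hat):
    """u*_i = (y_i - p_hat_i) / sqrt(p_hat_i * (1 - p_hat_i))."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p_hat)]

# Hypothetical outcomes and fitted probabilities (illustration only)
y = [1, 0, 1, 0]
p_hat = [0.8, 0.2, 0.1, 0.5]
u_star = standardized_residuals(y, p_hat)

# Assumed rule of thumb: flag observations with |u*| > 2 as possible outliers
outliers = [i for i, u in enumerate(u_star) if abs(u) > 2]
```

The third observation has y = 1 although its fitted probability is only 0.1, which gives
a standardized residual of about 3 and marks it as a candidate outlier.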

3.5.2 Likelihood Ratio Test for Heteroskedasticity


A formal test for heteroskedasticity can be based on the index model
$y_i^* = x_i'\beta + u_i$. Until now it has been assumed that the error terms $u_i$ all
follow the same distribution (described by $F$). As an alternative, consider the model
where all $u_i/\sigma_i$ follow the same distribution $F$, where

\[ \sigma_i = e^{z_i'\gamma} \]

with $z_i$ a vector of observed variables. The constant term should not be included in
this vector because the scale parameter of a binary response model should be fixed,
independent of the data. Assume that the density function $f$ (the derivative of $F$) is
symmetric, that is, $f(t) = f(-t)$. It then follows that

\[
\begin{aligned}
P[y_i = 1] &= P[y_i^* \ge 0] \\
           &= P[u_i \ge -x_i'\beta] \\
           &= P[(u_i/\sigma_i) \ge -x_i'\beta/\sigma_i] \\
           &= P[(u_i/\sigma_i) \le x_i'\beta/\sigma_i] \\
           &= F(x_i'\beta/\sigma_i),
\end{aligned}
\]

so that

\[ P[y_i = 1] = F\left(x_i'\beta / e^{z_i'\gamma}\right) \tag{3.5.2} \]
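
Equation (3.5.2) gives the success probability that enters the log-likelihood when
heteroskedasticity is allowed. The following Python sketch evaluates that log-likelihood
for a probit specification (F taken as the standard normal CDF) and forms the usual
likelihood ratio statistic 2(log L_u - log L_r). The function names, the data and the
parameter values are hypothetical, and the sketch only evaluates the log-likelihood at
given parameters; maximizing it is left to an optimizer of the reader's choice.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, used here as F."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loglik(beta, gamma, y, X, Z):
    """Log-likelihood with p_i = F(x_i'beta / exp(z_i'gamma))."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        p = norm_cdf(dot(xi, beta) / math.exp(dot(zi, gamma)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def lr_statistic(loglik_unrestricted, loglik_restricted):
    """LR = 2 (log L_u - log L_r), both evaluated at their ML estimates."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Hypothetical data: X includes a constant, Z does not
y = [1, 0, 1, 1, 0]
X = [[1, 0.5], [1, -1.0], [1, 1.2], [1, 0.3], [1, -0.4]]
Z = [[0.2], [1.0], [0.5], [0.1], [0.8]]
beta = [0.1, 0.9]  # illustrative values, not estimates

# With gamma = 0 the model reduces to the homoskedastic probit
ll_gamma_zero = loglik(beta, [0.0], y, X, Z)
ll_plain = sum(yi * math.log(norm_cdf(dot(xi, beta)))
               + (1 - yi) * math.log(1 - norm_cdf(dot(xi, beta)))
               for yi, xi in zip(y, X))
ll_het = loglik(beta, [0.5], y, X, Z)
lr_example = lr_statistic(-10.0, -12.5)  # made-up maximized log-likelihoods
```

Setting gamma to zero reproduces the homoskedastic log-likelihood exactly, which is the
parameter restriction that the heteroskedasticity tests examine.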
The null hypothesis of homoskedasticity corresponds to the parameter
restriction $H_0: \gamma = 0$. This hypothesis can be tested by the LR-test. The
unrestricted likelihood function is obtained from the log-likelihood by replacing the
term $p_i = F(x_i'\beta)$ by $p_i = F\left(x_i'\beta / e^{z_i'\gamma}\right)$.

3.5.3 Lagrange Multiplier Test for Heteroskedasticity


An alternative is to use the LM-test, so that only the model under the null
hypothesis (with $\gamma = 0$) needs to be estimated. By working out the formulas for the
gradient and the Hessian of the unrestricted likelihood, it can be shown that the
LM-test can be performed as if Equation (3.5.2) were a non-linear regression model.
First estimate the model without heteroskedasticity, that is, under the null
hypothesis that $\gamma = 0$. This amounts to estimating the model
$P[y_i = 1] = F(x_i'\beta)$ by ML. With $b$ the resulting ML estimate of $\beta$, the
residuals of this model are denoted by

\[ u_i = y_i - \hat{p}_i = y_i - F(x_i'b) . \]
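As a second step, regress the residuals $u_i$ on the gradient of the non-linear
model $P[y_i = 1] = F\left(x_i'\beta / e^{z_i'\gamma}\right)$, taking into account that the
residuals are heteroskedastic. This amounts to applying (feasible) weighted least
squares, that is, OLS after division for the ith observation by the (estimated) standard
deviation. The variance of the 'error term' $y_i - p_i$ is
$Var(y_i - p_i) = Var(y_i) = p_i(1 - p_i)$. Here $p_i$ is replaced by $\hat{p}_i$ obtained
in the first step, so that the weight of the ith observation in WLS is given by
$1/\sqrt{\hat{p}_i(1 - \hat{p}_i)}$. Further, the gradient of the function
$F\left(x_i'\beta / e^{z_i'\gamma}\right)$ in Equation (3.5.2), when evaluated at
$\gamma = 0$, is given by

\[
\frac{\partial F\left(x_i'\beta / e^{z_i'\gamma}\right)}{\partial \beta} = f(x_i'\beta)\, x_i ,
\qquad
\frac{\partial F\left(x_i'\beta / e^{z_i'\gamma}\right)}{\partial \gamma} = -f(x_i'\beta)\, x_i'\beta\, z_i .
\]

Therefore, the required auxiliary regression in this second step can be written
in terms of the standardized residuals as

\[
u_i^* = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i(1 - \hat{p}_i)}}
      = \frac{f(x_i'b)}{\sqrt{\hat{p}_i(1 - \hat{p}_i)}}\, x_i'\delta_1
      + \frac{f(x_i'b)\, x_i'b}{\sqrt{\hat{p}_i(1 - \hat{p}_i)}}\, z_i'\delta_2 + \eta_i ,
\tag{3.5.3}
\]

where $b$ is the first-step ML estimate of $\beta$ and $\eta_i$ is the error term of the
auxiliary regression.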

Under the null hypothesis of homoskedasticity, there holds that $LM = nR^2_{nc}$,
where $R^2_{nc}$ denotes the non-centered $R^2$, that is, the explained sum of squares of
Equation (3.5.3) divided by the non-centered total sum of squares
$\sum_{i=1}^{n} (u_i^*)^2$. As the regression in Equation (3.5.3) does not contain a
constant term on the right-hand side, one should take here the non-centered $R^2$
defined by $R^2_{nc} = \sum (\hat{u}_i^*)^2 / \sum (u_i^*)^2$, where $\hat{u}_i^*$ denotes the
fitted values of the regression in Equation (3.5.3). Reject the null hypothesis for
large values of the LM-test. Under the null hypothesis of homoskedasticity
($\gamma = 0$) the statistic is asymptotically distributed as $\chi^2(g)$, where $g$ is the
number of variables in $z_i$, that is, the number of parameters in $\gamma$.
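
The two steps of the LM-test can be sketched in Python with NumPy for a probit model.
This is only an illustration under stated assumptions: the function name
`lm_heteroskedasticity`, the data, and the value used for the first-step estimate b are
hypothetical (b is supplied directly rather than obtained by ML), and F is taken as the
standard normal CDF.

```python
import math
import numpy as np

def lm_heteroskedasticity(y, X, Z, b):
    """LM = n * R2_nc from the auxiliary regression (3.5.3), probit case.

    y: 0/1 outcomes; X: n x k regressors; Z: n x g variance variables
    (no constant); b: ML estimate of beta under the null, given by the caller.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    index = X @ np.asarray(b, dtype=float)                        # x_i'b
    p_hat = np.array([0.5 * (1.0 + math.erf(t / math.sqrt(2.0))) for t in index])
    dens = np.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)   # f(x_i'b)
    sd = np.sqrt(p_hat * (1.0 - p_hat))
    u_star = (y - p_hat) / sd                                     # standardized residuals
    # Weighted gradient terms: regressors of the auxiliary regression
    W = np.column_stack([(dens / sd)[:, None] * X,
                         (dens * index / sd)[:, None] * Z])
    coef, *_ = np.linalg.lstsq(W, u_star, rcond=None)             # OLS without constant
    fitted = W @ coef
    r2_nc = float(fitted @ fitted) / float(u_star @ u_star)       # non-centered R2
    return len(y) * r2_nc

# Hypothetical data: X includes a constant, Z is a single variance variable
X = [[1, 0.5], [1, -1.0], [1, 1.5], [1, 0.0], [1, 2.0], [1, -0.5], [1, 1.0], [1, -1.5]]
Z = [[0.2], [1.0], [0.4], [0.9], [0.1], [0.7], [0.3], [0.8]]
y = [1, 0, 1, 1, 1, 0, 0, 0]
b = [0.1, 0.8]  # assumed value standing in for the first-step ML estimate
lm = lm_heteroskedasticity(y, X, Z, b)
reject = lm > 3.84  # 5 per cent critical value of chi-square with g = 1
```

Because the non-centered R2 lies between 0 and 1, the statistic always lies between 0
and n; it is compared with the chi-square critical value for g degrees of freedom.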

CHAPTER IV

APPLICATION OF LINEAR PROBABILITY, LOGIT AND PROBIT MODELS

4.1 Introduction
In this chapter, the application of the linear probability, logit and probit models
is demonstrated with survey data. The survey data used in this chapter were provided by
Ma Moe Sandar Oo, who collected the data for her Master of Public Administration
thesis. The data are the responses of the mothers of 300 children under 3 years of age in
Thingungyun Township. The weights of the children were assessed from the standard
weight chart used by the Township Health Center. There are four different colours (red,
yellow, green, white) to represent the condition of the child's weight on this chart. The
red colour represents a child's weight that reflects severe malnutrition. The yellow
colour stands for moderate malnutrition and the green colour signifies
good condition. The white colour zone shows another form of malnutrition, which is
known as the overweight child. In general, malnutrition can be defined as underweight in
developing countries, which is a serious public health problem that has been linked to
a substantial increase in the risk of morbidity and mortality. The term malnutrition
refers to both over-nutrition and under-nutrition. Malnutrition is a general term for a
medical condition caused by an improper or inadequate diet and nutrition. In this
study, a child whose weight colour is green is classified as a case of nutrition, and a
child whose weight colour is yellow or red is classified as a case of malnutrition.
The white colour case is very rare in Myanmar, so it is omitted from
this study.
From the information collected, mother's age, mother's education level and
child's weight colour are used to develop the models. Mother's education
level is divided into 4 categories: primary, middle, high, and graduate.
Child's weight colour is divided into 3 categories: green, yellow, and red. To
estimate the models, mother's age and mother's education level are used as
independent variables and child's weight colour is used as the dependent variable.

4.2 Models for Child's Weight Colour


In constructing the models, the variables are denoted as follows:

Yi     = 1 if child's weight colour is green
       = 0 otherwise
MAGEi  = mother's age
MEDU1  = 1 if mother's education is primary school level
       = 0 otherwise
MEDU2  = 1 if mother's education is middle school level
       = 0 otherwise
MEDU3  = 1 if mother's education is high school level
       = 0 otherwise

The graduate level serves as the reference category for mother's education.

The Linear Probability Model (LPM)

Yi = β1 + β2 MAGEi + β3 MEDU1 + β4 MEDU2+ β5 MEDU3 + ui

where ui is the disturbance term. The unknown parameters β1, β2, β3, β4 and β5 in the
LPM are estimated by the weighted least squares method, using the Statistical
Package for the Social Sciences (SPSS). It is assumed that the variance of ui is
proportional to the variable MAGEi.

The Logit Model


The logit model here can be written as:

Li = ln [Pi / (1 - Pi)] = β1 + β2 MAGEi + β3 MEDU1 + β4 MEDU2 + β5 MEDU3 + ui

where Pi     = the probability that the child's weight colour is green
      1 - Pi = the probability that the child's weight colour is not green

The Probit Model


Assume that Ii  = unobservable utility index (latent variable)
            I*i = critical or threshold level of the index
If Ii exceeds I*i, the child's weight colour will be green; otherwise it will not.

Ii = β1 + β2 MAGEi + β3 MEDU1 + β4 MEDU2 + β5 MEDU3

Ii = F^-1 (Pi)
   = β1 + β2 MAGEi + β3 MEDU1 + β4 MEDU2 + β5 MEDU3
where F^-1 is the inverse of the normal cumulative distribution function (CDF).

Pi = Pr (Y = 1 | X)
   = Pr (I*i ≤ Ii)
   = F (β1 + β2 MAGEi + β3 MEDU1 + β4 MEDU2 + β5 MEDU3)

Pi represents the probability that the child's weight colour is green; it is
measured by the area under the standard normal curve from -∞ to Ii.
The unknown parameters in the logit and probit models are estimated by the
method of maximum likelihood, using the Enter regression method in the SPSS
computer software.
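
The statement that Pi is the area under the standard normal curve up to Ii can be
illustrated numerically. In the Python sketch below the coefficient values and the
example mother are hypothetical and are chosen only to show the calculation; they are
not the estimates reported in this chapter.

```python
import math

def norm_cdf(t):
    """Area under the standard normal curve from minus infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def probit_probability(beta, x):
    """P = F(I) with index I = beta1 + beta2*MAGE + beta3*MEDU1 + ..."""
    index = sum(b * xi for b, xi in zip(beta, x))
    return norm_cdf(index)

# Hypothetical coefficients (constant, MAGE, MEDU1, MEDU2, MEDU3)
beta = [0.5, -0.02, 0.3, 0.1, 0.2]
# Hypothetical mother: age 25, high school level (MEDU3 = 1)
x = [1, 25, 0, 0, 1]
p = probit_probability(beta, x)
```

Here the index equals 0.2, and the corresponding area under the standard normal curve
is about 0.58.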

4.3 Results
The estimated models and their results are described in this section. The
estimated standard error (se) and computed p-values are shown in parentheses.

Linear Probability Model


Ŷi = 1.079 - 0.009 MAGEi - 0.052 MEDU1 - 0.009 MEDU2 + 0.139 MEDU3
se        (0.044)  (0.003)       (0.094)       (0.065)       (0.083)
p-values  (0.000)  (0.001)       (0.580)       (0.131)       (0.096)
R2 = 0.146, adjusted R2 = 0.134, count R2 = 0.76, pseudo R2 = 0.15,
McFadden R2 = 0.157, F = 12.598
According to the p-values, the variables MEDU1 and MEDU2
are insignificant, while the variables MAGEi and MEDU3 are significant at the 1% level
and the 10% level, respectively. The insignificant variables MEDU1 and MEDU2 are
therefore dropped, and the model for child's weight colour is re-estimated with the
variables MAGEi and MEDU3.

The re-estimated model is as follows:


Ŷi = 1.082 - 0.012 MAGEi + 0.217 MEDU3
se        (0.044)  (0.022)       (0.063)
p-values  (0.000)  (0.000)       (0.000)
R2 = 0.105, adjusted R2 = 0.102, count R2 = 0.76, pseudo R2 = 0.141,
McFadden R2 = 0.15, F = 34.87

The results imply that the variables MAGEi and MEDU3 are important factors in
explaining changes in the probability of child's nutrition. If the mother's age
increases by one year while her education status remains unchanged, the probability
of child's nutrition decreases by about 1.2 percentage points. If the mother's
education is at high school level, with the mother's age held unchanged, the
probability of child's nutrition is higher by about 21.7 percentage points.
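
The interpretation of the re-estimated LPM can be checked with a small Python sketch
that plugs values into the fitted equation. The coefficients are those reported above;
the ages used (20 and 30 years) are hypothetical inputs chosen for illustration.

```python
def lpm_probability(mage, medu3):
    """Fitted LPM: 1.082 - 0.012 * MAGE + 0.217 * MEDU3."""
    return 1.082 - 0.012 * mage + 0.217 * medu3

# Hypothetical mothers (ages chosen only for illustration)
p_age30_other = lpm_probability(30, 0)        # not high school level
p_age30_high = lpm_probability(30, 1)         # high school level
p_age31_other = lpm_probability(31, 0)        # one year older
p_age20_high = lpm_probability(20, 1)         # younger mother, high school level

effect_of_high_school = p_age30_high - p_age30_other
effect_of_one_year = p_age31_other - p_age30_other
```

The differences reproduce the two effects described in the text (0.217 and -0.012). The
last case gives a fitted value above 1, which illustrates that fitted values of a linear
probability model are not guaranteed to stay between 0 and 1.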

Logit Model
L̂i = ln [P̂i / (1 - P̂i)]
   = 11.146 - 0.248 MAGEi - 2.657 MEDU1 - 3.029 MEDU2 + 0.268 MEDU3

se        (1.768)  (0.044)       (1.134)       (1.052)       (1.271)
p-values  (0.000)  (0.000)       (0.019)       (0.004)       (0.833)

count R2 = 0.79, pseudo R2 = 0.102, McFadden R2 = 0.211, χ2 = 64.241
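
The link between the reported coefficients, standard errors and p-values can be
illustrated with the Z statistic, z = coefficient / se. The Python sketch below assumes
that the reported p-values are two-sided and based on the standard normal approximation;
that assumption is an illustration and has not been verified against the SPSS output.

```python
import math

def z_and_p(coef, se):
    """Return the Z statistic and its two-sided normal p-value."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Coefficients and standard errors of the first logit model reported above
estimates = {
    "MAGE": (-0.248, 0.044),
    "MEDU1": (-2.657, 1.134),
    "MEDU2": (-3.029, 1.052),
    "MEDU3": (0.268, 1.271),
}
results = {name: z_and_p(coef, se) for name, (coef, se) in estimates.items()}
```

Under this assumption the computed p-values for MEDU1, MEDU2 and MEDU3 round to 0.019,
0.004 and 0.833, in line with the figures in the table.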

According to the p-values, every variable except MEDU3 is significant at the
5% level (MAGEi and MEDU2 also at the 1% level), and χ2 = 64.241 indicates that the
model as a whole is highly significant. The insignificant variable MEDU3 is excluded,
and the model for child's weight colour is re-estimated with the variables MAGEi,
MEDU1 and MEDU2. The re-estimated model is as follows:
L̂i = ln [P̂i / (1 - P̂i)]
   = 10.986 - 0.248 MAGEi - 2.488 MEDU1 - 2.86 MEDU2

se        (1.577)  (0.044)       (0.765)       (0.693)
p-values  (0.000)  (0.000)       (0.001)       (0.000)

count R2 = 0.62, pseudo R2 = 0.187, McFadden R2 = 0.211, χ2 = 69.195



From the re-estimated logit model, MAGEi, MEDU1 and MEDU2 are found to
be important factors in explaining changes in the log of odds for child's nutrition.
With other factors held unchanged, an increase of one year in the mother's age is
expected to decrease the log of odds for child's nutrition by about 0.25. Moreover, if
the mother's education is at primary school level, the log of odds is expected to be
lower by 2.488, and if the mother's education is at middle school level, it is expected
to be lower by 2.86.
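
Changes in the log of odds can be converted into percentage changes in the odds by
exponentiating the coefficients. The Python sketch below uses the coefficients of the
re-estimated logit model; the example mother (age 30, primary school level) is a
hypothetical input chosen for illustration.

```python
import math

# Coefficients of the re-estimated logit model reported above
coefficients = {"MAGE": -0.248, "MEDU1": -2.488, "MEDU2": -2.86}

# Percentage change in the odds for a one-unit change in each variable
pct_change_in_odds = {name: (math.exp(b) - 1.0) * 100.0
                      for name, b in coefficients.items()}

def logit_probability(mage, medu1, medu2):
    """Fitted probability P = 1 / (1 + exp(-L)) from the re-estimated model."""
    log_odds = 10.986 - 0.248 * mage - 2.488 * medu1 - 2.86 * medu2
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical mother: age 30, primary school level
p_example = logit_probability(30, 1, 0)
```

The odds fall by roughly 22% for each additional year of the mother's age, by about 92%
for primary school level and by about 94% for middle school level.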

Probit Model
Ii = -2.658 - 0.002 MAGEi + 0.103 MEDU1 + 0.017 MEDU2 + 0.050 MEDU3
se        (0.164)  (0.005)       (0.098)       (0.079)       (0.056)
p-values  (0.000)  (0.763)       (0.291)       (0.826)       (0.446)

count R2 = 0.76, χ2 = 101.624

According to the p-values, none of the explanatory variables is significant, even
at the 10% level.
Summarizing the results and findings of the estimated models, the diagnostic
statistics (p-values, computed F values and computed χ2 values) indicate that
the LPM and the logit model are significant models.
From the estimated LPM and logit models, mother's age and
mother's education are important factors in explaining changes in child's nutrition.
For the estimated models, the count R2 value is high, whereas the McFadden R2
and pseudo R2 values are low. Although these R2 values are not directly comparable,
they can give some idea about the orders of magnitude. Besides, one should not
overplay the importance of goodness of fit in models where the regressand is
dichotomous. The estimated R2 may seem rather low, but in view of the large sample
size, this R2 is still significant on the basis of the F test.

CHAPTER V

CONCLUSION

In this paper, qualitative response models: linear probability, logit, and probit
models in which the dependent variable involves only two qualitative choices are
studied together with their specification and estimation procedure. These models are
valuable in the analysis of survey data. The important characteristics of this study are
as follows:
1. Qualitative response regression models refer to models in which the response,
or regressand, variable is not quantitative or an interval scale.
2. The simplest possible qualitative response regression model is the binary
model in which the regressand is of the yes/no or presence / absence type.
3. The simplest possible binary regression model is the linear probability model
(LPM), in which the binary response variable is regressed on the relevant
explanatory variables by using the standard OLS methodology. Simplicity may
not be a virtue here, for the LPM suffers from several estimation problems.
Even if some of the estimation problems can be overcome, the fundamental
weakness of the LPM is that it assumes that the probability of something
happening increases linearly with the level of the regressor. This very
restrictive assumption can be avoided by using the logit and probit models.
4. In the logit model the dependent variable is the log of the odds ratio, which is a
linear function of the regressors. The probability function that underlies the
logit model is the logistic distribution. If the data are available in grouped form,
OLS can be used to estimate the parameters of the logit model, provided the
heteroscedastic nature of the error term is taken into account explicitly. If the
data are available at the individual, or micro, level, nonlinear-in-the-parameter
estimating procedures, such as the method of maximum likelihood, can be used.
5. If the normal distribution is chosen as the appropriate probability distribution,
then the probit model can be used. This model is mathematically a bit difficult
as it involves integrals.
6. The estimated model can be interpreted in terms of the signs and significance
of the estimated coefficients. The model can be evaluated in different ways, by
using diagnostic tests (t- or Z-test, LR-test) and by measuring the model quality
(goodness of fit R2).
As an application, these models are developed and estimated by using SPSS
computer software with the survey data on the mothers of 300 children in Thingungyun
Township.
The findings are as follows:
(1) According to the computed F value and X2 value, the LPM and logit models
are significant but probit is not.
(2) In the estimated LPM, mother's age and mother's education are found to be
important factors in explaining the child's nutrition. From the estimated model,
with other factors held unchanged, an increase of one year in the mother's age
will decrease the probability of child's nutrition by about 1.2 percentage
points.
(3) In the estimated logit model, mother's age and mother's education are found
to be important factors in explaining the child's nutrition. From the estimated
model, with other factors held unchanged, an increase of one year in the
mother's age will decrease the odds for child's nutrition by about 22%. If the
mother's education is at primary school level, the odds are expected to be lower
by about 92%, and if the mother's education is at middle school level, the odds
are expected to be lower by about 94%.

REFERENCES

1. Alfred DeMaris, "Regression with Social Data: Modeling Continuous and Limited
Response Variables", John Wiley & Sons, Inc., Hoboken, New Jersey, U.S.A.

2. Christiaan Heij, P. de Boer, P.H. Franses, T. Kloek, and H.K. van Dijk (2004),
"Econometric Methods with Applications in Business and Economics", First
Edition, Oxford University Press.

3. David G. Kleinbaum and Mitchel Klein (2002), "Logistic Regression: A Self-Learning
Text", Second Edition, Springer-Verlag New York, Inc.

4. Gujarati, D.N. and Sangeetha (2008), "Basic Econometrics", Fourth Edition,
McGraw-Hill Publishing Company Ltd.

5. Marno Verbeek (2008), "A Guide to Modern Econometrics", Third Edition, John
Wiley & Sons, Ltd.

6. Moe Sandar Oo (July 2009), "A Study on the Influential Factors of
Underweight Children (Age under 3) in Thingungyun Township", MPA Thesis.

7. Neter J., Michael H.K., Christopher J.N., and William Wasserman (1996),
"Applied Linear Statistical Models", 4th Edition, McGraw-Hill.

8. Takeshi Amemiya (1981), "Qualitative Response Models: A Survey", Journal of
Economic Literature, Vol. XIX.

9. William H. Greene (2000), "Econometric Analysis", 4th Edition, Prentice-Hall, Inc.,
Upper Saddle River, New Jersey.
