Dummy Dependent Variable Response Regression Models
Regression with a Qualitative Dependent Variable
Approaches: qualitative-dependent-variable regression, probit regression, and discriminatory (discriminant) regression analysis.
Meaning of the response function when the dependent variable is binary
Consider the SLR model Yi = β0 + β1Xi + Ui.
E(Yi) = β0 + β1Xi, since E(Ui) = 0.
When Yi takes only the values 0 and 1, Yi is a Bernoulli random variable with probability distribution

Yi     Probability
1      Pi
0      1 - Pi

By the definition of the expected value of a random variable,
E(Yi) = 1(Pi) + 0(1 - Pi) = Pi
The mean response is therefore interpreted as the probability of success, i.e., Yi = 1, when the regressor takes the value Xi.
A linear regression model with a dependent variable that is either 0 or 1 is called the linear probability model, or LPM. The LPM predicts the probability of an event occurring and, like other linear models, assumes that the effects of the Xs on the probabilities are linear.
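The LPM described above can be sketched numerically. This is a minimal illustration with hypothetical data (not from the slides): an OLS fit of a 0/1 outcome on a single regressor, where the fitted values are read as probabilities.

```python
import numpy as np

# Hypothetical data: X is a regressor, Y is a 0/1 outcome.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Fit the LPM Y = b0 + b1*X by ordinary least squares.
A = np.column_stack([np.ones_like(X), X])
b0, b1 = np.linalg.lstsq(A, Y, rcond=None)[0]

# Fitted values are interpreted directly as P(Y = 1 | X).
p_hat = b0 + b1 * X
print(b0, b1)
print(p_hat)
```

Note that even in this tiny example the first fitted value is negative, which previews the structural defect of the LPM discussed below.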
Meaning of the expected value of a binary dependent variable
When the dependent variable is a Bernoulli variable (i.e., its value is either 0 or 1), the linear regression model actually predicts the probability that the value 1 occurs. Therefore, we also call this model a linear probability model.
Difference Between Quantitative and Qualitative Response Regression Models
Problems with the LPM
Non-normality of the disturbance term
For a binary dependent variable Y, U also takes on only two values, since
Ui = Yi - β0 - β1Xi
When Yi = 1: Ui = 1 - β0 - β1Xi
When Yi = 0: Ui = -β0 - β1Xi
Obviously U cannot be assumed to be normally distributed; it actually follows a two-point Bernoulli (binomial) distribution.
Non-constant variances
The probability distribution of U is

Ui               Probability
1 - β0 - β1Xi    Pi
-β0 - β1Xi       1 - Pi

Hence Var(Ui) = Pi(1 - Pi) = E(Yi)[1 - E(Yi)], which depends on Xi, so the disturbance variance is not constant.
Constraints on the response function
Since the response function represents probabilities when the dependent variable is binary, we require 0 ≤ E(Yi) ≤ 1.
A linear response function may fall outside these constraint limits within the range of the independent variable in the scope of the model.
Non-linearity
Even if the predicted probabilities did not take on impossible values, the assumption of a straight-line relationship between the IV and the DV is also very questionable when the DV is a dichotomy.
For example, suppose the DV is home ownership and one of the IVs is income. According to the LPM, an increase in income of $50,000 has the same effect on ownership whether the family starts with an income of $0 or of $1 million. Certainly, a family with $50,000 is more likely to own a home than one with $0. But a millionaire is very likely to own a home already, and an additional $50,000 is not going to increase the likelihood of home ownership very much.
Structural defects of the LPM
The LPM predicts probabilities below 0 or above 1 for sufficiently small or large X values.
Solutions for the problems
Constant variance
Since the variance of U depends on the expected value of Y, the variances are heteroscedastic. One way of handling non-constant disturbance variances is weighted least squares: with an appropriate weighting scheme, the errors can be made homoscedastic.
To apply weighted least squares, divide both sides of the model by √Wi, where Wi = E(Yi)[1 - E(Yi)]. The transformed model is
Yi/√Wi = β0(1/√Wi) + β1(Xi/√Wi) + Ui/√Wi
Problem with weighted least squares
But the Wi are unknown, since they involve E(Yi), which contains the unknown parameters. So estimate the Wi by a two-stage least squares procedure:
Stage I: Fit the regression model by OLS and estimate the parameters.
Stage II: Estimate the weights using the parameters estimated in Stage I, transform the original model, and then estimate the transformed model by OLS.
Note: applying OLS to the transformed model is called weighted least squares (WLS).
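The two-stage procedure above can be sketched as follows. This is a minimal illustration with hypothetical 0/1 data (not from the slides); the clipping of fitted probabilities away from 0 and 1 is a practical assumption added so the weights stay positive.

```python
import numpy as np

# Hypothetical 0/1 data for the two-stage WLS fit of an LPM.
X = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
Y = np.array([0, 0, 1, 0, 1, 1, 1, 1])
A = np.column_stack([np.ones_like(X), X])

# Stage I: ordinary OLS to get preliminary fitted probabilities.
beta_ols = np.linalg.lstsq(A, Y, rcond=None)[0]
p = A @ beta_ols
# Clip so the estimated weights W = p(1 - p) stay positive.
p = np.clip(p, 0.01, 0.99)
w = p * (1 - p)

# Stage II: divide both sides of the model by sqrt(W) and re-run OLS,
# which is exactly weighted least squares.
sw = np.sqrt(w)
beta_wls = np.linalg.lstsq(A / sw[:, None], Y / sw, rcond=None)[0]
print(beta_wls)
```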
Solutions for the constraints on the response function
There are two ways of handling the problem of the constraints on the response function.
1. Estimate the model by the usual OLS and check whether the Ŷi lie between 0 and 1. If some are less than 0, Ŷ is set to zero for those cases; if some are greater than 1, they are set to 1.
2. Use an estimation technique that guarantees that the estimated Y will lie between 0 and 1.
Logistic Regression
Both theoretical and empirical considerations suggest that when the dependent variable is an indicator variable, the shape of the response function will frequently be curvilinear. The following figure contains a curvilinear response function that has been found appropriate in many instances involving a binary dependent variable.
[Figure: S-shaped (tilted-S) response function with asymptotes at 0 and 1]
Note that this response function is shaped like a tilted S and has asymptotes at 0 and 1. These features ensure that the constraints on E(Y), i.e., 0 ≤ E(Y) ≤ 1, are automatically met. The response function in the figure is called the logistic function and is given by
E(Yi) = Pi = exp(β0 + β1Xi) / [1 + exp(β0 + β1Xi)] = 1 / [1 + exp(-(β0 + β1Xi))]
Advantages of log odds (logit)
Although probabilities can range only from 0 to 1, log odds can range from -∞ to +∞.
Although logits are linear in X, the probabilities themselves are not, in contrast to the LPM, where probabilities increase linearly with X.
The logit is a linear function of the unknown parameters, so we can apply the standard OLS machinery to estimate the parameters of the logit model in the usual way.
The probability is an S-shaped function of the log odds. At the extremes, changes in the log odds produce very little change in the probabilities. In the middle of the S curve, changes in the log odds produce much larger changes in the probabilities.
To put it another way, linear, additive increases in the log odds produce nonlinear changes in the probabilities.
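The nonlinear mapping from log odds to probabilities described above can be checked numerically. This is a minimal illustration (not from the slides) using only the standard library:

```python
import math

def logistic(logit):
    # Convert a log-odds (logit) value into a probability.
    return 1.0 / (1.0 + math.exp(-logit))

# Equal one-unit increases in the logit...
for L in (-4, -3, 0, 1):
    print(L, round(logistic(L), 3))
# ...produce a tiny probability change at the extremes
# (logit -4 -> -3) but a large one near the middle
# (logit 0 -> 1).
```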
Comparison of LPM and Logistic Regression
[Figure: the LPM straight line and the logistic S-curve plotted together, with probability P from 0 to 1 on the vertical axis and X from -∞ to +∞ on the horizontal axis]
Features of the Logit Model
As Pi goes from 0 to 1 (i.e., as Zi varies from -∞ to +∞), the logit Li goes from -∞ to +∞. That is, although the probabilities (of necessity) lie between 0 and 1, the logits are not so bounded.
Estimation of the Logit Model
If we have data at the individual level, then we cannot estimate the logit model by the standard OLS method: if Yi = 1 then ln(1/0), and if Yi = 0 then ln(0/1), both of which are meaningless. In such situations we estimate the parameters by MLE.
If replicated data are available, i.e., corresponding to each level of Xi there are ni observations, of which Ri (Ri < ni) belong to the category in which Yi = 1, then we can compute P̂i = Ri/ni, the relative frequency, and use it as an estimate of the true Pi corresponding to each Xi. If ni is fairly large, P̂i will be a reasonably good estimate of Pi.
It can be shown that if ni is fairly large and each observation is distributed independently as a binomial variable, then
Ui ~ N(0, 1/[ni Pi (1 - Pi)])
So the disturbance term in the logit model is heteroscedastic. Thus we have to use weighted least squares instead of OLS, with weights Wi = ni Pi (1 - Pi).
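The grouped-data estimation just described can be sketched in a few lines. This is a minimal illustration with hypothetical grouped data (not from the slides): compute the empirical logits, form the weights Wi = ni P̂i (1 - P̂i), and run WLS.

```python
import numpy as np

# Hypothetical grouped data: at each Xi there are ni trials, Ri successes.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = np.array([50, 50, 50, 50, 50])
R = np.array([5, 12, 24, 35, 44])

p_hat = R / n                          # relative frequency = estimate of Pi
logit = np.log(p_hat / (1 - p_hat))    # empirical log odds
w = n * p_hat * (1 - p_hat)            # weights Wi = ni * Pi * (1 - Pi)

# WLS: multiply both sides of logit = b0 + b1*X by sqrt(Wi) and run OLS.
A = np.column_stack([np.ones_like(X), X])
sw = np.sqrt(w)
b0, b1 = np.linalg.lstsq(A * sw[:, None], logit * sw, rcond=None)[0]
print(b0, b1)
```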
Example: Logistic regression
In a study of the effectiveness of coupons offering a price reduction on a given product, 1,000 homes were selected, and a coupon and advertising material for the product were mailed to each. The coupons offered different price reductions (5, 10, 15, 20, and 30 cents), and 200 homes were assigned at random to each price reduction category. The independent variable X in this study is the amount of the price reduction, and the dependent variable Y is a binary variable indicating whether or not the coupon was redeemed within a six-month period. It was expected that the logistic response function would be an appropriate description of the relation between price reduction and the probability that the coupon is utilized.
Direction of relationship:
The direction of the relationship (positive or negative) reflects the change in the dependent variable associated with changes in the independent variable. A positive relationship means that an increase in the independent variable is associated with an increase in the predicted probability, and vice versa for a negative relationship.
e^(20 b1) = e^(20 × 0.1087) ≈ 8.8
The result indicates that the estimated odds of using a coupon with a price reduction of 30 cents are 8.8 times the estimated odds of using a coupon with a price reduction of 10 cents, i.e., the estimated odds ratio for an increase of 20 cents in price reduction is 8.8.
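The odds-ratio arithmetic above is easy to reproduce; this short sketch uses the slope estimate from the coupon example.

```python
import math

b1 = 0.1087  # estimated slope from the coupon example
odds_ratio_20 = math.exp(20 * b1)  # odds ratio for a 20-cent increase
print(round(odds_ratio_20, 1))
```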
Percentage Change in the Odds Ratio
The percentage change in the odds for a one-unit change in X is 100(e^b1 - 1)%. Here, 100(e^0.1087 - 1) ≈ 11.5%, so each additional cent of price reduction raises the estimated odds of redemption by about 11.5%, consistent with the odds ratio of 1.11 in the output below.
Output from MINITAB

                                               Odds    95% CI
Predictor   Coef       SE          Z       P       Ratio   Lower   Upper
Constant    -2.18551   0.164667    -13.27  0.000
X            0.108719  0.0088429    12.29  0.000    1.11    1.10    1.13

Fitted probabilities at each price reduction:

X     n     P^
5     200   0.162205
10    200   0.250056
15    200   0.364770
20    200   0.497219
30    200   0.745749
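The fitted probabilities in the table above follow directly from the estimated coefficients via the logistic response function; this sketch reproduces them.

```python
import math

# Coefficients from the MINITAB output above.
b0, b1 = -2.18551, 0.108719

def p_hat(x):
    # Fitted logistic response: P = 1 / (1 + exp(-(b0 + b1*x)))
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

for x in (5, 10, 15, 20, 30):
    print(x, round(p_hat(x), 6))
```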
Change in probability vs. price reduction
[Figure: change in probability (vertical axis, roughly 0.015 to 0.0275) plotted against price reduction of 5 to 30 cents; the change is largest near the middle of the range, where P̂ ≈ 0.5]
Logit model for ungrouped or individual data
Example:
Let Y = 1 if a student's final grade in an intermediate microeconomics course was an A, and Y = 0 if the final grade was a B or C (not an A). The model is
Li = ln[Pi / (1 - Pi)] = β1 + β2 GPAi + β3 TUCEi + β4 PSIi + ui
                                                Odds    95% CI
Predictor   Coef        SE         Z       P       Ratio    Lower    Upper
Constant    -13.0213    4.93100    -2.64   0.008
GPA           2.82611   1.26289     2.24   0.025   16.88     1.42   200.61
TUCE          0.0951577 0.141550    0.67   0.501    1.10     0.83     1.45
PSI=1         2.37869   1.06452     2.23   0.025   10.79     1.34    86.93
Interpretation of the regression coefficients
L̂i = -13.0213 + 2.82611 GPA + 0.09516 TUCE + 2.3787 PSI
For a quantitative regressor, the slope coefficient is a partial slope coefficient that measures the change in the estimated logit for a unit change in that regressor, keeping the effect of the other regressors constant.
Coefficient of GPA
With the other variables held constant, a one-unit increase in GPA increases the estimated logit by 2.83 on average.
In terms of the odds ratio: keeping the effect of the other variables constant, a one-unit increase in GPA multiplies the estimated odds of getting an A grade by e^2.8261 ≈ 16.88, i.e., about 17 times.
Interpretation of the regression coefficients
L̂i = -13.0213 + 2.82611 GPA + 0.09516 TUCE + 2.3787 PSI
Coefficient of TUCE
In terms of the odds ratio: keeping the effect of the other variables constant, a one-unit increase in TUCE makes a student e^0.09516 ≈ 1.10 times as likely to get an A grade.
Coefficient of PSI
In terms of the odds ratio: keeping the effect of the other variables constant, students who are exposed to the new (PSI) method of teaching are e^2.3787 ≈ 10.79 times, i.e., more than 10 times, as likely to get an A grade as students who are not exposed to it.
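The odds ratios quoted above are just the exponentiated coefficients; this sketch recomputes them from the estimates in the table.

```python
import math

# Coefficients from the grade example above.
coefs = {"GPA": 2.82611, "TUCE": 0.0951577, "PSI": 2.37869}

# The odds ratio for a one-unit change in a regressor is e^coefficient.
for name, b in coefs.items():
    print(name, round(math.exp(b), 2))
```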
Probit Regression Models
Similarities and differences between Logit and Probit models
Neither the logit model nor the probit model is linear, which makes interpretation more difficult.
Interpretation
Since xb has a normal distribution, the probit coefficient b can be interpreted as follows: a one-unit increase in the predictor increases the probit score by b standard deviations.
Here xb represents the product of two vectors, where x is the vector of regressors and b is the vector of coefficients β, that is
x = [1, x1, x2, x3, ..., xn] and b = [β0, β1, β2, β3, ..., βn]′
The log-likelihood function for probit is
ln L = Σi [ Yi ln Φ(xi b) + (1 - Yi) ln(1 - Φ(xi b)) ]
where Φ is the standard normal CDF.
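The probit log-likelihood can be written out directly. This is a minimal sketch with hypothetical 0/1 data (not from the slides), using only the standard library:

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def probit_loglik(b0, b1, X, Y):
    # ln L = sum_i [ Yi*ln(Phi(b0 + b1*Xi)) + (1-Yi)*ln(1 - Phi(b0 + b1*Xi)) ]
    total = 0.0
    for x, y in zip(X, Y):
        p = Phi(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hypothetical 0/1 data just to evaluate the function at a trial point.
X = [1.0, 2.0, 3.0, 4.0]
Y = [0, 0, 1, 1]
print(probit_loglik(-2.0, 0.8, X, Y))
```

An MLE routine would maximize this function over (b0, b1).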
Example: Suppose the incomes of a number of people are given and we want to find which of them are likely to own a house. Owning a house is a binary (dependent) variable, since everyone either owns a house or does not; no third outcome is possible. In this situation probit regression is a better fit than the LPM (linear probability model).
Data on Xi (income), ni (number of families at income Xi), and Ri (number of families owning a house) are given below.
-----------------------------------------------------------------
Xi                        ni      Ri
(thousands of dollars)
-----------------------------------------------------------------
6 40 8
8 50 12
10 60 18
13 80 28
15 100 45
20 70 36
25 65 39
30 50 33
35 40 30
40 25 20
-----------------------------------------------------------------
For probit regression, estimate the index Ii from the standard normal CDF as Ii = F⁻¹(P̂i):
----------------------------------------------------------------------------------------------
Xi                        ni      Ri      P^i     Ii = F-1(P^i)
(thousands of dollars)
----------------------------------------------------------------------------------------------
6                         40      8       0.20    -0.8416
8                         50      12      0.24    -0.7063
10                        60      18      0.30    -0.5244
13                        80      28      0.35    -0.3853
15                        100     45      0.45    -0.1257
20                        70      36      0.51     0.0251
25                        65      39      0.60     0.2533
30                        50      33      0.66     0.4125
35                        40      30      0.75     0.6745
40                        25      20      0.80     0.8416
----------------------------------------------------------------------------------------------
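The index column Ii = F⁻¹(P̂i) can be reproduced with the standard normal quantile function; note that the quantile flips sign around P̂ = 0.5.

```python
from statistics import NormalDist

inv_Phi = NormalDist().inv_cdf  # standard normal quantile function F^{-1}

# Compute the index I_i = F^{-1}(P_hat_i) for a few of the relative frequencies.
for p in (0.20, 0.30, 0.51, 0.80):
    print(p, round(inv_Phi(p), 4))
```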
Results:
The results are as follows.
Dependent variable: I
---------------------------------------------------------------------------------------------
Variable    Coefficient   Std. Error   t-statistic   Probability
---------------------------------------------------------------------------------------------
C           -1.0166       0.0572       -17.7473      1.0397E-07
Income       0.04846      0.00247       19.5585      4.8547E-08
---------------------------------------------------------------------------------------------
R2 = 0.97951    Durbin-Watson statistic = 0.91384
Discussion:
The index Ii is expressed as
Ii = β1 + β2 Xi
The model is
Pi = P(Y = 1 | X) = P(Z ≤ Ii) = P(Z ≤ β1 + β2 Xi) = F(β1 + β2 Xi)
Take X = 6 (thousand dollars). The value of the index is
I = -1.0166 + 0.04846(6) = -0.72584
Referring to normal distribution tables at Z = -0.72584, the normal density is about 0.3066.
Now, the marginal effect of income at X = 6 is
f(I) × β2 = 0.3066 × 0.04846 ≈ 0.0149
That is, the probability of owning a house rises by about 1.5 percentage points per additional thousand dollars of income at this income level.
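The marginal-effect calculation above is the normal density at the index times the slope; this sketch reproduces it with the standard library.

```python
from statistics import NormalDist

phi = NormalDist().pdf  # standard normal density

b0, b1 = -1.0166, 0.04846   # estimated probit coefficients from above
x = 6.0                     # income of 6 thousand dollars

# Marginal effect of income on P(own a house) at X = 6:
# dP/dX = phi(b0 + b1*X) * b1
effect = phi(b0 + b1 * x) * b1
print(round(effect, 4))
```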
Practical Importance of Probit Regression

Drawback of Probit Regression:
Basic assumption of Probit Regression:
Whereas logistic regression is based on the assumption that the categorical dependent variable reflects an underlying qualitative variable and uses the binomial distribution, probit regression assumes that the categorical dependent variable reflects an underlying quantitative (latent) variable and uses the cumulative normal distribution.