Dummy Dependent Variable


Qualitative Response Regression Models

Regression models in which the regressand (the response variable) is qualitative in nature
1
The Regression Analysis
Regression analysis deals with predicting the mean value of the
dependent variable using information on the independent variables.
The nature of the dependent variable plays a very important role in
regression analysis.
The major types of dependent variable encountered in regression
analysis are quantitative and qualitative.
The estimation framework differs between the two types of
dependent variable.
2
Types of Qualitative
Dependent Variable
The following major types of qualitative
variables are met in practice:
Binary variable
Categorical without order (nominal)
Categorical with order (ordinal)

3
Regression with Qualitative
Dependent Variable
-----------------------------------------------------------------
Dependent      Independent      Model
variable       variables
-----------------------------------------------------------------
Binary         Quantitative     Logistic Regression
Binary         Qualitative      Log-Linear Models
Categorical    Quantitative     Multinomial Logistic
Categorical    Qualitative      Log-Linear Models
Ordinal        Quantitative     Ordinal Logistic
Ordinal        Qualitative      Log-Linear Models
-----------------------------------------------------------------
Also shown in the chart: Probit Regression and Discriminatory Analysis.

4
Meaning of response function
when dependent variable is
binary

5
Consider the simple linear regression (SLR) model $Y_i = \beta_0 + \beta_1 X_i + U_i$.
Since $E(U_i) = 0$, $E(Y_i) = \beta_0 + \beta_1 X_i$.
When $Y_i$ takes only the values 0 and 1, it is a Bernoulli random variable with probability distribution

Yi     Probability
1      Pi
0      1 - Pi

By the definition of the expected value of a random variable,

$E(Y_i) = 1(P_i) + 0(1 - P_i) = P_i$

The mean response is therefore interpreted as the probability of
success, i.e. $Y_i = 1$, when the regressor variable takes the
value $X_i$.
A linear regression model with a dependent variable that is
either 0 or 1 is called the Linear Probability Model, or LPM.
The LPM predicts the probability of an event occurring and,
like other linear models, says that the effects of the Xs on the
probabilities are linear.
6
Meaning of Expected value of
dependent binary variable
When the dependent variable is a
Bernoulli variable (i.e. its value is either 0
or 1), the linear regression model
actually predicts the probability that the
value 1 occurs. Therefore, we also call this model
a linear probability model.

7
Difference Between Quantitative And
Qualitative Response Regression Models

In a model where the response, Y, is quantitative, our


objective is to estimate its expected, or mean, value
given the values of the regressors.
In models where the response, Y, is qualitative, our
objective is to find the probability of something
happening, such as voting for a Democratic
candidate, or owning a house, or belonging to a
union, or participating in a sport etc. Hence,
qualitative response regression models are often
known as probability models.
8
Approaches to developing a
probability model for a binary
response variable

The Linear Probability Model (LPM)


The Logit Model
The Probit Model

9
Problems with LPM

Non-normality of disturbance term


Non-constant variances
Constraints on the response function
Non-linearity

10
Non-normality of disturbance term
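For a binary dependent variable Y, the disturbance
U also takes on only two values. Since
$U_i = Y_i - \beta_0 - \beta_1 X_i$,
when $Y_i = 1$: $U_i = 1 - \beta_0 - \beta_1 X_i$
when $Y_i = 0$: $U_i = -\beta_0 - \beta_1 X_i$
Obviously U cannot be assumed to be
normally distributed; it actually follows a
two-point distribution of the Bernoulli type.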
11
Non-constant variances
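The probability distribution of U is

U                                Probability
$1 - \beta_0 - \beta_1 X_i$      Pi
$-\beta_0 - \beta_1 X_i$         1 - Pi

Since $E(U_i) = 0$, it follows that
$\mathrm{var}(U_i) = P_i(1 - P_i) = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i)$,
so var(U) depends on X. Hence the variances of the
disturbances differ at different levels of X and OLS is
no longer optimal: the estimators remain unbiased,
but they do not have minimum variance, i.e. they are not BLUE.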

12
Constraints on the response
function
Since the response function represents
probabilities when the dependent variable is
binary, it must satisfy $0 \le E(Y_i) \le 1$.
A linear response function may fall outside these
limits within the range of the
independent variable covered by the
model.
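
The constraint problem can be seen on a small example. The sketch below fits an LPM by OLS to hypothetical data (the arrays `x` and `y` are invented for illustration and do not come from the slides) and counts how many fitted "probabilities" fall outside the 0-1 interval.

```python
import numpy as np

# Hypothetical data (not from the slides): x is a regressor, y a 0/1 outcome.
x = np.arange(1, 11, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit the LPM  Y = b0 + b1*X + U  by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values are read as estimated probabilities P(Y = 1 | X).
p_hat = b0 + b1 * x
out_of_range = (p_hat < 0) | (p_hat > 1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print("fitted values:", np.round(p_hat, 3))
print("fitted values outside [0, 1]:", int(out_of_range.sum()), "of", len(x))
```

With these made-up data the fitted line dips below 0 at the smallest x values and rises above 1 at the largest, even though every observed Y is 0 or 1.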

13
Non-linearity
Even if the predicted probabilities did not take on impossible
values, the assumption of a straight-line relationship between
the independent variable (IV) and the dependent variable (DV) is
very questionable when the DV is a dichotomy.
For example, suppose that the DV is home ownership, and one
of the IVs is wealth. According to the LPM, an increase in
wealth of $50,000 will have the same effect on ownership
regardless of whether the family starts with $0 wealth or wealth
of $1 million. Certainly, a family with $50,000 is more likely to
own a home than one with $0. But a millionaire is very likely to
own a home already, and the addition of $50,000 is not going to
increase the likelihood of home ownership very much.
14
Structural defects of LPM
The LPM predicts probabilities $P < 0$ and $P > 1$ for
sufficiently large or small X values.

The LPM assumes that $P(X)$ increases linearly with X,
that is, the marginal or incremental effect of X remains
constant throughout.

The variability is not constant, but rather depends on X
through its influence on $P$. Thus the ordinary least
squares estimators are not BLUE.

Y, being binary, is very far from normally distributed.

15
Solution for the problems

Non-normality: increase the sample size, so that the
binomial distribution tends to the normal
distribution.

16
Constant Variance
The variance of U depends on the expected value of Y, i.e. the disturbances are
heteroscedastic. One way of handling the problem of non-constant
variances of the disturbances is weighted least squares.
That is, through an appropriate weighting scheme, the errors can be made
homoscedastic.
To apply weighted least squares, transform the data by dividing both sides of the model by
$\sqrt{E(Y_i)(1 - E(Y_i))} = \sqrt{W_i}$

The transformed model is
$\frac{Y_i}{\sqrt{W_i}} = \frac{\beta_0}{\sqrt{W_i}} + \beta_1 \frac{X_i}{\sqrt{W_i}} + \frac{U_i}{\sqrt{W_i}}$

17
Problem with Weighted least square
But the $W_i$ are unknown, because they involve $E(Y_i)$, which contains
unknown parameters. So estimate the $W_i$ using a
two-stage procedure:
Stage I: Fit the regression model by OLS and
estimate the parameters.
Stage II: Estimate the weights using the parameter
estimates from Stage I, transform the
original model, and then estimate the transformed
model by OLS.
NOTE: Applying OLS to the transformed model
is called weighted least squares (WLS).
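
A minimal sketch of the two stages, assuming NumPy and hypothetical data. The function name `lpm_two_stage_wls` and the clipping constant `eps` are illustrative choices, not part of the slides; clipping is added only so that the estimated weights stay positive when a Stage I fitted value falls outside the 0-1 interval.

```python
import numpy as np

def lpm_two_stage_wls(x, y, eps=1e-3):
    """Two-stage WLS for the LPM  Y = b0 + b1*X + U  (illustrative sketch)."""
    X = np.column_stack([np.ones_like(x), x])

    # Stage I: ordinary least squares on the original model.
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Stage II: estimate W_i = E(Y_i)(1 - E(Y_i)) from the Stage I fit.
    # Clipping to (eps, 1 - eps) is an assumption made here to keep W_i > 0.
    p_hat = np.clip(X @ b_ols, eps, 1.0 - eps)
    root_w = np.sqrt(p_hat * (1.0 - p_hat))

    # Divide every term of the model by sqrt(W_i), then apply OLS again.
    b_wls, *_ = np.linalg.lstsq(X / root_w[:, None], y / root_w, rcond=None)
    return b_ols, b_wls

# Hypothetical data, invented for illustration.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=float)
y = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1], dtype=float)

b_ols, b_wls = lpm_two_stage_wls(x, y)
print("OLS estimates (b0, b1):", np.round(b_ols, 4))
print("WLS estimates (b0, b1):", np.round(b_wls, 4))
```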
18
Solution for constraints on the
response function
There are two ways of handling the
constraints on the response function.
1. Estimate the model by the usual OLS and check whether
the fitted values $\hat{Y}_i$ lie between 0 and 1. If some are
less than 0, $\hat{Y}_i$ is set to 0 for those cases; if some are
greater than 1, they are set to 1.
2. Use an estimating technique that guarantees that the
estimated Y will lie between 0 and 1.

The LOGIT and PROBIT models guarantee that the
estimated probabilities lie between the
logical limits 0 and 1.
19
Solution for constraints on
the response function

We need a model that has two features:

As $X_i$ increases, $P_i = E(Y = 1 \mid X_i)$ also increases but never
steps outside the 0-1 interval.
The relationship between $P_i$ and $X_i$ is nonlinear, i.e. $P_i$
approaches zero at slower and slower rates as $X_i$ gets
small and approaches one at slower and slower rates as
$X_i$ gets very large.

The LOGIT and PROBIT models guarantee that the
estimated probabilities lie between the logical
limits 0 and 1.

20
Logistic Regression
Both theoretical and empirical considerations suggest that when the
dependent variable is an indicator variable, the shape of the response
function will frequently be curvilinear. The following figure contains a
curvilinear response function which has been found appropriate in many
instances involving a binary dependent variable.

21
Note that this response function is shaped like a tilted S and that it has asymptotes
at 0 and 1. The latter feature ensures that the constraints on E(Y), i.e. $0 < E(Y) < 1$,
are automatically met. The response function in the above figure is called the
logistic function (the cumulative logistic distribution function) and is given by

$P_i = E(Y_i) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$

Here $P_i$ is non-linear not only in X but also in the $\beta$s, so we cannot use the usual OLS
method, which requires a model that is linear in the parameters. This non-linearity
is easily handled because the logistic function is
intrinsically linear. The linearization of the logistic function is as follows.
22
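$1 - P_i = 1 - \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}$

$\frac{P_i}{1 - P_i} = e^{\beta_0 + \beta_1 X}$   (the odds ratio)

$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X$   (the log of the odds ratio, also called the logit)

Interpretation of slope coefficient

A one-unit increase in X increases the log odds by $\beta_1$; in terms of the
odds, a one-unit increase in X multiplies the odds by $\exp(\beta_1)$.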
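
The linearization can be checked numerically. The sketch below is illustrative only: the coefficients `b0` and `b1` are hypothetical values, and the helper names `logistic`, `odds` and `logit` are chosen here for readability.

```python
import math

def logistic(z):
    """Cumulative logistic function: e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds P / (1 - P) in favour of Y = 1."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds; the inverse of the logistic function."""
    return math.log(odds(p))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -2.0, 0.5

for x in (0.0, 1.0, 2.0):
    p = logistic(b0 + b1 * x)
    print(f"x={x:.0f}  P={p:.4f}  odds={odds(p):.4f}  logit={logit(p):.4f}")

# A one-unit increase in X multiplies the odds by exp(b1).
odds_multiplier = odds(logistic(b0 + b1 * 1.0)) / odds(logistic(b0 + b1 * 0.0))
print(f"odds multiplier = {odds_multiplier:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

The printed logit rises by exactly `b1` per unit of x, while the probability changes by a different amount at each step.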

23
Advantage of log odds (Logit)
Although probabilities can range only from 0 to 1, log odds can range from $-\infty$
to $+\infty$.
Although logits are linear in X, the probabilities themselves are not, in
contrast to the LPM, where probabilities increase linearly with X.
The logit is a linear function of the unknown parameters, so we can apply
the standard theory of OLS to estimate the parameters of the logit model in the usual
way.
Plotted against the log odds, the probabilities follow an S-shaped curve. At the
extremes, changes in the log odds produce very little change in the probabilities.
In the middle of the S curve, changes in the log odds produce much larger changes
in the probabilities.
To put it another way, linear, additive increases in the log odds produce
nonlinear changes in the probabilities.

NOTE: We can add as many regressors as may be dictated by the
underlying theory.

24
Comparison of LPM and
Logistic Regression
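[Figure: probability P (vertical axis, 0 to 1) plotted against X (horizontal axis, from $-\infty$ to $+\infty$)]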

25
Features of Logit Model
As P goes from 0 to 1 (i.e., as the linear index $\beta_0 + \beta_1 X$ varies
from $-\infty$ to $+\infty$), the logit L goes from $-\infty$ to $+\infty$. That is,
although the probabilities (of necessity) lie between 0 and 1, the logits are not
so bounded.

Whereas the LPM assumes that $P(X)$ is linearly related to X, the
logit model assumes that the log odds is linearly related to X.

Although L is linear in X, the probabilities themselves are not. This
property is in contrast with the LPM, where the probabilities
increase linearly with X.

If L, the logit, is positive, it means that when the value of the
regressor(s) increases, the odds that the regressand equals 1
(meaning some event of interest happens) increase. If L, the
logit, is negative, the odds that the regressand equals 1
decrease as the value of X increases.

26
Estimation of the Logit model

Data at the individual level:

If we have data at the individual level, we cannot estimate the logit model by the
standard OLS method, because if Y = 1 then $\ln(1/0)$ and if Y = 0 then $\ln(0/1)$,
and both are meaningless. In such a situation we can estimate the parameters by
maximum likelihood (MLE).

Grouped or Replicated Data:

If replicated data are available, i.e. for each level $X_i$ there are $n_i$ observations, of
which $R_i$ ($R_i < n_i$) belong to the category in which Y = 1, then we can compute
$\hat{P}_i = R_i / n_i$,
the relative frequency, and use it as an estimate of the true $P_i$ corresponding to each
$X_i$. If $n_i$ is fairly large, $\hat{P}_i$ will be a reasonably good estimate of $P_i$.

It can be shown that if $n_i$ is fairly large and if each observation is distributed independently as a
binomial variable, then
$U_i \sim N\left(0, \frac{1}{n_i P_i (1 - P_i)}\right)$
27
So the disturbance term in the logit model is heteroscedastic. Thus we have to use weighted least
squares instead of OLS, with weights $W_i = n_i P_i (1 - P_i)$, estimated using $\hat{P}_i$.

Weighted Least Squares:

Transform the original model as
$\sqrt{W_i} L_i = \beta_0 \sqrt{W_i} + \beta_1 \sqrt{W_i} X_i + \sqrt{W_i} U_i$

Apply least squares to the transformed model (this is called WLS).

Since the transformed disturbance term is homoscedastic, applying the least squares method to
the transformed model will yield BLUE estimators.
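
As a sketch of the grouped-data procedure, the code below applies these steps with NumPy to the grouped home-ownership data (Xi, ni, Ri) that this deck tabulates in its probit example. Using that table here is an illustrative choice, and no fitted values are claimed for it beyond what the code prints.

```python
import numpy as np

# Grouped data: income X (thousands of dollars), n families at each
# income level, R of them owning a house (table from the probit example).
X = np.array([6, 8, 10, 13, 15, 20, 25, 30, 35, 40], dtype=float)
n = np.array([40, 50, 60, 80, 100, 70, 65, 50, 40, 25], dtype=float)
R = np.array([8, 12, 18, 28, 45, 36, 39, 33, 30, 20], dtype=float)

p_hat = R / n                          # relative frequency, estimate of P_i
L = np.log(p_hat / (1.0 - p_hat))      # estimated logit
W = n * p_hat * (1.0 - p_hat)          # weights W_i = n_i * P_i * (1 - P_i)
root_w = np.sqrt(W)

# Transformed model: sqrt(W)*L = b0*sqrt(W) + b1*sqrt(W)*X + sqrt(W)*U.
# Note there is no separate intercept column; sqrt(W) takes its place.
A = np.column_stack([root_w, root_w * X])
(b0, b1), *_ = np.linalg.lstsq(A, root_w * L, rcond=None)

print(f"WLS estimates: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"odds multiplier per extra $1,000 of income: {np.exp(b1):.4f}")
```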

28
Example: Logistic regression
In a study of the effectiveness of coupons offering a price reduction on a
given product, 1,000 homes were selected and a coupon and advertising
material for the product were mailed to each. The coupons offered different
price reduction (5, 10, 15, 20, and 30 cents), and 200 homes were
assigned at random to each of the price reduction categories. The
independent variable X in this study is the amount of price reduction, and
the dependent variable Y is a binary variable indicating whether or not the
coupon was redeemed within a six-month period. It was expected that the
logistic response function would be an appropriate description of the
relation between price reduction and probability that the coupon is utilized.

29
30
31
Direction of relationship:
The direction of the relationship (positive or negative) reflects the changes in
the dependent variable associated with changes in the independent
variable. A positive relationship means that an increase in the independent
variable is associated with an increase in the predicted probability, and vice
versa for a negative relationship.

Interpreting the Direction of ORIGINAL COEFFICIENTS:

The sign of the original coefficient (+ve or -ve) indicates the direction of the relationship.
A positive coefficient increases the predicted probability, whereas a negative coefficient
decreases it. In the above example the sign of $b_1$ (0.1087) is positive, which indicates
that an increase in the price reduction increases the probability that the coupon is used.

Interpreting the Direction of EXPONENTIATED COEFFICIENTS:

An exponentiated coefficient $e^{b_1}$ above 1 reflects a positive
relationship, and a value less than 1 represents a negative relationship. In the
above example $e^{b_1} = e^{0.1087} = 1.11$ is greater than 1, so an increase
in the price reduction increases the probability of the coupon being used.
32
Interpreting the magnitude of relationship
The odds ratio for an independent variable represents the change in the odds for
a one-unit change in that independent variable, holding all the other independent
variables constant.
The estimated odds ratio for X, i.e. the exponentiated coefficient (1.11), indicates that
each one-cent increase in the price reduction multiplies the odds of using the coupon
by 1.11, i.e. increases them by about 11%.

Suppose we want to compare the odds of using a coupon with a price reduction of 10 cents
to the odds of using a coupon with a price reduction of 30 cents.

$e^{20 b_1} = e^{20(0.1087)} \approx 8.8$

The result indicates that the estimated odds of using a coupon with a price
reduction of 30 cents are 8.8 times the estimated odds of using a
coupon with a price reduction of 10 cents, i.e. the estimated odds ratio for an
increase of 20 cents in the price reduction is 8.8.
33
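
The odds-ratio arithmetic for the coupon example can be reproduced in a few lines. This is only a sketch using the slope 0.1087 quoted in the text; the variable names are illustrative.

```python
import math

b1 = 0.1087  # estimated slope for price reduction (in cents), as quoted in the text

odds_ratio_1 = math.exp(b1)          # odds multiplier for a 1-cent increase
odds_ratio_20 = math.exp(20 * b1)    # odds multiplier for a 20-cent increase (10 -> 30)
pct_change_1 = 100.0 * (odds_ratio_1 - 1.0)  # percentage change in the odds per cent

print(f"odds ratio, +1 cent:   {odds_ratio_1:.2f}")
print(f"odds ratio, +20 cents: {odds_ratio_20:.1f}")
print(f"percentage change in odds per cent: {pct_change_1:.1f}%")
```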
Percentage Change in Odd Ratio

34
Percentage Change in Odd Ratio

35
36
Output from MINITAB
                                                   Odds     95% CI
Predictor   Coef        SE          Z       P      Ratio   Lower   Upper
Constant   -2.18551     0.164667   -13.27   0.000
X           0.108719    0.0088429   12.29   0.000   1.11    1.10    1.13

X     n     P^
5     200   0.162205
10    200   0.250056
15    200   0.364770
20    200   0.497219
30    200   0.745749
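
The fitted probabilities follow from plugging the estimated coefficients into the logistic response function. A short sketch, with the function name `fitted_probability` chosen here for illustration:

```python
import math

# Coefficients from the MINITAB output for the coupon example.
b0, b1 = -2.18551, 0.108719

def fitted_probability(x):
    """Logistic response function evaluated at price reduction x (cents)."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

fitted = {x: fitted_probability(x) for x in (5, 10, 15, 20, 30)}
for x, p in fitted.items():
    print(f"X = {x:2d}  fitted P = {p:.6f}")
```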
37
[Figure: Change in probability vs. price reduction. Vertical axis: change in probability (about 0.0150 to 0.0275); horizontal axis: price reduction (5 to 30).]
38
Logit model for
ungrouped or individual data
Example:
Let Y = 1 if a student's final grade in an intermediate
microeconomics course was A, and
Y = 0 if the final grade was B or C (not A).

Spector and Mazzeo used grade point average (GPA),
TUCE, and Personalized System of Instruction (PSI) as the
grade predictors. Fit a logit model to this data set,
where
TUCE = score on an examination given at the beginning of
the term to test entering knowledge of macroeconomics
PSI = 1 if the new teaching method is used
    = 0 otherwise
39
Obs  GPA   TUCE  PSI  Grade  Letter     Obs  GPA   TUCE  PSI  Grade  Letter
                             grade                                   grade
1    2.66  20    0    0      C          17   2.75  25    0    0      C
2    2.89  22    0    0      B          18   2.83  19    0    0      C
3    3.28  24    0    0      B          19   3.12  23    1    0      B
4    2.92  12    0    0      B          20   3.16  25    1    1      A
5    4.00  21    0    1      A          21   2.06  22    1    0      C
6    2.86  17    0    0      B          22   3.62  28    1    1      A
7    2.76  17    0    0      B          23   2.89  14    1    0      C
8    2.87  21    0    0      B          24   3.51  26    1    0      B
9    3.03  25    0    0      C          25   3.54  24    1    1      A
10   3.92  29    0    1      A          26   2.83  27    1    1      A
11   2.63  20    0    0      C          27   3.39  17    1    1      A
12   3.32  23    0    0      B          28   2.67  24    1    0      B
13   3.57  23    0    0      B          29   3.65  21    1    1      A
14   3.26  25    0    1      A          30   4.00  23    1    1      A
15   3.53  26    0    0      B          31   3.10  21    1    0      C
16   2.74  19    0    0      B          32   2.39  19    1    1      A
40
41
The logit model here can be written as

$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_1 + \beta_2 GPA_i + \beta_3 TUCE_i + \beta_4 PSI_i + u_i$

Using MINITAB software:

                                                   Odds     95% CI
Predictor   Coef        SE         Z       P      Ratio   Lower   Upper
Constant   -13.0213     4.93100   -2.64    0.008
GPA          2.82611    1.26289    2.24    0.025   16.88   1.42    200.61
TUCE         0.0951577  0.141550   0.67    0.501    1.10   0.83      1.45
PSI=1        2.37869    1.06452    2.23    0.025   10.79   1.34     86.93

$\hat{L}_i = -13.0213 + 2.82611\,GPA + 0.09516\,TUCE + 2.3787\,PSI$
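
For individual-level data the parameters are obtained by maximum likelihood. The sketch below maximizes the logit log-likelihood with Newton-Raphson iterations in NumPy on the 32 observations tabulated for this example. It is a minimal illustration, not MINITAB's own algorithm; the zero starting values and the stopping tolerance are assumptions made here.

```python
import numpy as np

# Spector-Mazzeo data as tabulated: GPA, TUCE, PSI, Grade (1 = A, 0 = B or C).
data = np.array([
    [2.66, 20, 0, 0], [2.89, 22, 0, 0], [3.28, 24, 0, 0], [2.92, 12, 0, 0],
    [4.00, 21, 0, 1], [2.86, 17, 0, 0], [2.76, 17, 0, 0], [2.87, 21, 0, 0],
    [3.03, 25, 0, 0], [3.92, 29, 0, 1], [2.63, 20, 0, 0], [3.32, 23, 0, 0],
    [3.57, 23, 0, 0], [3.26, 25, 0, 1], [3.53, 26, 0, 0], [2.74, 19, 0, 0],
    [2.75, 25, 0, 0], [2.83, 19, 0, 0], [3.12, 23, 1, 0], [3.16, 25, 1, 1],
    [2.06, 22, 1, 0], [3.62, 28, 1, 1], [2.89, 14, 1, 0], [3.51, 26, 1, 0],
    [3.54, 24, 1, 1], [2.83, 27, 1, 1], [3.39, 17, 1, 1], [2.67, 24, 1, 0],
    [3.65, 21, 1, 1], [4.00, 23, 1, 1], [3.10, 21, 1, 0], [2.39, 19, 1, 1],
])
X = np.column_stack([np.ones(len(data)), data[:, :3]])  # constant, GPA, TUCE, PSI
y = data[:, 3]

beta = np.zeros(X.shape[1])  # starting values (an assumption of this sketch)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ beta))               # current fitted probabilities
    gradient = X.T @ (y - p)                          # score vector
    information = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
    step = np.linalg.solve(information, gradient)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

odds_ratios = np.exp(beta[1:])
print("coefficients (const, GPA, TUCE, PSI):", np.round(beta, 4))
print("odds ratios (GPA, TUCE, PSI):", np.round(odds_ratios, 2))
```

The estimates can be compared directly with the Coef and Odds Ratio columns of the MINITAB output.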

42
Interpretation of regression coefficient
$\hat{L}_i = -13.0213 + 2.82611\,GPA + 0.09516\,TUCE + 2.3787\,PSI$
For quantitative regressors, each slope coefficient is a partial slope
coefficient that measures the change in the estimated logit
for a unit change in the value of the given quantitative regressor,
holding the other regressors constant.
Coefficient of GPA
With other variables held constant, a one-unit increase in GPA
increases the estimated logit by about 2.83 units on average.
In terms of the odds ratio, with the other variables held
constant, a one-unit increase in GPA multiplies the odds of
getting an A grade by $e^{2.8261} \approx 17$.

43
Interpretation of regression coefficient
$\hat{L}_i = -13.0213 + 2.82611\,GPA + 0.09516\,TUCE + 2.3787\,PSI$
Coefficient of TUCE
In terms of the odds ratio, with other variables held constant,
a one-unit increase in TUCE multiplies the odds of getting an
A grade by $e^{0.09516} \approx 1.10$.

Coefficient of PSI
In terms of the odds ratio, with other variables held constant,
the odds of getting an A grade for students who are exposed to the new
method of teaching are $e^{2.3787} \approx 10.8$ times, i.e.
more than 10 times, the odds for
those students who are not exposed to it.

44
Probit Regression Models

Probit is a variant of logit modeling based on
different data assumptions.

The term "probit" was coined in the 1930s by
Chester Bliss and stands for probability unit.
These two analyses, logit and probit, are very
similar to one another.

45
Similarities and differences
between Logit and Probit models
Neither the logit model nor the probit model is linear,
which makes things difficult.

To make the model linear, a transformation is done on
the dependent variable.

Both methods use maximum likelihood, and so require
more cases than a similar OLS model. Unlike logit
models, we don't get odds ratios with probit models.

Probit regression is an alternative log-linear approach to
handling categorical dependent variables.
46
47
48
The probit model is defined as
$\Pr(y = 1 \mid x) = \Phi(xb)$
where $\Phi$ is the standard cumulative normal
probability distribution and xb is called the probit
score or index.

Interpretation
Since xb is measured on the scale of a standard normal
variable, the probit coefficient b can be interpreted as follows:
a one-unit increase in the predictor
increases the probit score by b standard deviations.
Here xb represents the product of two
matrices, where x is a matrix of x's and b is a matrix
of $\beta$s, that is
$x = [1, x_1, x_2, x_3, \ldots, x_n]$ and $b = [\beta_0, \beta_1, \beta_2, \beta_3, \ldots, \beta_n]'$
49
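The log-likelihood function for probit is

$\ln L = \sum_{j:\, y_j = 1} w_j \ln \Phi(x_j b) + \sum_{j:\, y_j = 0} w_j \ln\left[1 - \Phi(x_j b)\right]$

where $w_j$ denotes optional weights.

Logit models are more popular than probit
models.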

50
Example: Suppose the incomes of a group of families are given and we are
interested in finding which of them are likely to own a house. Here owning a
house is a binary variable (and the dependent variable), since each
family either owns a house or does not; no third
outcome is possible. In this situation probit regression
will be a better fit than the LPM (linear probability model).
Data on Xi (income), ni (number of families at income Xi), and Ri (number
of families owning a house) are given below.
-----------------------------------------------------------------
Xi ni Ri
(thousand of dollars)
-----------------------------------------------------------------
6 40 8
8 50 12
10 60 18
13 80 28
15 100 45
20 70 36
25 65 39
30 50 33
35 40 30
40 25 20
-----------------------------------------------------------------
51
For probit regression, the index I is estimated from the
inverse of the standard normal CDF as

----------------------------------------------------------------------------------------------
Xi                        ni      Ri      P^i      Ii = F^{-1}(P^i)
(thousand of dollars)
----------------------------------------------------------------------------------------------
6                         40      8       0.20     -0.8416
8                         50      12      0.24     -0.7063
10                        60      18      0.30     -0.5244
13                        80      28      0.35     -0.3853
15                        100     45      0.45     -0.1257
20                        70      36      0.51      0.0251
25                        65      39      0.60      0.2533
30                        50      33      0.66      0.4125
35                        40      30      0.75      0.6745
40                        25      20      0.80      0.8416
----------------------------------------------------------------------------------------------
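
The index values are the standard normal quantiles of the estimated probabilities. A short sketch using Python's standard library (`statistics.NormalDist`, available from Python 3.8), with the P^i column typed in as listed:

```python
from statistics import NormalDist

# Estimated probabilities P^i from the table (rounded as listed there).
p_hat = [0.20, 0.24, 0.30, 0.35, 0.45, 0.51, 0.60, 0.66, 0.75, 0.80]

std_normal = NormalDist()  # mean 0, standard deviation 1
index = [std_normal.inv_cdf(p) for p in p_hat]  # I_i = F^{-1}(P^i)

for p, i in zip(p_hat, index):
    print(f"P^ = {p:.2f}  I = {i: .4f}")
```

Any probability below 0.5 maps to a negative index and any probability above 0.5 to a positive one.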

52
Results :
The results are as follows

Dependent Variable: I
---------------------------------------------------------------------------------------------
Variable coefficient std. error t-statistic probability
---------------------------------------------------------------------------------------------
C -1.0166 0.0572 -17.7473 1.0397E-07
Income 0.04846 0.00247 19.5585 4.8547E-08
---------------------------------------------------------------------------------------------
R2 = 0.97951 Durbin Watson statistics = 0.91384
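
These coefficients come from an ordinary least squares regression of the index I on income. The sketch below recomputes them with NumPy, taking the index as the standard normal quantile of each listed P^i; it illustrates the calculation and is not the software run that produced the table.

```python
import numpy as np
from statistics import NormalDist

income = np.array([6, 8, 10, 13, 15, 20, 25, 30, 35, 40], dtype=float)
p_hat = [0.20, 0.24, 0.30, 0.35, 0.45, 0.51, 0.60, 0.66, 0.75, 0.80]

# Dependent variable: the probit index I = F^{-1}(P^).
index = np.array([NormalDist().inv_cdf(p) for p in p_hat])

# OLS of I on income; np.polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(income, index, 1)

residuals = index - (intercept + slope * income)
r_squared = 1.0 - residuals @ residuals / np.sum((index - index.mean()) ** 2)

print(f"C = {intercept:.4f}, Income = {slope:.5f}, R2 = {r_squared:.4f}")
```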

53
Discussion :
The index $I_i$ is expressed as
$I_i = \beta_1 + \beta_2 X_i$

The model is
$P_i = P(Y = 1 \mid X) = P(Z \le I_i) = P(Z \le \beta_1 + \beta_2 X_i) = F(\beta_1 + \beta_2 X_i)$

We want to find the effect of a unit change in X (income
measured in thousands of dollars) on the probability that Y
= 1, that is, that a family purchases a house. So we
take the derivative of the function with respect to X (that is, the
rate of change of the probability with respect to income):
$dP_i / dX_i = f(\beta_1 + \beta_2 X_i)\,\beta_2$
where $f(\beta_1 + \beta_2 X_i)$ is the standard normal probability density
function evaluated at $\beta_1 + \beta_2 X_i$.

54
Take X = 6 (thousand dollars).
The normal density function is evaluated at
f[-1.0166 + 0.04846(6)] = f(-0.72584)
Referring to normal distribution tables for Z = -0.72584,
the normal density is about 0.3066.
Now,
0.3066 * 0.04846 = 0.01485, i.e. about 1.5%

So, starting with an income level of $6,000, if
income goes up by $1,000, the probability of a family
purchasing a house goes up by about 1.5 percentage points.
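
The marginal effect can be evaluated at any income level without tables. A minimal sketch using the estimated coefficients from the results table; the function name `marginal_effect` is an illustrative choice:

```python
from statistics import NormalDist

# Estimated probit index coefficients from the results table.
b1, b2 = -1.0166, 0.04846

def marginal_effect(x):
    """dP/dX = f(b1 + b2*x) * b2, with f the standard normal density."""
    return NormalDist().pdf(b1 + b2 * x) * b2

me_6 = marginal_effect(6.0)
print(f"index at X = 6:           {b1 + b2 * 6.0:.5f}")
print(f"marginal effect at X = 6: {me_6:.5f}")

# The effect is not constant: it depends on where X is evaluated.
for x in (6.0, 20.0, 40.0):
    print(f"X = {x:4.0f}  dP/dX = {marginal_effect(x):.5f}")
```

Unlike the LPM, the change in probability per extra thousand dollars differs across income levels, because the density is evaluated at a different index each time.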

55
Practical Importance of Probit Regression

Probit analysis is commonly used
to analyze data from the medical field.

56
Drawback of Probit Regression:

The drawback is that probit coefficients are more
difficult to interpret; hence they are less used, though
the choice is largely one of personal preference. Both
the cumulative standard normal curve used by probit
as a transform and the logistic curve used in logistic
regression display an S-shaped curve.

57
Basic assumption of Probit Regression:
Whereas logistic regression is based on the assumption
that the categorical dependent variable reflects an underlying
qualitative variable and uses the binomial
distribution, probit regression assumes that the
categorical dependent variable reflects an underlying
quantitative variable, and it uses the cumulative
normal distribution.

58
