
Lecture 9

Dummy variables and qualitative choice models
Dummy variables

• Definition: a way of turning qualitative variables into quantitative variables.
• A dummy variable is a variable that can take only two values: 0 or 1.

• For example, house prices can be influenced by factors that are not directly quantitative, such as the presence of a driveway, a basement, or air conditioning:
• D1 = 1 if the house has a driveway; 0 otherwise
• D2 = 1 if the house has a basement; 0 otherwise
• D3 = 1 if the house has air conditioning; 0 otherwise

• You can also include interactions between dummy variables. For example, you may be interested in whether house prices are influenced by the presence of both a driveway and a basement: D4 = D1*D2, which equals 1 only when the house has both.
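A minimal sketch of how the dummies above, and their interaction, could be built from raw records. The field names (`driveway`, `basement`, `aircon`) and the "yes"/"no" coding are assumptions for illustration, not the layout of the lecture's dataset.

```python
# Hypothetical raw records: qualitative answers stored as "yes"/"no" strings
houses = [
    {"driveway": "yes", "basement": "no", "aircon": "yes"},
    {"driveway": "yes", "basement": "yes", "aircon": "no"},
    {"driveway": "no", "basement": "yes", "aircon": "no"},
]


def dummy(value: str) -> int:
    """Return 1 if the qualitative answer is 'yes', 0 otherwise."""
    return 1 if value == "yes" else 0


rows = []
for h in houses:
    d1 = dummy(h["driveway"])  # D1: driveway
    d2 = dummy(h["basement"])  # D2: basement
    d3 = dummy(h["aircon"])    # D3: air conditioning
    d4 = d1 * d2               # D4 = D1*D2: 1 only if the house has both
    rows.append((d1, d2, d3, d4))

for r in rows:
    print(r)
```

Multiplying two 0/1 variables is the whole trick: the product is 1 only when both inputs are 1.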
House prices dataset: dummy variables

                 Driveway  Recreation room  Basement  Gas    Air cond
Mean             0.85      0.17             0.34      0.04   0.31
Standard Error   0.014     0.016            0.02      0.008  0.019
Minimum          0         0                0         0      0
Maximum          1         1                1         1      1
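The mean of a 0/1 variable is simply the proportion of observations equal to 1, so a mean of 0.85 for Driveway says that about 85% of the houses in the sample have a driveway. The "Standard Error" row in Excel's descriptive statistics is the sample standard deviation divided by the square root of the number of observations. The sketch below reproduces those summary rows for a small hypothetical column (not the lecture's data).

```python
import math
import statistics

# Hypothetical 0/1 column (1 = house has a driveway); not the lecture's data
driveway = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

n = len(driveway)
mean = statistics.mean(driveway)   # share of houses with a driveway
sd = statistics.stdev(driveway)    # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)             # standard error of the mean

print(f"mean={mean:.2f} se={se:.3f} min={min(driveway)} max={max(driveway)}")
```

For a dummy, the sample variance is n/(n - 1) * p * (1 - p), where p is the mean, so the spread is fully determined by the share of 1s.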
Dummy variables: simple regression analysis
House prices dataset: simple regression with dummy variables
• Y = House prices
• D = air-conditioning dummy variable

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept   59,884.85     1,233.5         48.5    0.000    57,461.84  62,307.86
D           25,995.74     2,191.3         11.8    0.000    21,691.18  30,300.29

• Houses with air-con tend to be worth £25,996 more than houses without air-con.
• Houses without air-con are worth £59,885 on average, and houses with air-con are worth £85,881 on average (59,885 + 25,996).
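Why the interpretation above works: with a single dummy regressor, OLS sets the intercept equal to the mean of the D = 0 group and the slope equal to the difference between the two group means. A minimal check on hypothetical toy prices (not the lecture's dataset), followed by the same addition using the reported estimates:

```python
# Hypothetical toy data (NOT the lecture's dataset): prices and an air-con dummy D
prices = [60_000, 58_000, 62_000, 85_000, 87_000]
aircon = [0, 0, 0, 1, 1]

n = len(prices)
d_bar = sum(aircon) / n
y_bar = sum(prices) / n

# Closed-form OLS estimates for price = b0 + b1 * D
s_dy = sum((d - d_bar) * (y - y_bar) for d, y in zip(aircon, prices))
s_dd = sum((d - d_bar) ** 2 for d in aircon)
b1 = s_dy / s_dd
b0 = y_bar - b1 * d_bar

# Group means for houses without (D = 0) and with (D = 1) air-con
mean_without = sum(y for d, y in zip(aircon, prices) if d == 0) / aircon.count(0)
mean_with = sum(y for d, y in zip(aircon, prices) if d == 1) / aircon.count(1)

print(f"b0 = {b0:.1f}, mean without air-con = {mean_without:.1f}")
print(f"b1 = {b1:.1f}, difference in means = {mean_with - mean_without:.1f}")

# Same arithmetic with the estimates reported in the table
avg_with_aircon = 59_884.85 + 25_995.74
print(f"Predicted average price with air-con: £{avg_with_aircon:,.0f}")
```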
House prices dataset: multiple regression with dummy variables
• Dummy variables: driveway and air-con

             Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept    999.3         4,049.8         0.2     0.805    -6,956.0   8,954.7
bedrooms     4,808.8       1,265.3         3.8     0.000    2,323.4    7,294.3
baths        19,060.7      1,795.3         10.6    0.000    15,534.0   22,587.4
stories      4,082.4       1,087.7         3.8     0.000    1,945.6    6,219.1
driveway     18,045.5      2,371.5         7.6     0.000    13,387.1   22,704.0
air cond     17,283.3      1,848.0         9.4     0.000    13,653.1   20,913.4
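Using the coefficient estimates in the table, a fitted price can be computed for any house; switching only the air-con dummy from 0 to 1 shifts the fitted price by exactly the air-con coefficient. The house characteristics below are hypothetical, and the sketch uses only the regressors shown in the table.

```python
# Coefficient estimates copied from the table (rounded as reported)
coef = {
    "intercept": 999.3,
    "bedrooms": 4_808.8,
    "baths": 19_060.7,
    "stories": 4_082.4,
    "driveway": 18_045.5,
    "aircond": 17_283.3,
}


def predicted_price(bedrooms, baths, stories, driveway, aircond):
    """Fitted price from the estimated equation for one house."""
    return (coef["intercept"]
            + coef["bedrooms"] * bedrooms
            + coef["baths"] * baths
            + coef["stories"] * stories
            + coef["driveway"] * driveway
            + coef["aircond"] * aircond)


# Hypothetical house: 3 bedrooms, 1 bath, 2 stories, with a driveway
without_ac = predicted_price(3, 1, 2, driveway=1, aircond=0)
with_ac = predicted_price(3, 1, 2, driveway=1, aircond=1)
print(round(without_ac, 1), round(with_ac, 1), round(with_ac - without_ac, 1))
```

The difference does not depend on the other characteristics, which is what "holding the rest of the variables constant" means in a linear model.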

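• Having air-con adds about £17,283 to the house price, holding the other variables constant.
• Dummy variable trap: in a model with an intercept, do not include a dummy for every category (for example, both male and female). Those dummies sum to the constant column, which creates perfect multicollinearity, so one of them must be dropped; the omitted category becomes the reference group.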
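A minimal pure-Python sketch of the dummy variable trap, using a hypothetical sample of five people. With a constant, a male dummy and a female dummy, X'X is singular (determinant 0), so OLS has no unique solution; dropping the female dummy restores a nonsingular matrix.

```python
# Hypothetical sample of five people
male = [1, 0, 1, 0, 0]
female = [1 - m for m in male]   # female = 1 - male
const = [1] * len(male)          # intercept column

# Design matrix X with columns [constant, male, female]
X = [list(row) for row in zip(const, male, female)]


def xtx(X):
    """Compute X'X for a list-of-rows matrix."""
    k = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


# male + female reproduces the constant column exactly...
print(all(m + f == c for m, f, c in zip(male, female, const)))
# ...so X'X is singular
print(det3(xtx(X)))

# Dropping one category (female) gives a nonsingular 2x2 X'X
A = xtx([[c, m] for c, m in zip(const, male)])
det_ok = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det_ok)
```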
Appendix: Excel

• How to create dummy variables
• Select a cell and type: =COUNTIF(cell, ">1"). When the range is a single cell, COUNTIF returns 1 if that cell meets the condition and 0 otherwise, so the result is a dummy; =IF(cell>1, 1, 0) gives the same result.
• How to obtain descriptive statistics
• Go to Data Analysis
• Select Descriptive Statistics: a new window will pop up
• Enter the variables in Input Range
• Tick Summary statistics and Labels in first row, then click OK
• The output will appear in a new worksheet.
The statistics of choice

• Dummy variable as the dependent variable (Y). Example: satisfaction questionnaires, where an individual has to choose between two alternatives; we give the value 0 to one alternative and 1 to the other.
• When the dependent variable is a dummy variable, we use a logit or a probit model.

• Logit and probit coefficients do not directly measure marginal effects, so their magnitudes are hard to interpret. We can, however, interpret their signs.
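In both models P(Y = 1 | x) = F(xβ), where F is the logistic CDF (logit) or the standard normal CDF (probit). The marginal effect of a continuous regressor is f(xβ)·β, where f is the corresponding density, so it changes with x even though β is fixed; that is why the coefficient alone is not the marginal effect. A minimal sketch with a single hypothetical coefficient and no intercept:

```python
import math


def logit_cdf(z):
    """Logistic CDF: P(Y=1) in the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_pdf(z):
    """Logistic density, p * (1 - p)."""
    p = logit_cdf(z)
    return p * (1.0 - p)


def probit_cdf(z):
    """Standard normal CDF: P(Y=1) in the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


beta = 0.5  # hypothetical coefficient on a single regressor x (no intercept)

for x in (-4.0, 0.0, 4.0):
    z = beta * x
    # Marginal effect dP/dx = f(z) * beta: it varies with x, unlike beta itself
    print(f"x={x:+.0f}  P_probit={probit_cdf(z):.3f}  ME_probit={probit_pdf(z) * beta:.3f}"
          f"  P_logit={logit_cdf(z):.3f}  ME_logit={logit_pdf(z) * beta:.3f}")
```

Because f is always positive, the marginal effect always has the same sign as β, which is why the signs of the coefficients can be read directly.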
The logit and probit model: example
• Cross-sectional data for 601 individuals.

Variable                  Obs  Mean   Std. Dev.  Min    Max
Sex                       601  .47    .49        0      1
Age                       601  32.48  9.28       17.5   57
Number of years married   601  8.17   5.57       0.125  15
Children                  601  .71    .45        0      1
Religiousness             601  3.11   1.16       1      5
Education                 601  16.16  2.4        9      20
Self rating marriage      601  3.93   1.1        1      5
Occupation                601  4.19   1.81       1      7
Number of affairs         601  1.45   3.29       0      12
The logit and probit model: example
• Probit model with dependent variable Affair.
• Affair = 1 if the number of affairs in the last year > 0; 0 otherwise

Variable  Obs  Mean  Std. Dev.  Min  Max
Affair    601  .25   .43        0    1
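The dependent variable is built from the count variable by thresholding at zero. A minimal sketch with hypothetical counts (not the lecture's data); its mean is the share of individuals with Affair = 1, just as the .25 above is the share in the sample.

```python
# Hypothetical counts of affairs in the last year for six individuals
n_affairs = [0, 0, 3, 0, 12, 1]

# Affair = 1 if the number of affairs is greater than 0, 0 otherwise
affair = [1 if k > 0 else 0 for k in n_affairs]

share = sum(affair) / len(affair)  # mean of the dummy = share with Affair = 1
print(affair, round(share, 2))
```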

VARIABLES                 Coefficients  s.e.
Sex                       0.189         (0.130)
Age                       -0.0244**     (0.0104)
Number of years married   0.0546***     (0.0188)
Children                  0.208         (0.163)
Religiousness             -0.186***     (0.0516)
Education                 0.0155        (0.0266)
Self rating of marriage   -0.273***     (0.0534)
Constant                  -0.764        (0.510)
Observations              601
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
The logit and probit model: example
• The negative coefficients on “religiousness” and “self rating of marriage” mean that more religious individuals, and individuals who rate their marriage more highly, are less likely to have an extramarital affair.
• The coefficient on “number of years married” is positive and statistically significant, which means that individuals who have been married longer are more likely to have an affair.
• However, we cannot interpret the magnitudes of the coefficients directly. For that we need the marginal effects.
The logit and probit model: example
• Marginal effect of “number of years married”: if the length of a marriage increases by one year, the probability of having an affair goes up by 0.016, holding the other explanatory variables constant. Since probabilities lie between 0 and 1, we can express this in percentage points: every extra year of marriage increases the probability of having an affair by about 1.6 percentage points, holding the other explanatory variables constant.
• Being more religious tends to lower the probability of having an affair: each one-point increase in religiousness (measured on a 1 to 5 scale) lowers it by about 5.6 percentage points, ceteris paribus.

VARIABLES                 Marginal effects on Pr(Affair = 1)  s.e.
Sex                       0.057                               (0.03)
Age                       -0.007**                            (0.003)
Number of years married   0.016***                            (0.005)
Children                  0.061                               (0.04)
Religiousness             -0.056***                           (0.01)
Education                 0.004                               (0.008)
Self rating of marriage   -0.082***                           (0.01)
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
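For a continuous regressor, a probit marginal effect is the density factor φ(xβ) times the coefficient, evaluated at the means or averaged over the sample, so every marginal effect should be the corresponding coefficient scaled by one common number. A quick consistency check on the two reported tables, restricted to regressors whose printed marginal effects have enough digits to compare. Sex and Children are dummies, for which software often reports a discrete 0 to 1 change instead, so they are left out.

```python
# Probit coefficients and marginal effects copied from the tables above
coefficients = {
    "years married": 0.0546,
    "religiousness": -0.186,
    "self rating of marriage": -0.273,
}
marginal_effects = {
    "years married": 0.016,
    "religiousness": -0.056,
    "self rating of marriage": -0.082,
}

# In a probit, ME_j = phi(index) * beta_j, so ME_j / beta_j should be
# (up to rounding) the same positive number for every regressor
ratios = {name: marginal_effects[name] / coefficients[name] for name in coefficients}
for name, r in ratios.items():
    print(f"{name:<25} ME/coef = {r:.3f}")
```

All three ratios come out close to 0.30, consistent with a single common scale factor.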
