14.170: Programming For Economists: Melissa Dell Matt Notowidigdo Paul Schrimpf
If you're not sure it's in there, ask someone. Then consult the
reference manuals. And (maybe) e-mail around. Don't re-invent
the wheel! If it's not too hard to do, it's likely that someone has
already done it.
Examples:
gen spunk     = invnorm(uniform())
gen z         = invnorm(uniform())
gen schooling = invnorm(uniform()) + z + spunk + fe
gen ability   = invnorm(uniform()) + spunk
gen e         = invnorm(uniform())
gen y         = schooling + ability + e + 5*fe
reg y schooling
xtreg y schooling , i(id) fe
xi: reg y schooling i.id
xi i.id
reg y schooling _I*
areg y schooling, absorb(id)
ivreg y (schooling = z) _I*
xtivreg y (schooling = z), i(id)
xtivreg y (schooling = z), i(id) fe
Data check
Results
[regression output tables from these slides not reproduced here]
glm digression
Manning (1998)
In many analyses of expenditures on health care, the expenditures
for users are subject to a log transform to reduce, if not eliminate,
the skewness inherent in health expenditure data. In such cases,
estimates based on logged models are often much more precise
and robust than direct analysis of the unlogged original dependent
variable. Although such estimates may be more precise and robust,
no one is interested in log model results on the log scale per se.
Congress does not appropriate log dollars. First Bank will not
cash a check for log dollars. Instead, the log scale results must be
retransformed to the original scale so that one can comment on the
average or total response to a covariate x. There is a very real
danger that the log scale results may provide a very misleading,
incomplete, and biased estimate of the impact of covariates on the
untransformed scale, which is usually the scale of ultimate interest.
glm
clear
set obs 100
gen x     = invnormal(uniform())
gen e     = invnormal(uniform())
gen y     = exp(x) + e
gen log_y = log(y)
reg y x
reg log_y x, robust
glm y x, link(log) family(gaussian)
glm, cont
- Regression in levels produces coefficient that is too large, while regression in logs produces coefficient that is too low (which we expect since distribution of y is skewed)
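Manning's warning points to the classic retransformation fix, Duan's smearing estimator, which the slides do not show. A minimal sketch continuing the simulated data above (the variable names log_yhat, u, exp_u, and yhat_smear are mine):

```
* Duan's smearing retransformation (sketch, not from the slides)
reg log_y x
predict log_yhat, xb
predict u, residuals
gen exp_u = exp(u)
summ exp_u, meanonly
* smeared prediction on the levels (dollar) scale:
gen yhat_smear = exp(log_yhat) * r(mean)
```

The smearing factor (the mean of the exponentiated residuals) corrects the naive exp(log_yhat) retransformation when the log-scale errors are not normal.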
Non-parametric estimation
Stata has built-in support for kernel densities. Often a useful
descriptive tool to display smoothed distributions of data
Can also non-parametrically estimate probability density functions of
interest.
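A minimal sketch of the built-in kernel density command, on simulated data (my example, not from the slides):

```
* kernel density of a simulated variable (sketch)
clear
set obs 1000
gen x = invnormal(uniform())
kdensity x, normal    // overlays a normal density for comparison
```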
Example: Guerre, Perrigne & Vuong (EMA, 2000) estimation of first-price auctions with risk-neutral bidders and iid private values:
Estimate distribution of bids non-parametrically
Use observed bids and this estimated distribution to construct
distribution of values
Assume values are distributed according to following CDF:
F(v) = 1 - e^(-v)
Then you can derive the following bidding function for N=3 bidders
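The bidding function itself did not survive extraction; under the standard symmetric risk-neutral first-price auction model it can be reconstructed as follows (my derivation, not copied from the slide):

```latex
b(v) = v - \frac{\int_0^v F(u)^{N-1}\,du}{F(v)^{N-1}}
\quad\Longrightarrow\quad
b(v) = v - \frac{v + 2e^{-v} - \tfrac{1}{2}e^{-2v} - \tfrac{3}{2}}
             {\left(1-e^{-v}\right)^{2}}
\quad\text{for } F(v) = 1 - e^{-v},\; N = 3.
```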
QUESTIONS:
- What does it mean if the coefficient on edclg differs by quantile?
- What are we learning when the coefficients are different? (HINT: What does it
tell us if the coefficient is nearly the same in every regression)
- What can you do if education is endogenous?
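These questions presumably refer to quantile regressions of log wages on college education. A minimal sketch with qreg (the outcome name lnwage is my placeholder; edclg is the slide's variable):

```
* quantile regressions at the 10th, 50th, and 90th percentiles (sketch)
qreg lnwage edclg, quantile(0.10)
qreg lnwage edclg, quantile(0.50)
qreg lnwage edclg, quantile(0.90)
```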
Y = A[ (1-d)*K^n + d*L^n ]^(1/n)
global d = 0.6
global n = 4.0
global A = 2.0
gen k = exp(invnormal(uniform()))
gen l = exp(invnormal(uniform()))
gen e = 0.1 * invnormal(uniform())
** CES production function
gen y = ///
$A*( (1-$d)*k^($n) + $d*l^($n) )^(1/$n) + e
nl (y = {b0}*( (1-{b1})*k^({b2}) + ///
{b1}*l^({b2}) )^(1/{b2}) ), ///
init(b0 1 b1 0.5 b2 1.5) robust
More NLLS
probit
tobit
logit
clogit
ivprobit
ivtobit
heckman
cnsreg
mlogit
mprobit
ologit
oprobit
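Each of these is a one-line maximum-likelihood command. For instance, a minimal probit sketch on simulated data (my example, not from the slides):

```
* probit on simulated data (sketch)
clear
set obs 1000
gen x = invnormal(uniform())
gen y = (x + invnormal(uniform()) > 0)
probit y x
```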
clear
set more off
set mem 100m
set matsize 1000
local B = 1000
matrix Bvals = J(`B', 1, 0)
matrix pvals = J(`B', 2, 0)
forvalues b = 1/`B' {
	drop _all
	quietly set obs 200
	gen cons = 1
	gen x = invnormal(uniform())
	gen e = x*x*invnormal(uniform())
	gen y = 0*x + e
	qui regress y x cons, nocons
	matrix betas = e(b)
	matrix Bvals[`b',1] = betas[1,1]
	qui testparm x
	matrix pvals[`b',1] = r(p)
	qui regress y x cons, robust nocons
	qui testparm x
	matrix pvals[`b',2] = r(p)
}
drop _all
svmat Bvals
svmat pvals
summ *, det
Monte Carlo in Stata, cont
drop _all clears the data between replications; the qui prefix (as in qui testparm x) suppresses output.
OLS by hand
clear
set obs 10
set seed 14170
gen x1 = invnorm(uniform())
gen x2 = invnorm(uniform())
gen y = 1 + x1 + x2 + 0.1 * invnorm(uniform())
gen cons = 1
mkmat x1 x2 cons, matrix(X)
mkmat y, matrix(y)
matrix list X
matrix list y
beta_hat = (X'X)^(-1) X'y
OLS by hand
clear
set obs 100000
set seed 14170
gen x1 = invnorm(uniform())
gen x2 = invnorm(uniform())
gen y = 1 + x1 + x2 + 0.1 * invnorm(uniform())
gen cons = 1
mkmat x1 x2 cons, matrix(X)
mkmat y, matrix(y)
matrix list X
matrix list y
matrix beta_ols = invsym(X'*X) * (X'*y)
matrix e_hat = y - X * beta_ols
matrix V = (e_hat' * e_hat) * invsym(X'*X) / ///
	(rowsof(X) - colsof(X))
matrix beta_se = (vecdiag(V))'
local rows = rowsof(V)
forvalues i = 1/`rows' {
matrix beta_se[`i',1] = sqrt(beta_se[`i',1])
}
matrix ols_results = (beta_ols, beta_se)
matrix list ols_results
reg y x1 x2
clear
set obs 100000
set seed 14170
gen x1 = invnorm(uniform())
gen x2 = invnorm(uniform())
gen y = 1 + x1 + x2 + 100 * invnorm(uniform())
OLS by hand, v2.0
clear
set obs 1000
program drop _all
program add_stat, eclass
ereturn scalar `1' = `2'
end
helper programs
gen z = invnorm(uniform())
gen v = invnorm(uniform())
gen x = .1*invnorm(uniform()) + 2.0*z + 10.0*v
gen y = 3.0*x + (10.0*v + .1*invnorm(uniform()))
reg y x
estimates store ols
reg x z
test z
return list
add_stat "F_stat" r(F)
estimates store fs
reg y z
estimates store rf
ivreg y (x = z)
estimates store iv
estout * using baseline.txt, drop(_cons) ///
stats(F_stat r2 N, fmt(%9.3f %9.3f %9.0f)) modelwidth(15) ///
cells(b( fmt(%9.3f)) se(par fmt(%9.3f)) p(par([ ]) fmt(%9.3f)) ) ///
style(tab) replace notype mlabels(, numbers )
helper programs, cont

V_robust = (X'X)^(-1) [ sum_{i=1..N} (x_i e_i)(x_i e_i)' ] (X'X)^(-1)

V_robust = [N/(N-K)] (X'X)^(-1) [ sum_{i=1..N} (x_i e_i)(x_i e_i)' ] (X'X)^(-1)

[slide output comparing results across STATA v9.1 and STATA v10.0 (values 0.236, 0.048, 0.049) omitted]
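The robust "sandwich" variance can also be computed by hand, continuing the OLS-by-hand code. A sketch (assumes the matrices X and e_hat from those slides are in memory; the names meat, Vrob, e2 are mine):

```
* robust sandwich variance by hand (sketch, not from the slides)
local N = rowsof(X)
local K = colsof(X)
matrix meat = J(`K', `K', 0)
forvalues i = 1/`N' {
	matrix xi = X[`i', 1...]
	local e2 = el(e_hat, `i', 1)^2
	matrix meat = meat + `e2' * (xi' * xi)
}
matrix Vrob = (`N'/(`N'-`K')) * invsym(X'*X) * meat * invsym(X'*X)
matrix list Vrob
```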
Exercises
Go to following URL:
http://web.mit.edu/econ-gea/14.170/exercises/
Download each DO file
No DTA files! All data files loaded from the web (see help webuse)
3 exercises (increasing difficulty):
A. Monte Carlo test of OLS/GLS with serially correlated data
B. Heckman two-step with bootstrapped standard errors
C. Correcting for measurement error of known form