Stata Guide
Stata Guide
Stata Guide
destring sic, replace *Dropping financials drop if sic>5999 & sic<7000 *Dropping utilities drop if sic>4899 & sic<5000 *Drop ADR and foreign drop if sic==8880 | sic==8888 *Two-digit industry code tostring sic, gen(twodig) replace twodig="0"+twodig if length(twodig)==3 replace twodig=substr(twodig,1,2) destring twodig, replace If you have a missing value in your dataset, you cannot automatically set it to 0. However, if it is reasonable to assume that a missing value should be treated as 0, then you can replace it. Research and development expenses can be considered zero if missing. *Setting missing R&D to 0 as in the paper replace xrd=0 if xrd==. *Calculate variables of interest gen net_asset=at-ch gen cash_asset= ch/ net_asset gen cash_asset_tr=cash_ass replace cash_asset_tr=1 if cash_asset>1 replace cash_asset_tr=. if cash_asset==. gen size=ln(at) gen market_value=prcc_f*csho gen mb=(at-seq+market_value)/at gen rd_sales= xrd/revt gen cf_assets=(ebitda-dvc-intpn-txt)/net_asset gen nwc_assets=(wcap-ch)/net_asset gen capex_assets=capx/net_asset gen acq_assets= aqc/net_asset gen payout=(dvc+ prstkc)/net_asset gen leverage=(dltt+ dlc)/net_asset In most cases we assume that our variables are normally distributed. Since outliers can distort results, it is generally advisable to winsorize every variable at a reasonable level. Notice that I use a loop to iterate over all variables. Please review the -foreach- command to understand looping in Stata. *Winsorize at 5%
foreach v of varlist cash_asset-leverage { winsor v, gen(w_v) p(0.05) drop v rename w_v v } While not necessary, it is generally a good idea to label your variables. It will make it easier to output your results. *Labeling variables label variable net_asset "Assets minus cash" label variable market_value "Market value of equity (shares*prices)" label variable cash_asset "Cash/assets" label variable cash_asset_tr "Truncated cash/assets" label variable size "Size" label variable mb "Market-to-book value" label variable rd_sales "R&D/sales" label variable cf_assets "Cash-flow/assets" label variable nwc_assets "Net working capital/assets" label variable capex_assets "CAPEX/assets" label variable acq_assets "Acquisitions/assets" label variable payout "Dividends and repurchases/assets" label variable leverage "Leverage" order market_value, after(net_asset) sort gvkey fyear save "${filepath}company_data_sp_1990_2010.dta", replace
We replicate Table 1 of OPSW. We add quartiles and minimum and maximum values as well. There are many ways to generate a presentable output in Stata, as Stata 13 even supports generating Excel or Word les. Alternatively, you can use the -estout- or -tabout- commands, but the -le write- command gives total control over generating the output. The - n- tells Stata to start a new line, while - tab- is for going to the next column. Notice that I generate a temporary variable for each statistic using -local-. For convenience, I also round each statistic to 3 digits. To see the available statistics, check the -summarizecommand and the -detail- option. *Summary statistics - replicating table 1 use "${filepath}company_data_sp_1990_2010.dta", clear file open desc using "${outreg}summary.txt", write replace file write desc "Variable" _tab "Obs" _tab "Mean" _tab "St. Dev." _tab "25%" /* */_tab "Median" _tab "75%" _tab "Min" _tab "Max" foreach v of varlist cash_asset-leverage { summarize v, detail local label: variable label v
local mean=round(r(mean),.001) local sd=round(r(sd),.001) local q1=round(r(p25),.001) local median=round(r(p50),.001) local q3=round(r(p75),.001) local q0=round(r(min),.001) local q4=round(r(max),.001) file write desc _n "label" _tab (r(N)) _tab (mean) _tab (sd) _tab (q1) /* */ _tab (median) _tab (q3) _tab (q0) _tab (q4) } file close desc type "${outreg}summary.txt" If you run the code then you should get the same output as Table 1. In case you want to show the evolution of a certain statistic over time, use the -collapse- command. If you do everything correctly you get Figure 1. *Figure - replicating OPSW figure 2 use "${filepath}company_data_sp_1990_2010.dta", clear sort fyear preserve collapse (median) cash_asset, by(fyear) twoway area cash_asset fyear graph export "${outreg}question2.eps", as(eps) preview(on) replace restore If you want to run time-series or panel regressions, you have to set your data with either -tset- (for time series) or -xtset- (for panel). After that you can use lag -l- and dierence -d- operators. In case you are interested in the distribution of regression coecients, you have to use the -statsby- command. Figure 2 shows the distribution of coecients from an autocorrelation regression. *Figure - replicating OPSW figure 3 use "${filepath}company_data_sp_1990_2010.dta", clear xtset gvkey fyear capture by gvkey: egen counter=count(fyear) preserve statsby, clear by(gvkey): reg d.cash_asset d.l1.cash_asset if counter>4 drop if _stat_1==. histogram _stat_1 if _stat_1<2 & _stat_1>-1 graph export "${outreg}question3.eps", as(eps) preview(on) replace restore When presenting your results, you rst have to show univariate tests based on your main variable(s). To that end you have to sort your variables into 2-5 groups based on the distribution of the main variable and then you can compare statistics between those groups. One such univariate test is in Table 2. *Quartile table - replicating OPSW table 3 use "${filepath}company_data_sp_1990_2010.dta", clear 4
bysort fyear: egen cash_q=xtile(cash_asset), n(4) drop if cash_q==. gen testvar=. replace testvar=0 if cash_q==1 replace testvar=1 if cash_q==4 *Cash/asset quartile intervals forvalues x=1/4 { sum cash_asset if cash_q==x, detail local fromx=round(r(min),.01) local tox=round(r(max),.01) local obsx=r(N) } file open quartile using "${outreg}quartiles.txt", write replace file write quartile "Variable" _tab "Q1" _tab "Q2" _tab "Q3" _tab "Q4" _tab "t-statistic" _tab "from1 to to1" _tab "from2 to to2" /* file write quartile _n "Cash/assets range"
*/_tab "from3 to to3" _tab "from4 to to4" _tab foreach v of varlist cash_asset-leverage { local label: variable label v forvalues y=1/4 { summarize v if cash_q==y, detail local meany=round(r(mean),.001) local mediany=round(r(p50),.001) } ttest v, by(testvar) local tstat=round(r(t),.001) local pval=round(r(p),.001) file write quartile _n "label" _tab "mean1" _tab "mean2" /* */ _tab "mean3" _tab "mean4" _tab "tstat" file write quartile _n _tab "[median1]" _tab "[median2]" /* */_tab "[median3]" _tab "[median4]" _tab "(pval)" } file write quartile _n "Observations" _tab "obs1" _tab "obs2" _tab /* */ "obs3" _tab "obs4" _tab file close quartile type "${outreg}quartiles.txt" Finally, to quantify the eect of your explanatory variables on the dependent variable, you have to run regressions. As a rst step, it is enough to run a simple OLS regression using the -reg- command. It is generally a good idea to compute standard errors corrected for heteroscedasticity and to cluster standard errors at the rm level -vce-. Then to nd out if your results are robust, you can include year and/or
industry xed eects using the -i.- operator. If you want to interact xed eects you can use -*- or -#(check the dierence). You can always copy-paste your regression output to Excel yourself, but it is advisable to use some output command such as -outreg- to automate this process. A regression table should be similar to Table 3. *Regressions - question 5 use "${filepath}company_data_sp_1990_2010.dta", clear gen lncash_asset=ln(cash_asset) label variable lncash_asset "Log of cash/asset" gen dividend=0 replace dividend=1 if dvc>0 *Either generate regulation dummy or address why not doing so gen regulated=0 replace regulated=1 if sic== 4011 | sic== 4210 | sic== 4213 | /* */ sic== 4512 | sic== 4812 | sic== 4813 label variable dividend "Dividend payer dummy" label variable regulated "Regulated industry dummy" set matsize 5000 *OLS with clustered robust errors reg lncash_asset size */, vce(cluster gvkey) outreg using "${outreg}ols_simple.tex", tex */varlabels fragment replace *OLS with clustered robust errors, industry and year dummies reg lncash_asset size mb rd_sales cf_assets nwc_assets capex_assets leverage dividend regulated /* */ i.fyear i.twodig, vce(cluster gvkey) outreg using "${outreg}ols_simple.tex", tex summstat(r2_a\N) summtitle("Adjusted R2"\"N") /* */keep(size rd_sales cf_assets nwc_assets capex_assets leverage dividend regulated mb) /* */ addrows("Year dummy", "yes" \ "Industry dummy", "yes" \"Year-industry dummy", "no") /* */ varlabels fragment merge *OLS with clustered robust errors and industry-year dummies reg lncash_asset size mb rd_sales cf_assets nwc_assets capex_assets leverage dividend regulated /* */ i.fyear#twodig, vce(cluster gvkey) outreg using "${outreg}ols_simple.tex", tex summstat(r2_a\N) summtitle("Adjusted R2"\"N") /* */keep(size rd_sales cf_assets nwc_assets capex_assets leverage dividend regulated mb) /* */ addrows("Year dummy", "no" \ "Industry dummy", "no" \"Year-industry dummy", "yes") /* 6 summstat(r2_a\N) summtitle("Adjusted R2"\"N") /* */ addrows("Year dummy", "no" \ "Industry dummy", "no" \"Year-industry dummy", "no") /* mb rd_sales cf_assets nwc_assets capex_assets leverage dividend regulated /*
1995
2005
2010
Variable Cash/assets Truncated cash/assets Size Market-to-book value R&D/sales Cash-ow/assets Net working capital/assets CAPEX/assets Acquisitions/assets Dividends and repurchases/assets Leverage
Obs 11473 11473 11706 11695 11708 10666 11116 11365 10503 10765 11420
Mean 0.096 0.096 8.162 2.113 0.035 0.099 0.104 0.065 0.022 0.045 0.245
St. Dev. 0.105 0.105 1.346 1.178 0.056 0.056 0.156 0.044 0.042 0.054 0.162
0.25 0.018 0.018 7.239 1.282 0 0.061 -0.01 0.031 0 0.004 0.122
Median 0.055 0.055 8.141 1.695 0.005 0.094 0.074 0.054 0.001 0.025 0.239
0.75 0.134 0.134 9.168 2.504 0.046 0.132 0.2 0.088 0.022 0.064 0.354
Max 0.386 0.386 10.533 5.422 0.19 0.224 0.447 0.173 0.153 0.193 0.571
.5
0 .5 _b[LD.cash_asset]
1.5
Table 2: Replication of OPSW Table 3 Variable Cash/assets range Cash/assets Truncated cash/assets Size Market-to-book value R&D/sales Cash-ow/assets Net working capital/assets CAPEX/assets Acquisitions/assets Dividends and repurchases/assets Leverage Observations Q1 0 to .05 0.011 [.009] 0.011 [.009] 8.531 [8.551] 1.695 [1.46] 0.01 [0] 0.087 [.084] 0.06 [.028] 0.068 [.057] 0.025 [.002] 0.036 [.021] 0.314 [.307] 2876 Q2 .01 to .11 0.037 [.031] 0.037 [.031] 8.434 [8.461] 1.84 [1.574] 0.022 [.002] 0.091 [.09] 0.088 [.061] 0.062 [.051] 0.025 [.002] 0.043 [.026] 0.271 [.26] 2865 Q3 .03 to .2 0.088 [.083] 0.088 [.083] 8.137 [8.061] 2.144 [1.767] 0.04 [.008] 0.099 [.097] 0.116 [.097] 0.063 [.052] 0.021 [.001] 0.048 [.028] 0.219 [.195] 2870 Q4 .09 to .39 0.247 [.227] 0.247 [.227] 7.477 [7.463] 2.824 [2.394] 0.07 [.053] 0.123 [.125] 0.151 [.137] 0.066 [.055] 0.018 [0] 0.054 [.023] 0.175 [.127] 2862 t-statistic -131.917 (0) -131.917 (0) 31.659 (0) -36.52 (0) -43.61 (0) -22.614 (0) -21.187 (0) 1.526 (.127) 6.165 (0) -12.361 (0) 33.839 (0)
10
Table 3: Replication of OPSW Table 4 Log of cash/asset Size Market-to-book value R&D/sales Cash-ow/assets Net working capital/assets CAPEX/assets Leverage Dividend payer dummy Regulated industry dummy Constant Adjusted R2 N Year dummy Industry dummy Year-industry dummy -0.082
(3.59)**
0.121
(5.21)**
0.161
(6.79)**
0.167
(6.22)**
6.276
(13.01)**
4.817
(7.76)**
4.694
(7.07)**
2.945
(6.47)**
1.454
(3.70)**
1.417
(3.21)**
-0.620
(2.98)**
-0.388
(1.92)
-0.219
(0.99)
-3.969
(6.64)**
-0.445
(0.72)
-0.234
(0.33)
-1.568
(9.00)**
-1.046
(6.46)**
-1.017
(5.80)**
-0.351
(5.47)**
-0.211
(3.58)**
-0.213
(3.29)**
-0.282
(1.63)
-0.064
(0.27)
-0.060
(0.23)
-2.198
(10.07)**
0.28 10,239 no no no
* p < 0.05; ** p < 0.01
11