Quick Stata Tips
Version 1.0
Todd R. Jones1
1 Mississippi State University, IZA, and CESifo. toddrjones.com. Twitter page. Bluesky page.
To Ari, Charlotte, Annie, and Ned
Preface
This book compiles a number of my “Quick Stata Tips.” I am basing this book on Stata
version 18 (and sometimes 17) on a Mac. However, much—though not all—of the content
in this book applies to (recent) prior versions of Stata. Many—though not all—of the tips
build on concepts introduced in earlier tips.
I have picked up these tips over the years from many different sources, for example
StackOverflow, StataList, Google, my own coding, and other people. For a book like this,
it’s hard to know how to acknowledge everyone, so I will keep it general and thank the
various people and websites from whom I learned some of these tips, as well as those who
developed the packages that I refer to.
I have created a companion Stata .do file with code for most of the tips. You can
download it here.
Contents
1 fre
2 mdesc
3 inlist and inrange
4 Multicursor Mode
5 compare
6 Scroll Buffer Size
7 ereplace
8 r(table)
9 ssc hot
10 opacity
11 Previous line of code
12 Sort with, but don’t group by
13 Open multiple instances of Stata
14 isid
15 texdoc
16 Check if variable is constant within group
17 trim
18 group
19 tempfiles
20 Execute (include)
21 Control placement of newly-created variables
22 substr
23 browse if
24 sysuse
25 bysort
26 _n and _N
27 preserve/restore
28 capture
29 tab1
30 statastates
31 compress
32 quietly
33 set graphics off
34 Undocumented and previously documented commands
35 Pull variables from the ACS
36 Graph at county level
37 Animated maps
38 Animated graphs
39 mscatter
40 duplicates
41 Notify when code is finished running
42 Display loop progress
43 Timers
44 Loop over all variables
45 Calculate total
46 xtile
47 Remove elements from a local
48 Add elements to a macro
49 Save value label to macro
50 Access stored results and other parameters
51 Go between numeric and string with existing variable
52 Go between numeric and string when creating variable
53 Version control
54 asgen
55 collapse
56 seq
57 Add leading zero to number
58 Main effects and interactions in regressions
59 keepusing
60 geodist
61 geonear
62 georoute
63 heatplot
64 Color by a third variable
65 binscatterhist
66 alluvial
67 sankey
68 Scrape webpages with readhtml
69 Choose which variables and observations to load
70 Use datasets from internet
71 Choose which category to omit in a regression
72 fillin
73 levelsof
74 Increment local
75 labelbook
76 twoway
77 rowtotal
78 Random variables
79 set seed
80 Length of string
81 clonevar
82 Label values based on value labels of another variable
83 Locate .do files
84 Include commas in large numbers
85 Highlight selected bars in bar chart
86 keeporder
87 Create your own function/program
88 gsort
89 Sort descending when using bysort
90 moreobs
91 coefplot
92 expand
93 nvals
94 regsave
95 Access certain rows of a variable
96 Refer to observations by row number
97 colorpalette
1 fre
If you want to look at a one-way frequency table of a variable, fre displays both values
and labels at the same time, while tabulate does not. To use fre:
ssc install fre
sysuse auto2, clear
tab foreign
fre foreign
2 mdesc
To quickly see how many observations of each variable are missing, use mdesc:
ssc install mdesc
sysuse lifeexp, clear
mdesc
3 inlist and inrange
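To keep observations whose value falls within a range, use inrange(x, a, b), which is true when a ≤ x ≤ b (both endpoints included). For example: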
keep if inrange(distance, 10, 91)
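The inlist() half of this tip is not shown above. As a minimal sketch, assuming the auto2 example data used elsewhere in this book: inlist(x, a, b, ...) is true when x equals any of the listed values, which saves writing a chain of | conditions. The specific values and car names below are only illustrative.

```stata
sysuse auto2, clear
*inlist(x, a, b, ...) is true when x equals any listed value;
*this is the same as rep78==1 | rep78==3 | rep78==5
keep if inlist(rep78, 1, 3, 5)
*inlist() also works with strings
count if inlist(make, "AMC Concord", "Buick Century")
*inrange(x, a, b) includes both endpoints: a <= x <= b
count if inrange(mpg, 20, 25)
```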
4 Multicursor Mode
Stata supports multicursor mode. On a Mac, hold down Option and drag the cursor.
On Windows, hold down Alt.
You can also place multiple cursors. On a Mac, hold down Command and click. On
Windows, hold down Control and click.
5 compare
To quickly compare two variables and see how often the values of the first are lower than,
equal to, or higher than the second, use compare:
sysuse auto2, clear
compare mpg trunk
6 Scroll Buffer Size
To be able to scroll back as far as possible in your Results window, set the scroll buffer to its maximum size:
set scrollbufsize 2048000
7 ereplace
egen cannot overwrite (replace) an existing variable. Instead, you can use ereplace, a user-written command from SSC that works like egen but replaces the variable.
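A minimal sketch, assuming ereplace is installed from SSC and accepts the same function syntax as egen (as the tip implies). The variable name price_stat is hypothetical:

```stata
capture ssc install ereplace
sysuse auto2, clear
*create a variable with egen
egen price_stat = mean(price)
*egen cannot overwrite price_stat, but ereplace can
ereplace price_stat = median(price)
```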
8 r(table)
After a regression, you can use r(table) to directly get the coefficient, standard error,
t-statistic, p-value, 95% confidence interval, etc.
sysuse auto2, clear
reg trunk weight
matrix list r(table)
local weight_lower_95ci = r(table)[5,1]
di "`weight_lower_95ci'"
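Hard-coding [5,1] depends on the row and column order of r(table). A sketch that instead looks up the row (ll, the lower confidence limit) and the column (weight) by name with rownumb() and colnumb(), and checks the result against the textbook formula b − t × se. The matrix name T is arbitrary:

```stata
sysuse auto2, clear
regress trunk weight
*copy r(table) so that later commands cannot overwrite it
matrix T = r(table)
*look up the lower 95% limit by row and column name
local ll = T[rownumb(T, "ll"), colnumb(T, "weight")]
*the same quantity by hand: b - t(0.975, df) * se
local ll_manual = _b[weight] - invttail(e(df_r), 0.025) * _se[weight]
display "`ll'  `ll_manual'"
```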
9 ssc hot
Use ssc hot to see the most popular user-contributed SSC Stata packages:
ssc hot, n(100)
10 opacity
To make a color partly transparent, add %X after the color, where X (0≤X≤100) is the
opacity in percent (0 is fully transparent, 100 is fully opaque):
sysuse sp500, clear
replace high = high+80
twoway (hist high, width(20) color(blue%50)) ///
(hist low, width(20) color(red%50)), ///
scheme(s1mono) legend(order(1 "Blue" 2 "Red"))
11 Previous line of code
To retrieve the previous line(s) of code in the Command Prompt:
fn key + up arrow (Mac)
Page Up (PC)
Control + R (Mac or PC)
12 Sort with, but don’t group by
When using bysort, say you have a variable you want to sort with, but *not* group by. Put
this variable in parentheses:
sysuse census, clear
*keep the least-populous state within each region, so sort by pop
* within each region:
bys region (pop): keep if _n==1
13 Open multiple instances of Stata
On a Mac, to open up multiple instances of Stata, type the following into Terminal:
open -n /Applications/Stata/StataMP.app
14 isid
To check whether your data are unique at a certain level, use isid. An error means that the
variables do not uniquely identify the observations; no error means that they do.
sysuse auto2, clear
isid foreign
isid foreign make
15 texdoc
estout and outreg2 are good for creating .tex files, but they can’t always do what I want,
so I’ve switched to the fully customizable texdoc. My version loops over both the
right-hand-side variables and the different regression specifications.
16 Check if variable is constant within group
To check if a variable is constant within group, you can do:
bys group (var): gen a = var[1]==var[_N]
tab a
If all values of a are 1, the variable is constant within every group. A group with some,
but not all, values missing is flagged as not constant, because missing values sort to the
end, so var[_N] is missing while var[1] is not.
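An alternative sketch using egen’s min() and max(), shown here with auto2 (checking whether mpg is constant within foreign). The variable names are illustrative. Note one difference: min() and max() ignore missing values, so unlike the approach above, a group with some missing values can still count as constant.

```stata
sysuse auto2, clear
*a variable is constant within a group when its min equals its max
bysort foreign: egen mpg_min = min(mpg)
bysort foreign: egen mpg_max = max(mpg)
gen byte mpg_const = mpg_min == mpg_max
tab foreign mpg_const
```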
17 trim
To get rid of leading and trailing spaces in a string variable, use trim:
clear all
input str12 str
"String A "
" String B "
" String C"
end
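The line that actually applies trim is not shown above. A minimal completion, assuming you want a trimmed copy in a new variable (str_trim is a hypothetical name): strtrim() is the current name of trim(), which still works. There are also strltrim() and strrtrim() for one side only, and stritrim() to collapse internal runs of blanks.

```stata
clear all
input str12 str
"String A "
" String B "
" String C"
end
*strtrim() removes leading and trailing blanks
gen str_trim = strtrim(str)
*compare lengths before and after trimming
gen len_before = strlen(str)
gen len_after = strlen(str_trim)
list
```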
18 group
To create a variable identifying groups, use egen’s group() function:
sysuse xtline1, clear
egen grp = group(day)
*check that it worked
sort day
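The check itself is not shown above. One sketch of what it might look like: list the first few rows, then assert that each group contains exactly one day and that each day maps to exactly one group.

```stata
sysuse xtline1, clear
egen grp = group(day)
sort day
list day grp in 1/5
*each group should contain a single value of day...
bysort grp (day): assert day[1] == day[_N]
*...and each day should map to a single group
bysort day (grp): assert grp[1] == grp[_N]
```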
19 tempfiles
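You can use tempfiles to save and then repeatedly load data. Stata deletes them
automatically when the do-file or program that created them ends (or, for tempfiles
created interactively, when you exit Stata).
sysuse auto2, clear
tempfile a
save `a'
Later:
use `a', clear
Or:
merge m:1 id using `a'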
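For a self-contained version of the merge, here is a sketch using auto2 (which has no id variable), where a small lookup table of mean price by foreign is saved to a tempfile and merged back. The names means and mean_price are illustrative. Because tempfile names are locals, run all of these lines together in one go:

```stata
sysuse auto2, clear
preserve
*build a lookup table with one row per value of foreign
collapse (mean) mean_price = price, by(foreign)
tempfile means
save `means'
restore
*many cars (m) match one row of the lookup table (1)
merge m:1 foreign using `means'
```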
20 Execute (include)
Usually, locals and tempfiles created by code you run from the Do-file Editor are not
available in the Command Prompt, and vice versa. They are, however, if you run your code
with Execute (include) rather than the default Execute (do). I like to shift constantly
between the Command Prompt and the Do-file Editor, and this tip allows me to do so.
To give Execute (include) a keyboard shortcut on a Mac, go to System Preferences - Keyboard -
Shortcuts - App Shortcuts, and add a shortcut for Stata’s Execute (include) menu item.
If you do prefer to run your code the traditional way, you can use keyboard shortcuts to
run/Execute (do) your code. On Windows: Control+Shift+D. On Mac: Command+Shift+D.
21 Control placement of newly-created variables
When you generate a new variable, use the before() option to place it before another
variable, and the after() option to place it after a variable.
sysuse auto2, clear
*create version of price in units of thousands
gen price_k = price/1000, before(price)
22 substr
substr is one way to extract characters from a string. The format is substr(var, X, Y), where
X is the starting position and Y is the number of characters you want to extract.
sysuse auto2, clear
*get the first two letters of the string
gen first = substr(make, 1, 2), after(make)
23 browse if
To browse only a subset of your data, use browse if:
sysuse auto2, clear
*browse only the cars that begin with "A"
br if substr(make,1,1)=="A"
24 sysuse
sysuse loads example datasets that come with Stata, and sysuse dir lists all such datasets.
These can be useful when, for example, you want to use toy data to ask a question on
StackOverflow or StataList.
sysuse dir
25 bysort
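Use bysort (abbreviated bys) to sort the data by one or more variables and then run a command separately within each group.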
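A short illustration with auto2: bysort first sorts by the grouping variable, then runs the command separately within each group. The variable names mean_mpg and rank_in_group are only illustrative.

```stata
sysuse auto2, clear
*run summarize separately for domestic and foreign cars
bysort foreign: summarize mpg
*group-level mean, repeated on every row of the group
bysort foreign: egen mean_mpg = mean(mpg)
*sort by foreign, then by mpg within foreign (mpg is not a grouping variable);
*_n then counts rows within each group
bysort foreign (mpg): gen rank_in_group = _n
```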
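26 _n and _N
Use _n to get the current observation number and _N to get the total number of
observations (that is, the last observation number). After by, both refer to the
observations within each group.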
sysuse auto2, clear
keep rep78
sort rep78
*obs #:
gen n = _n
*obs # w/i group:
bys rep78: gen group_n = _n
*max obs #:
gen N = _N
*max obs # w/i group:
bys rep78: gen group_N = _N
27 preserve/restore
You can save a copy of your data with preserve. You can then change the data and restore
the saved version with restore. (You can also accomplish this goal using tempfiles.)
sysuse auto2, clear
*preserve the data to be used later
preserve
*change the data
keep if _n<5
scatter price mpg
*restore the data
restore
You can cancel the preserve with restore, not. Then you can preserve the data again.
*preserve again
preserve
keep if _n<10
*cancel the prior preserve
restore, not
*preserve again
preserve
28 capture
Use capture to allow Stata to keep running the code even if there is an error in the line. In
this example, we want to preserve the data but are not sure if the data has been preserved
before. If it has, then we will get an error. So we want to cancel the preserve. However, if
we cancel the preserve and it hasn’t been preserved, it will also give an error. So we will
use capture to cancel the preserve if it exists; if it doesn’t, the code will continue to run.
sysuse auto2, clear
capture restore, not
preserve
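capture also stores the command’s return code in _rc (0 if it succeeded, nonzero otherwise), which you can test to decide what to do next. A sketch; the variable weight_kg is hypothetical, and 0.4536 is an approximate kilograms-per-pound factor:

```stata
sysuse auto2, clear
*confirm fails if the variable does not exist; capture swallows the error
capture confirm variable weight_kg
if _rc != 0 {
    display "weight_kg not found, creating it"
    gen weight_kg = weight * 0.4536
}
```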
29 tab1
To make one-way frequency tables for multiple variables, use tab1:
sysuse auto2, clear
*tabulate (separately) make, price, and mpg:
tab1 make price mpg
*tabulate all variables:
tab1 *
30 statastates
Use statastates for a crosswalk between the two-letter US state abbreviation, state name,
and state FIPS code:
capture ssc install statastates
sysuse census, clear
keep state2
statastates, abbreviation(state2) nogen
replace state_name = strproper(state_name)
31 compress
If you have a long string variable and then shorten its values, use compress to shrink the
variable’s storage type to fit. This makes the Data Editor easier to look at.
sysuse auto2, clear
replace make = "This is a long string...." in 1
replace make = substr(make, 1, 6)
compress
32 quietly
Place quietly in front of a command like reg to suppress the output.
sysuse auto2, clear
quietly reg trunk weight
33 set graphics off
Use set graphics off to keep graphs from being displayed when they are created. This can
be useful if you are making and saving a lot of graphs but don’t want them to incessantly
pop up on your screen.
set graphics off
34 Undocumented and previously documented commands
To see undocumented commands:
help undocumented
To see commands that were previously documented, but are not documented anymore:
help prdocumented
35 Pull variables from the ACS
Use getcensus to pull variables from the American Community Survey (ACS). To submit
more than 500 requests per day, you need to get a Census key at
https://api.census.gov/data/key_signup.html and then activate it.
ssc install getcensus
*replace XYZ w/ your key
global censuskey XYZ
*get population by county
getcensus B01003, year(2015) sample(5) geography(county) clear
36 Graph at county level
Use maptile to create choropleth maps at the county level. Other geographies, such as
state, are also supported.
In this example, we’ll color counties according to population, which we pull from the
ACS:
ssc install getcensus
*replace XYZ w/ your key
global censuskey XYZ
*get population by county
getcensus B01003, year(2015) sample(5) geography(county) clear
37 Animated maps
Maybe you want to show how a variable varies over both time and space. While it’s much
easier to make an animated map with R’s gganimate, you can do it in Stata, as in this toy example:
ssc install maptile
ssc install spmap
maptile_install using "http://files.michaelstepner.com/geo_state.zip"
sysuse census, clear
rename state q
rename state2 state
gen year = _n+1900
fillin state year
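Once you have exported one map per year as a .png file, you need to stitch the images
together. On a Mac, one option is ImageMagick’s convert command (ImageMagick must be
installed): in Terminal, change to the directory where the files are stored, then run: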
convert *.png a.gif
38 Animated graphs
Using the same general approach as the tip directly above, you can create animated
graphs:
sysuse uslifeexp, clear
forvalues i = 1900/1999 {
    scatter le_male le_female if year==`i', title(`i') scheme(s1mono) ///
        yscale(r(35 80)) xscale(r(40 80)) ylabel(40(10)80) ///
        xlabel(40(10)80)
    gr export `i'.png, replace
}
39 mscatter
Use mscatter to create scatter plots with color gradients.
capture ssc inst mscatter
capture ssc inst palettes
sysuse sp500, clear
mscatter change close if inrange(change, -30, 30), msymbol(O) msize(7) ///
sch(s1mono) over(change) colorpalette(viridis)
40 duplicates
To keep one observation per group:
sysuse auto2, clear
keep if _n<15
bys turn: keep if _n==1
You can also use duplicates. To check if observations are unique within group, you can use
duplicates report. To drop duplicates, you can use duplicates drop. The observations that are
kept will not necessarily be the same as the bysort approach.
sysuse auto2, clear
keep if _n<15
*not unique:
duplicates r turn
*unique:
duplicates r gear_ratio turn
*drop dups
duplicates drop turn, force
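Before dropping anything, you may want to see which observations are duplicates. duplicates tag creates a variable counting how many other observations share the same values (0 means unique); dup is an illustrative name:

```stata
sysuse auto2, clear
keep if _n<15
*dup = number of other observations with the same value of turn
duplicates tag turn, gen(dup)
list make turn dup if dup > 0
```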
41 Notify when code is finished running
To have Stata play a sound when it’s done running your code, select the Play sound with
notification option.
You can also use beep to make Stata beep. This can be useful at the end of a long
loop.
beep
You can also use statapush to have Stata message your phone when it’s done running.
ssc install statapush
help statapush
42 Display loop progress
When you’re doing a loop, you can show the progress using the following approach,
which displays a message every 100th iteration:
local iterations = 1000
forvalues i=1/`iterations' {
    *mod(i, 100) is 0 on every 100th iteration
    if mod(`i', 100)==0 di "Iteration `i' of `iterations'"
}
43 Timers
To time how long your code takes, you can use etime:
capture ssc install etime
etime, start
forvalues i=1/1000000 {
quietly di "`i'"
}
etime
Or, you can use rmsg. (Turn it off with set rmsg off.)
set rmsg on
*to do this permanently
*set rmsg on, permanently
forvalues i=1/1000000 {
quietly di "`i'"
}
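Stata also has a built-in timer command that needs no package. A sketch using timer slot 1 (slots 1 to 100 are available):

```stata
*reset, start, and stop timer 1 around the code you want to time
timer clear 1
timer on 1
forvalues i=1/1000000 {
    quietly di "`i'"
}
timer off 1
*timer list reports elapsed seconds and stores them in r(t1)
timer list 1
display "Elapsed seconds: " r(t1)
```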
44 Loop over all variables
To loop over all variables, use varlist _all:
sysuse census, clear
foreach var of varlist _all {
rename `var' `var'_42
}
45 Calculate total
If you want to calculate the total of a variable, use egen with total(). Don’t use gen with
sum(), as this calculates the running (cumulative) total.
sysuse auto2, clear
gen one = 1
egen total = total(one)
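To see the difference, here is a sketch comparing the two on price: sum() inside gen gives a running total, while egen’s total() puts the grand total on every row, so the last running value equals the total. The variable names are illustrative.

```stata
sysuse auto2, clear
*running (cumulative) total: grows row by row
gen running_price = sum(price)
*grand total: the same value on every row
egen total_price = total(price)
list make price running_price total_price in 1/3
```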
46 xtile
Use xtile to create a new variable that groups the values of another variable into quantile
bins.
sysuse auto2, clear
keep turn
*quartiles
xtile turn_quartile = turn, nq(4)
*deciles
xtile turn_decile = turn, nq(10)
47 Remove elements from a local
Here’s how to create a local with all variables except one:
sysuse auto2, clear
ds
local vars `r(varlist)'
di "`vars'"
local remove_vars "make"
local vars_new: list vars - remove_vars
di "`vars_new'"
48 Add elements to a macro
Here’s how to add elements to a local:
local loc a b c
di "`loc'"
*add d & e
local loc `loc' d e
di "`loc'"
49 Save value label to macro
To save a value label to a local, use this format: local loc: label (var) X, where X is the value.
sysuse auto2, clear
fre foreign
local foreign_0: label (foreign) 0
local foreign_1: label (foreign) 1
di "`foreign_0'"
di "`foreign_1'"
scatter price displacement if foreign==0, title(`foreign_0')
50 Access stored results and other parameters
You can use return list, all and ereturn list, all (and sreturn list, all) to show all stored results.
creturn list shows other parameters.
sysuse auto2, clear
reg weight gear_ratio
ereturn list, all
return list, all
matrix list r(table)
creturn list
di "`c(pi)'"
51 Go between numeric and string with existing variable
destring converts from string to numeric, and tostring converts from numeric to string:
sysuse auto2, clear
tostring turn, replace
destring turn, replace
52 Go between numeric and string when creating variable
real() converts from string to numeric, and string() converts from numeric to string:
sysuse pop2000, clear
keep if _n>2
gen age = real(substr(agestr, 1, 2)), after(agestr)
gen age_string = string(age), after(age)
53 Version control
A rudimentary version of version control: at the beginning of your file, create a global for
the file path of your output, which you can then change every day:
global output "/Users/me/Research/projectName/output/12-25-2023/"
...
graph export "${output}/a.png"
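If you would rather not edit the date by hand each day, one sketch builds the folder name from today’s date with c(current_date) and creates the folder if needed. The path is the same placeholder as above; mkdir only creates the last folder, so the parent folders are assumed to exist.

```stata
*c(current_date) looks like "25 Dec 2023"; convert it to 2023-12-25
local today : display %tdCCYY-NN-DD date(c(current_date), "DMY")
global output "/Users/me/Research/projectName/output/`today'/"
*create the folder; capture ignores the error if it already exists
capture mkdir "${output}"
display "${output}"
```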
54 asgen
Use asgen to create a weighted average:
capture ssc install asgen
sysuse census, clear
bys region: asgen weighted_medage = medage, weight(pop)
55 collapse
Use collapse to replace the dataset with summary statistics computed within groups. Here,
we will compute the mean of medage within each region, as well as the number of
observations in each region.
sysuse census, clear
gen N = 1
collapse (mean) medage (sum) N, by(region)
We can also weight. This gives the same numbers as the asgen example above.
sysuse census, clear
collapse (mean) medage [aweight=pop], by(region)
56 seq
Use seq to repeat sequences of numbers:
ssc install seq
sysuse auto2, clear
*repeat 1 2 3
seq rep1, from(1) to(3)
*repeat 1 1 2 2 3 3
seq rep2, from(1) to(3) block(2)
57 Add leading zero to number
Here’s how to add a leading zero to a number, which is useful when working with state
FIPS codes. Here, we will add a zero to the one-digit numbers:
clear all
set obs 15
gen fips = _n
tostring fips, gen(fips2)
replace fips2 = "0" + fips2 if strlen(fips2)==1
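An alternative sketch that pads with zeros in one step by giving string() a zero-padded display format: %02.0f means at least two digits, padded with leading zeros (%05.0f would do the same for five-digit county FIPS codes). The names fips3 and fips4 are illustrative.

```stata
clear all
set obs 15
gen fips = _n
*"%02.0f" zero-pads to two digits: 1 becomes "01", 12 stays "12"
gen fips3 = string(fips, "%02.0f")
*tostring accepts the same format
tostring fips, gen(fips4) format(%02.0f)
list fips fips3 fips4 in 8/12
```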
58 Main effects and interactions in regressions
Here’s how to think about main effects and interactions in regressions.
By default, main effects are continuous. The i. prefix makes them categorical.
By default, variables in interaction terms are categorical. The c. prefix makes them continuous.
Assume the interaction term has two variables. To include only the interaction term, use
# between the variables.
To include both the main effects and the interaction term, use ## between the variables.
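A sketch of these rules with auto2:

```stata
sysuse auto2, clear
*mpg is continuous by default; i. makes rep78 categorical
regress price mpg i.rep78
*# alone gives the interaction only; inside interactions, variables
*default to categorical, so c. is needed to keep mpg continuous
regress price i.foreign#c.mpg
*## gives both main effects plus the interaction,
*the same as i.foreign c.mpg i.foreign#c.mpg
regress price i.foreign##c.mpg
```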
59 keepusing
When merging the using data into the main (master) data, use the keepusing() option to
keep only selected variables from the using data:
webuse nhanes2, clear
tempfile nh
save `nh'
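The merge step itself is not shown above. As a self-contained sketch using auto2 instead of nhanes2 (make uniquely identifies the cars there), a reduced master dataset pulls in only mpg and weight from the using file; the tempfile name autos is illustrative:

```stata
sysuse auto2, clear
tempfile autos
save `autos'
*master data: keep only the id and one other variable
keep make price
*bring in just mpg and weight; other using variables are left behind
merge 1:1 make using `autos', keepusing(mpg weight)
describe, short
```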
60 geodist
geodist calculates the “as the crow flies” distance between points:
ssc install geodist
clear
input double lat lon
34.043026 -118.26694
39.74915 -105.00740
end
geodist 42.366570 -71.06186 lat lon, gen(dist) miles
61 geonear
Use geonear to find the nearest point in B to each point in A:
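ssc inst geonear
clear
*create 20 random points (set B) with IDs n2
set obs 20
set seed 1
gen n2 = _n
gen la2 = 39 + 5*rt(5)
gen lo2 = -99 + 9*rt(5)
tempfile a
save `a'
*create set A by perturbing the points in B, with IDs n
rename n2 n
gen la = la2 + 3*rt(3)
gen lo = lo2 + 4*rt(4)
drop *2
*for each point in A, find the nearest point in B
geonear n la lo using `a', n(n2 la2 lo2)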
62 georoute
Use georoute to calculate travel times between addresses or coordinates. You can choose
car, bike, walk, and public transit.
Note that you have to register an account to use the HERE (here.com) API, and the free
version limits the number of requests.
63 heatplot
Use heatplot to make a heatmap:
capture ssc install heatplot
webuse nhanes2, clear
heatplot weight height
64 Color by a third variable
Use the colorvar() option (new in Stata 18) to color a scatterplot by a third variable:
sysuse auto2, clear
scatter weight length, colorvar(turn)
65 binscatterhist
Use binscatterhist to show a binned scatterplot along with histograms of the variables.
capture ssc install binscatterhist
sysuse auto2, clear
binscatterhist weight length, hist(weight length) ymin(1100) ///
xhistbarheight(30) yhistbarheight(13)
66 alluvial
Use alluvial to make alluvial plots in Stata:
ssc install alluvial
*example from package tutorial
sysuse nlsw88.dta, clear
alluvial race married collgrad smsa union
67 sankey
Use sankey to make Sankey plots:
ssc install sankey
*example from package tutorial
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey2.dta?raw=true", clear
sankey value, from(source) to(destination) by(layer) noval showtot ///
palette(CET C6) laba(0) labpos(3) labg(-1) offset(10)
68 Scrape webpages with readhtml
Use readhtml to scrape tables from (certain) web pages. The example below maps the
percentage of each state’s area that is covered by forest.
capture net install readhtml, from(https://ssc.wisc.edu/sscc/stata/)
capture ssc install statastates
capture ssc install maptile
capture ssc install spmap
capture maptile_install using "http://files.michaelstepner.com/geo_state.zip"
gen st=substr(S,7,30)
drop if inlist(_n,3,4,16,18,24,40,57)
gen forest=substr(Percent_forest_2,1,length(Percent_forest_2)-2)
destring forest, replace
keep st forest
statastates, name(st)
rename state_abbrev state
maptile forest, geo(state) fcolor(Greens) twopt(title("Percent Forest"))
69 Choose which variables and observations to load
When loading data with use, you can load only select vars with a varlist, use in to select
observations by observation number, and/or use if to select observations with logic:
sysuse auto2, clear
tempfile a
save `a'
use make turn in 2/9 using `a', clear
use if length>200 using `a', clear
70 Use datasets from internet
You can load toy datasets from the internet. The links in help dta_manuals give you a lot
to choose from.
help dta_manuals
help q_base
use https://www.stata-press.com/data/r18/apple.dta, clear
71 Choose which category to omit in a regression
To choose which category to omit (the base category) in a regression, use the ibX. prefix,
where X is the value you wish to omit:
sysuse auto2, clear
*view values of rep78
fre rep78
*default:
reg mpg i.rep78
*omit category 3 (average)
reg mpg ib3.rep78
72 fillin
Use fillin to fill in the dataset so that all combinations of the variables are present:
clear all
input year str1 state value
1 "A" 6
2 "A" 3
4 "A" 5
3 "B" 1
end
fillin year state
73 levelsof
To get all distinct values of a variable, use levelsof (and its local() option to store them in a local):
sysuse auto2, clear
levelsof mpg
levelsof trunk
*put it into a local
levelsof trunk, local(trunklevels)
di "`trunklevels'"
For strings:
levelsof make
levelsof make, clean
74 Increment local
Use ++ to increment a local:
local b = 1
local b = `b' + 1
di "`b'"
*Equivalent:
local a = 1
local ++a
di "`a'"
75 labelbook
To show all value labels, use labelbook:
sysuse auto2, clear
labelbook
76 twoway
To graph multiple things at once, use twoway:
sysuse auto2, clear
twoway scatter mpg length || ///
scatter mpg displacement, ///
scheme(s1color) legend(label(1 "Length") label(2 "Displacement"))
77 rowtotal
Use rowtotal to sum across columns. Note that rowtotal treats missings as 0s.
sysuse auto2, clear
gen sum1 = price + mpg + rep78
egen sum2 = rowtotal(price mpg rep78)
egen sum3 = rowtotal(price-rep78)
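A quick check of the missing-value behavior: rep78 is missing for some cars, so the + version is missing there while rowtotal() is not. rowtotal()’s missing option returns missing only when all of the listed variables are missing (sum5 is an illustrative name):

```stata
sysuse auto2, clear
gen sum1 = price + mpg + rep78
egen sum2 = rowtotal(price mpg rep78)
egen sum5 = rowtotal(price mpg rep78), missing
*+ propagates missing values; rowtotal() skips them
count if missing(sum1)
count if missing(sum2)
```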
78 Random variables
To create a uniform random variable, use runiform().
sysuse auto, clear
gen r=runiform()
There are other functions, such as rexponential. To see them, type in help runiform and then
click on the View complete PDF manual entry link.
79 set seed
To ensure reproducibility, use set seed. This makes the generated random numbers the same
each time you run the code.
sysuse auto, clear
set seed 42
gen r=runiform()
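A quick way to convince yourself: set the same seed twice, and the two sets of draws match exactly.

```stata
sysuse auto, clear
set seed 42
gen r1 = runiform()
*resetting the seed restarts the same sequence of random numbers
set seed 42
gen r2 = runiform()
assert r1 == r2
```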
80 Length of string
To count the number of words in a string, use word count. To count the number of
characters in a string, use strlen.
local loc "How long is this?"
local loc_words : word count `loc'
di "`loc_words'"
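The strlen part of the tip is not shown above. A sketch for the same string; note that strlen() counts bytes, so for strings with non-ASCII characters ustrlen() is the one that counts characters:

```stata
local loc "How long is this?"
*strlen() counts characters, including spaces and punctuation
local loc_chars = strlen("`loc'")
*displays 17
di "`loc_chars'"
```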
81 clonevar
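To make an exact copy of another variable, including its storage type, display format,
variable label, and value label, use clonevar:
sysuse auto2, clear
*this copies only the values (no value label, variable label, or format):
gen foreign2 = foreign
*this creates an exact copy:
clonevar foreign3 = foreign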
82 Label values based on value labels of another variable
To label the values of var 1 using the value labels of var 2, use describe on var 2 to find
the name of its value label, then apply that label to var 1 with label values.
sysuse auto2, clear
set seed 1
gen new = round(runiform()*4+1)
describe rep78
label values new repair
ssc install fre
fre new
83 Locate .do files
Find the .do files in a given directory on your computer that contain a particular word or
phrase:
ssc install find
ssc install rcd
rcd "/Users/Todd/Google Drive": find *.do, match(sysuse auto2) show
84 Include commas in large numbers
To display large numbers with commas, add a c to the end of the display format (for
example, %15.0fc). This can be useful in graph labeling.
sysuse voter, clear
format pop %15.0fc
scatter pop frac, scheme(s1mono)
85 Highlight selected bars in bar chart
To highlight selected bars in a bar chart, use separate:
sysuse auto2, clear
keep if price>=10000
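The separate and graph steps are not shown above. One common pattern, sketched here under the assumption that you want to highlight foreign cars: separate splits price into price0 (domestic) and price1 (foreign), and graph bar then draws each in its own color. Because each make has a value in only one of the two variables, stacking leaves one bar per make.

```stata
sysuse auto2, clear
keep if price>=10000
*price0 = price for domestic cars, price1 = price for foreign cars
separate price, by(foreign)
*one bar per make, colored by group
graph bar price0 price1, over(make, label(angle(45))) stack ///
    legend(order(1 "Domestic" 2 "Foreign"))
```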
86 keeporder
Use keeporder to keep and order variables in one line:
capture ssc install keeporder
*old way
sysuse auto2, clear
keep foreign rep78 make
order foreign rep78 make
*new way
sysuse auto2, clear
keeporder foreign rep78 make
87 Create your own function/program
In this example, we’ll create our own function (called a “program”) that creates a new
variable that is the sum of two other variables.
capture program drop s_pr
program s_pr, rclass
args x y
*access args as locals
gen sum = `x' + `y', after(`y')
end
Run the program. We will enter the arguments length (for x) and turn (for y).
sysuse auto2, clear
s_pr length turn
88 gsort
Use gsort to sort in descending order:
sysuse auto2, clear
*sort ascending
gsort mpg
*sort descending
gsort -mpg
89 Sort descending when using bysort
You can’t directly sort in descending order when using bysort. Here’s a workaround:
sysuse auto2, clear
*doesn’t work:
bys foreign (-turn): gen n=_n
*instead:
gsort foreign -turn
by foreign: gen n = _n
Note, though, that exact ties might be handled differently depending on what you do.
90 moreobs
Use moreobs to add additional observations to your data:
ssc install moreobs
sysuse auto2, clear
moreobs 10
sort make
91 coefplot
Use coefplot (available from SSC) to quickly plot coefficients and confidence intervals.
capture ssc install coefplot
sysuse auto2, clear
reg price mpg length displacement weight trunk
coefplot, drop(_cons) vertical
92 expand
To make n copies of each observation, use expand n:
sysuse auto2, clear
expand 2
sort make
98
93 nvals
To create a variable with the number of unique values of another variable, use nvals(), an egen function from the egenmore package on SSC:
capture ssc install egenmore
sysuse auto2, clear
egen headroom_unique_values = nvals(headroom)
sum headroom_unique_values
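If you would rather not depend on an extra package, here is a sketch with core Stata: flag the first observation of each distinct value and add up the flags. The variable names are illustrative. Note that with this approach, missing values (if any) form their own group and are counted as a value.

```stata
sysuse auto2, clear
*flag the first observation of each distinct value of headroom
bysort headroom: gen byte first_of_value = _n == 1
*the number of flags is the number of distinct values
egen n_unique = total(first_of_value)
sum n_unique
```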
94 regsave
To save the coefficients and standard errors from your regression, use regsave:
capture ssc install regsave
tempfile coefficients
sysuse auto2, clear
reg price mpg headroom turn length gear_ratio
regsave using `coefficients', replace
95 Access certain rows of a variable
If you are creating a new variable and want to assign it the value of another variable from
the observation x rows before or after, use [_n-x] or [_n+x]:
sysuse auto2, clear
*lagged one observation
gen lag_length = length[_n-1], after(length)
*lead one observation
gen lead_length = length[_n+1], after(lag_length)
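Combined with by, _n refers to the position within each group, so lags do not leak across groups. A sketch: the first car in each foreign group gets a missing lag rather than the last value of the previous group.

```stata
sysuse auto2, clear
sort foreign make
*lag within foreign: the first row of each group has no previous row
by foreign: gen lag_price = price[_n-1]
list foreign make price lag_price in 50/55
```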
96 Refer to observations by row number
You can refer to observations by their row number with in:
sysuse auto2, clear
*a single observation
replace trunk = 1 in 1
*multiple observations
replace trunk = 0 in 2/5
97 colorpalette
Use colorpalette to color your graph:
capture ssc inst mscatter
capture ssc inst palettes
sysuse sp500, clear
foreach i in Zissou1 cividis icefire Blues {
    mscatter change close if inrange(change, -30, 30), msymbol(O) ///
    msize(7) sch(s1mono) over(change) colorpalette(`i')
}