Quick Stata Tips

Version 1.0

Todd R. Jones¹

November 30, 2023

¹ Mississippi State University, IZA, and CESifo. toddrjones.com.

To Ari, Charlotte, Annie, and Ned

Preface
This book compiles a number of my “Quick Stata Tips.” I am basing this book on Stata
version 18 (and sometimes 17) on a Mac. However, much—though not all—of the content
in this book applies to (recent) prior versions of Stata. Many—though not all—of the tips
build on concepts introduced in earlier tips.

I have picked up these tips over the years from many different sources, including
StackOverflow, StataList, Google, my own coding, and other people. For a book like this,
it’s hard to know how to acknowledge everyone, so I will keep it very general and say
that I would like to thank various people and websites from whom I learned about some
of these tips, as well as others who have developed the packages that I refer to.

I have created a companion Stata .do file with code for most of the tips. You can
download it here.

I hope you find it helpful!

Contents

1 fre
2 mdesc
3 inlist and inrange
4 Multicursor Mode
5 compare
6 Scroll Buffer Size
7 ereplace
8 r(table)
9 ssc hot
10 opacity
11 Previous line of code
12 Sort with, but don’t group by
13 Open multiple instances of Stata
14 isid
15 texdoc
16 Check if variable is constant within group
17 trim
18 group
19 tempfiles
20 Execute (include)
21 Control placement of newly-created variables
22 substr
23 browse if
24 sysuse
25 bysort
26 _n and _N
27 preserve/restore
28 capture
29 tab1
30 statastates
31 compress
32 quietly
33 set graphics off
34 Undocumented and previously documented commands
35 Pull variables from the ACS
36 Graph at county level
37 Animated maps
38 Animated graphs
39 mscatter
40 duplicates
41 Notify when code is finished running
42 Display loop progress
43 Timers
44 Loop over all variables
45 Calculate total
46 xtile
47 Remove elements from a local
48 Add elements to a macro
49 Save value label to macro
50 Access stored results and other parameters
51 Go between numeric and string with existing variable
52 Go between numeric and string when creating variable
53 Version control
54 asgen
55 collapse
56 seq
57 Add leading zero to number
58 Main effects and interactions in regressions
59 keepusing
60 geodist
61 geonear
62 georoute
63 heatplot
64 Color by a third variable
65 binscatterhist
66 alluvial
67 sankey
68 Scrape webpages with readhtml
69 Choose which variables and observations to load
70 Use datasets from internet
71 Choose which category to omit in a regression
72 fillin
73 levelsof
74 Increment local
75 labelbook
76 twoway
77 rowtotal
78 Random variables
79 set seed
80 Length of string
81 clonevar
82 Label values based on value labels of another variable
83 Locate .do files
84 Include commas in large numbers
85 Highlight selected bars in bar chart
86 keeporder
87 Create your own function/program
88 gsort
89 Sort descending when using bysort
90 moreobs
91 coefplot
92 expand
93 nvals
94 regsave
95 Access certain rows of a variable
96 Refer to observations by row number
97 colorpalette

1 fre
If you want to look at a one-way frequency table of a variable, fre displays both values
and labels at the same time, while tabulate does not. To use fre:
ssc install fre
sysuse auto2, clear
tab foreign
fre foreign

2 mdesc
To quickly see how many observations of each variable are missing, use mdesc:
ssc install mdesc
sysuse lifeexp, clear
mdesc

3 inlist and inrange
inlist() checks whether a value matches any item in a list, and inrange() checks whether a
value lies within a closed interval. This:

keep if inlist(state, "AL", "AK", "AZ")

is the same as:


keep if state=="AL" | state=="AK" | state=="AZ"

And:
keep if inrange(distance, 10, 91)

is the same as:


keep if distance>=10 & distance<=91
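
The lines above use placeholder variables (state, distance). As a quick check on real data, here is a minimal sketch using the census toy dataset; the choice of state2 and medage and the cutoffs 29 and 30 are my own assumptions for illustration. It counts matching observations both ways and asserts that the counts agree.

```stata
sysuse census, clear
* count states matched by inlist() and by the equivalent chain of == comparisons
count if inlist(state2, "AL", "AK", "AZ")
local n_inlist = r(N)
count if state2=="AL" | state2=="AK" | state2=="AZ"
assert `n_inlist' == r(N)
* same idea for inrange(): both endpoints are included
count if inrange(medage, 29, 30)
local n_inrange = r(N)
count if medage>=29 & medage<=30
assert `n_inrange' == r(N)
```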

4 Multicursor Mode
Stata supports multi cursor mode. With a Mac, hold down Option and drag the cursor.
In Windows, hold down Alt.

You can also place multiple cursors. On a Mac, hold down Command and click. On
Windows, hold down Control and click.

5 compare
To quickly compare two variables to see how often the values of the first are lower than,
equal to, or higher than those of the second, use compare:
sysuse auto2, clear
compare mpg trunk

6 Scroll Buffer Size
To have the ability to scroll up as far as possible in your Results window, run this code:
set scrollbufsize 2048000

7 ereplace
It is not possible to replace when using egen commands. You can instead use ereplace.

Here is the old, workaround way to do it:


sysuse auto2, clear
*This next line doesn’t work:
*replace mpg = max(mpg)
*So you instead have to do something like:
egen mpg2 = max(mpg)
drop mpg
rename mpg2 mpg

Here is the easier way:


ssc install ereplace
sysuse auto2, clear
ereplace mpg = max(mpg)

8 r(table)
After a regression, you can use r(table) to directly get the 95% confidence interval, t-
statistic, p-value, standard error, beta, etc.
sysuse auto2, clear
reg trunk weight
matrix list r(table)
local weight_lower_95ci = r(table)[5,1]
di "`weight_lower_95ci'"
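
Hard-coding [5,1] works, but it relies on remembering that the lower bound is row 5. A possible alternative, sketched here under the assumption that you copy r(table) into a matrix (which I call T), is to look up rows and columns by name with rownumb() and colnumb():

```stata
sysuse auto2, clear
quietly reg trunk weight
* copy r(table) into a regular matrix so it survives later commands
matrix T = r(table)
* look up the lower and upper 95% CI bounds for weight by row/column name
local ll = T[rownumb(T, "ll"), colnumb(T, "weight")]
local ul = T[rownumb(T, "ul"), colnumb(T, "weight")]
di "95% CI for weight: [`ll', `ul']"
```

Copying the matrix first also protects against r(table) being overwritten by a later r-class command.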

9 ssc hot
Use ssc hot to see the most popular user-contributed SSC Stata packages:
ssc hot, n(100)

10 opacity
To add opacity to your graph, use %X after the color, where 0≤X≤100:
sysuse sp500, clear
replace high = high+80
twoway (hist high, width(20) color(blue%50)) ///
(hist low, width(20) color(red%50)), ///
scheme(s1mono) legend(order(1 "Blue" 2 "Red"))

11 Previous line of code
To retrieve the previous line(s) of code in the Command Prompt:
fn key + up arrow (Mac)
Page Up (PC)
Control + R (Mac or PC)

And to do the reverse:


fn key + down arrow (Mac)
Page Down (PC)

12 Sort with, but don’t group by
When using bysort, say you have a variable you want to sort with, but *not* group by. Put
this variable in parentheses:
sysuse census, clear
*keep the least-populous state within each region, so sort by pop
* within each region:
bys region (pop): keep if _n==1

13 Open multiple instances of Stata
On a Mac, to open up multiple instances of Stata, type the following into Terminal:
open -n /Applications/Stata/StataMP.app

(You may have to modify StataMP if you have a different edition, such as StataSE.)

14 isid
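To check if your data are unique at a certain level, use isid. An error means that the
variables do not uniquely identify the observations, while no error means that they do.
sysuse auto2, clear
*error: foreign alone does not uniquely identify observations
isid foreign
*no error: foreign and make together do
isid foreign make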

15 texdoc
estout and outreg2 are good to create .tex files, but they can’t always do what I want. I’ve
switched to the fully customizable texdoc. This version loops over both the right hand
side variables and the different regression specifications.

16 Check if variable is constant within group
To check if a variable is constant within group, you can do:
bys group (var): gen a = var[1]==var[_N]
tab a

If all values of a are 1, the variable is constant within group. If var is missing for some
(but not all) observations in a group, that group is considered not constant.

For a one-line solution:


bysort group: assert var==var[1]
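
The code above uses the placeholders group and var. Here is a minimal sketch on the auto2 toy data; foreign_copy is a hypothetical variable I create only so that there is something constant within group to test against.

```stata
sysuse auto2, clear
* foreign_copy is constant within foreign by construction; mpg is not
gen foreign_copy = foreign
bys foreign (foreign_copy): gen const_copy = foreign_copy[1]==foreign_copy[_N]
bys foreign (mpg): gen const_mpg = mpg[1]==mpg[_N]
tab const_copy
tab const_mpg
* the one-line check passes silently for foreign_copy...
bysort foreign: assert foreign_copy==foreign_copy[1]
* ...and would stop with an error for mpg, so wrap it in capture and look at _rc
capture bysort foreign: assert mpg==mpg[1]
di _rc
```

A nonzero _rc after the captured assert means the variable is not constant within group.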

17 trim
To get rid of the leading and trailing space of a string variable, use trim:
clear all
input str12 str
"String A "
" String B "
" String C"
end

replace str = trim(str)

18 group
To create a variable identifying groups, use group:
sysuse xtline1, clear
egen grp = group(day)
*check that it worked
sort day

19 tempfiles
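You can use tempfiles to save and then repeatedly load data. They are deleted automatically
when the do-file or program that created them finishes (or, if you created them
interactively, when you quit Stata).
sysuse auto2, clear
tempfile a
save `a'

Later:
use `a', clear

Or:
merge m:1 id using `a'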

20 Execute (include)
Usually, you cannot access in the Command Prompt the locals and tempfiles you created
in the Do-File Editor, and vice versa. But you can if you run your code with Execute
(include) (as opposed to the default Execute (do)). My coding preference is to be constantly
shifting between the Command Prompt and Do-file Editor, and this tip allows me to do
so.

In addition, I have programmed a keyboard shortcut (Command+Return) for Execute (include).
On a Mac, I did this with:

System Preferences - Keyboard - Shortcuts - App Shortcuts - Stata - e.g., Execute (include).

If you do prefer to run your code the traditional way, you can use keyboard shortcuts to
run/Execute (do) your code. On Windows: Control+Shift+D. On Mac: Command+Shift+D.

21 Control placement of newly-created variables
When you generate a new variable, use before to place it before another variable, and after
to place it after a variable.
sysuse auto2, clear
*create version of price in units of thousands
gen price_k = price/1000, before(price)

*create lower case version of make and place after make


gen make_lower = lower(make), after(make)

22 substr
substr is one way to extract characters from a string. The format is substr(var, X, Y), where
X indexes the first position, and Y is the number of characters you want to extract.
sysuse auto2, clear
*get the first two letters of the string
gen first = substr(make, 1, 2), after(make)
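
Two variations that may be handy, shown as a small sketch (the variable names last3 and rest are my own): a negative X counts from the end of the string, and a missing Y (.) returns everything through the end.

```stata
sysuse auto2, clear
* last three characters: a negative start counts from the end of the string
gen last3 = substr(make, -3, 3), after(make)
* everything from the 3rd character on: a missing length (.) means "to the end"
gen rest = substr(make, 3, .), after(last3)
list make last3 rest in 1/5
```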

23 browse if
To browse only a subset of your data, use browse if :
sysuse auto2, clear
*browse only the cars that begin with "A"
br if substr(make,1,1)=="A"

*go back to browsing all data


br

24 sysuse
sysuse loads toy datasets that are already in Stata. sysuse dir lists all such datasets. These
can be useful when, for example, you want to use toy data to ask a question on Stack-
Overflow or StataList.
sysuse dir

*load the citytemp dataset


sysuse citytemp, clear

25 bysort
Use bysort.
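
The original example for this tip is not reproduced here, so the following is only my own minimal sketch of what bysort does: it sorts the data by the listed variables and then runs the command separately within each group. The variable names mean_price and price_rank are hypothetical.

```stata
sysuse auto2, clear
* mean price within each level of foreign (bysort sorts, then runs by group)
bysort foreign: egen mean_price = mean(price)
* rank cars by price within foreign; price in parentheses is used for sorting only
bysort foreign (price): gen price_rank = _n
list foreign price mean_price price_rank in 1/5
```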

26 _n and _N
Use _n to get the current observation number. Use _N to get the maximum observation
number (the total number of observations). With a by prefix, both are computed within group.
sysuse auto2, clear
keep rep78
sort rep78
*obs #:
gen n = _n
*obs # w/i group:
bys rep78: gen group_n = _n
*max obs #:
gen N = _N
*max obs # w/i group:
bys rep78: gen group_N = _N

27 preserve/restore
You can save a copy of your data with preserve. You can then change the data and restore
the saved version with restore. (You can also accomplish this goal using tempfiles.)
sysuse auto2, clear
*preserve the data to be used later
preserve
*change the data
keep if _n<5
scatter price mpg
*restore the data
restore

You can cancel the preserve with restore, not. Then you can preserve the data again.

*preserve again
preserve
keep if _n<10
*cancel the prior preserve
restore, not
*preserve again
preserve

28 capture
Use capture to allow Stata to keep running the code even if there is an error in the line. In
this example, we want to preserve the data but are not sure if the data has been preserved
before. If it has, then we will get an error. So we want to cancel the preserve. However, if
we cancel the preserve and it hasn’t been preserved, it will also give an error. So we will
use capture to cancel the preserve if it exists; if it doesn’t, the code will continue to run.
sysuse auto2, clear
capture restore, not
preserve
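
capture also stores the return code of the command in _rc (0 means no error), so you can branch on whether the command failed. A minimal sketch, where not_a_real_var is a hypothetical variable name chosen because it does not exist in auto2:

```stata
sysuse auto2, clear
* confirm errors if the variable does not exist; capture suppresses the error
capture confirm variable not_a_real_var
if _rc != 0 {
    di "not_a_real_var was not found (return code " _rc ")"
}
```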

29 tab1
To make one-way frequency tables for multiple variables, use tab1:
sysuse auto2, clear
*tabulate (separately) make, price, and mpg:
tab1 make price mpg
*tabulate all variables:
tab1 *

With that said, it’s probably better to instead use fre:


fre make price mpg

30 statastates
Use statastates for a crosswalk between two-letter US state abbreviation, state name, and
state FIPS code:
capture ssc install statastates
sysuse census, clear
keep state2
statastates, abbreviation(state2) nogen
replace state_name = strproper(state_name)

31 compress
If you have a long string variable and then shorten its values, use compress to shrink the
variable's storage type to fit. This makes the Data Editor easier to look at.
sysuse auto2, clear
replace make = "This is a long string...." in 1
replace make = substr(make, 1, 6)

compress

32 quietly
Place quietly in front of a command like reg to suppress the output.
sysuse auto2, clear
quietly reg trunk weight

33 set graphics off
Use set graphics off to not display graphs when they are created. This can be useful if
you are making a lot of graphs and want to save them, but don’t want the graphs to
incessantly pop up on your screen.
set graphics off

To again display the graphs:


set graphics on

34 Undocumented and previously documented commands
To see undocumented commands:
help undocumented

To see commands that were previously documented, but are not documented anymore:
help prdocumented

35 Pull variables from the ACS
Use getcensus to pull variables from the American Community Survey (ACS). To be able
to submit more than 500 requests per day, you need to get a Census key at
https://api.census.gov/data/key_signup.html; you should then activate it.
ssc install getcensus
*replace XYZ w/ your key
global censuskey XYZ
*get population by county
getcensus B01003, year(2015) sample(5) geography(county) clear

36 Graph at county level
Use maptile to create choropleth maps at the county level. Other geographies, such as
state, are also supported.

In this example, we’ll color counties according to population, which we pull from the
ACS:
ssc install getcensus
*replace XYZ w/ your key
global censuskey XYZ
*get population by county
getcensus B01003, year(2015) sample(5) geography(county) clear

ssc inst maptile
ssc inst spmap
maptile_install using ///
    "http://files.michaelstepner.com/geo_county2014.zip"
drop county
gen county = substr(g,10,5)
destring county, replace
maptile b01003_001e, geo(county2014) twopt(title(Population by County) ///
    legend(off)) fcolor(Blues)

37 Animated maps
Maybe you want to show your variation over time and space. While it’s much easier to
make an animated map with R’s gganimate, you can do it in Stata as in this toy example:
ssc install maptile
ssc install spmap
maptile_install using "http://files.michaelstepner.com/geo_state.zip"
sysuse census, clear
rename state q
rename state2 state
gen year = _n+1900
fillin state year

bys state: replace medage = 0 if medage==.


forvalues i = 1914/1928 {
    maptile medage if year==`i', geo(state) twopt(title(`i') legend(off))
    graph export `i'.png, replace
}

You then need to stitch them together. On a Mac, you can go to the Terminal, change the
directory to where the files are stored, then run the following (convert is part of
ImageMagick, which must be installed):
convert *.png a.gif

38 Animated graphs
Using the same general approach as the tip directly above, you can create animated
graphs:
sysuse uslifeexp, clear
forvalues i = 1900/1999 {
    scatter le_male le_female if year==`i', title(`i') scheme(s1mono) ///
        yscale(r(35 80)) xscale(r(40 80)) ylabel(40(10)80) ///
        xlabel(40(10)80)
    gr export `i'.png, replace
}

In Terminal (on a Mac), navigate to the directory, then:


convert *.png a.gif

39 mscatter
Use mscatter to create scatter plots with color gradients.
capture ssc inst mscatter
capture ssc inst palettes
sysuse sp500, clear
mscatter change close if inrange(change, -30, 30), msymbol(O) msize(7) ///
    sch(s1mono) over(change) colorpalette(viridis)

40 duplicates
To keep one observation per group:
sysuse auto2, clear
keep if _n<15
bys turn: keep if _n==1

You can also use duplicates. To check if observations are unique within group, you can use
duplicates report. To drop duplicates, you can use duplicates drop. The observations that are
kept will not necessarily be the same as the bysort approach.
sysuse auto2, clear
keep if _n<15
*not unique:
duplicates r turn
*unique:
duplicates r gear_ratio turn
*drop dups
duplicates drop turn, force

41 Notify when code is finished running
To have Stata play a sound when it’s done running your code, select the Play sound with
notification option.

You can also use beep to make Stata beep. This could be useful to run when a loop is
finished.
beep

You can also use statapush to have Stata message your phone when it’s done running.
ssc install statapush
help statapush

42 Display loop progress
When you’re doing a loop, you can show the progress using the following approach,
which displays a message every 100th iteration:
local iterations = 1000
forvalues i=1/`iterations' {
    if mod(`i', 100)==0 di "Iteration `i' of `iterations'"
}

43 Timers
To time how long your code takes, you can use etime:
capture ssc install etime
etime, start
forvalues i=1/1000000 {
    quietly di "`i'"
}
etime

Or, you can use timer:


timer clear
timer on 1
forvalues i=1/1000000 {
    quietly di "`i'"
}
timer off 1
timer list

Or, you can use rmsg. (Turn it off with set rmsg off.)
set rmsg on
*to do this permanently
*set rmsg on, permanently
forvalues i=1/1000000 {
    quietly di "`i'"
}

44 Loop over all variables
To loop over all variables, use varlist _all:
sysuse census, clear
foreach var of varlist _all {
    rename `var' `var'_42
}

45 Calculate total
If you want to calculate the total of a variable, use egen with total(). Don’t use gen with
sum(), as this calculates the running/cumulative total.
sysuse auto2, clear
gen one = 1
egen total = total(one)
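
To see the difference side by side, here is a small sketch on the price variable (the names running_total and grand_total are my own). The running total changes from row to row and only matches the grand total in the last observation.

```stata
sysuse auto2, clear
keep price
* running (cumulative) total: changes from row to row
gen running_total = sum(price)
* grand total: the same value in every row
egen grand_total = total(price)
list in 1/3
list in -3/l
```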

46 xtile
Use xtile to create a new variable that groups the values of another variable into quantile
bins.
sysuse auto2, clear
keep turn
*quartiles
xtile turn_quartile = turn, nq(4)
*deciles
xtile turn_decile = turn, nq(10)

47 Remove elements from a local
Here’s how to create a local with all variables except one:
sysuse auto2, clear
ds
local vars `r(varlist)'
di "`vars'"
local remove_vars "make"
local vars_new: list vars - remove_vars
di "`vars_new'"

48 Add elements to a macro
Here’s how to add elements to a local:
local loc a b c
di "`loc'"
*add d & e
local loc `loc' d e
di "`loc'"

Here’s how to add elements to a global:


global glo v w x
di "$glo"
*add y & z
global glo $glo y z
di "$glo"

49 Save value label to macro
To save a value label to a local, use this format: local loc: label (var) X, where X is the value.
sysuse auto2, clear
fre foreign
local foreign_0: label (foreign) 0
local foreign_1: label (foreign) 1
di "`foreign_0'"
di "`foreign_1'"
scatter price displacement if foreign==0, title(`foreign_0')

50 Access stored results and other parameters
You can use return list, all and ereturn list, all (and sreturn list, all) to show all stored results.
creturn list shows other parameters.
sysuse auto2, clear
reg weight gear_ratio
ereturn list, all
return list, all
matrix list r(table)
creturn list
di "`c(pi)'"

51 Go between numeric and string with existing variable
destring converts from string to numeric, and tostring converts from numeric to string:
sysuse auto2, clear
tostring turn, replace
destring turn, replace

52 Go between numeric and string when creating variable
real() converts from string to numeric, and string() converts from numeric to string:
sysuse pop2000, clear
keep if _n>2
gen age = real(substr(agestr, 1, 2)), after(agestr)
gen age_string = string(age), after(age)

53 Version control
A rudimentary version of version control: at the beginning of your file, create a global for
the file path of your output, which you can then change every day:
global output "/Users/me/Research/projectName/output/12-25-2023/"
...
graph export "${output}/a.png"
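
If you would rather not edit the date by hand each day, one possible variation is to build the folder name from c(current_date). This is only a sketch: the path is the same hypothetical one as above, the date format will differ from the hand-typed 12-25-2023 style, and mkdir only succeeds if the parent folder already exists. Because it uses a local, run it as part of the same do-file as the rest of your code.

```stata
* build a folder name from today's date, e.g. 30-Nov-2023
* (strtrim removes any leading space that c(current_date) may include)
local today = subinstr(strtrim("`c(current_date)'"), " ", "-", .)
* hypothetical project path; replace with your own
global output "/Users/me/Research/projectName/output/`today'"
* create the folder if it does not exist yet (capture ignores the error if it does)
capture mkdir "$output"
di "$output"
```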

54 asgen
Use asgen to create a weighted average:
capture ssc install asgen
sysuse census, clear
bys region: asgen weighted_medage = medage, weight(pop)

55 collapse
Use collapse to perform operations within groups to create new variables. Here, we will
compute the mean of medage within region, as well as the number of observations within
region.
sysuse census, clear
gen N = 1
collapse (mean) medage (sum) N, by(region)

We can also weight. This gives the same numbers as the asgen example above.
sysuse census, clear
collapse (mean) medage [aweight=pop], by(region)

56 seq
Use seq to repeat sequences of numbers:
ssc install seq
sysuse auto2, clear
*repeat 1 2 3
seq rep1, from(1) to(3)
*repeat 1 1 2 2 3 3
seq rep2, from(1) to(3) block(2)

57 Add leading zero to number
Here’s how to add a leading zero to a number, which is useful when working with state
FIPS codes. Here, we will add a zero to one-digit numbers:
clear all
set obs 15
gen fips = _n
tostring fips, gen(fips2)
replace fips2 = "0" + fips2 if strlen(fips2)==1
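
A possible one-step alternative, sketched here with a new variable I call fips3, is to pass a display format to string(). The format %02.0f pads with zeros to a width of two and shows no decimals.

```stata
clear all
set obs 15
gen fips = _n
* %02.0f pads with zeros to a width of 2 and shows no decimals
gen fips3 = string(fips, "%02.0f")
list in 8/11
```

The same idea extends to five-digit county FIPS codes with %05.0f.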

58 Main effects and interactions in regressions
Here’s how to think about main effects and interactions in regressions.

By default, main effects are continuous. The i. prefix makes them categorical.

By default, variables in interaction terms are categorical. The c. prefix makes them
continuous.

Assume the interaction term has two variables. To include only the interaction term, use
# between the variables.

To include both the main effects and the interaction term, use ## between the variables.

sysuse auto2, clear


gen b = _n<10
gen b_weight = b*weight

*equivalent: (note the i. is redundant b/c b is binary):


reg price i.b weight b_weight
reg price i.b weight b#c.weight
reg price b##c.weight

59 keepusing
When merging the using data into the master data, use the keepusing option to keep only
select variables from the using data:
webuse nhanes2, clear
tempfile nh
save `nh'

use sampl houssiz using `nh', clear

merge 1:1 sampl using `nh', keepusing(height weight)

60 geodist
geodist calculates the “as the crow flies” distance between points:
ssc install geodist
clear
input double lat lon
34.043026 -118.26694
39.74915 -105.00740
end
geodist 42.366570 -71.06186 lat lon, gen(dist) miles

61 geonear
Use geonear to find the nearest point in B to each point in A:
ssc inst geonear
clear
set ob 20
set se 1
g n2=_n
g la2=39+5*rt(5)
g lo2=-99+9*rt(5)
tempfile a
save `a'
ren n2 n
g la=la2+3*rt(3)
g lo=lo2+4*rt(4)
drop *2
geonear n la lo using `a', n(n2 la2 lo2)

62 georoute
Use georoute to calculate travel times between addresses or coordinates. You can choose
car, bike, walk, and public transit.

Note you have to register an account to use the here.com API, and the free version limits
the number of requests.

63 heatplot
Use heatplot to make a heatmap:
capture ssc install heatplot
webuse nhanes2, clear
heatplot weight height

64 Color by a third variable
Use colorvar to color a scatterplot by a third variable (this option requires Stata 18):
sysuse auto2, clear
scatter weight length, colorvar(turn)

65 binscatterhist
Use binscatterhist to show a binned scatterplot along with histograms of the variables.
capture ssc install binscatterhist
sysuse auto2, clear
binscatterhist weight length, hist(weight length) ymin(1100) ///
    xhistbarheight(30) yhistbarheight(13)

66 alluvial
Use alluvial to make alluvial plots in Stata:
ssc install alluvial
*example from package tutorial
sysuse nlsw88.dta, clear
alluvial race married collgrad smsa union

67 sankey
Use sankey to make Sankey plots:
ssc install sankey
*example from package tutorial
use "https://github.com/asjadnaqvi/stata-sankey/blob/main/data/sankey2.dta?raw=true", clear
sankey value, from(source) to(destination) by(layer) noval showtot ///
    palette(CET C6) laba(0) labpos(3) labg(-1) offset(10)

68 Scrape webpages with readhtml
Use readhtml to scrape (parts of certain) web pages. Below is the percent of a state that’s
covered by forest.
capture net install readhtml, from(https://ssc.wisc.edu/sscc/stata/)
capture ssc install statastates
capture ssc install maptile
capture ssc install spmap
capture maptile_install using ///
    "http://files.michaelstepner.com/geo_state.zip"

readhtmltable https://en.wikipedia.org/wiki/Forest_cover_by_state_and_territory_in_the_United_States, varnames

gen st=substr(S,7,30)
drop if inlist(_n,3,4,16,18,24,40,57)
gen forest=substr(Percent_forest_2,1,length(Percent_forest_2)-2)
destring forest, replace
keep st forest
statastates, name(st)
rename state_abbrev state
maptile forest, geo(state) fcolor(Greens) twopt(title("Percent Forest"))

69 Choose which variables and observations to load
When loading data with use, you can load only select vars with a varlist, use in to select
observations by observation number, and/or use if to select observations with logic:
sysuse auto2, clear
tempfile a
save `a'
use make turn using `a' in 2/9
use `a' if length>200

70 Use datasets from internet
You can load toy datasets from the internet. The links from help dta_manuals give you a lot
to choose from.
help dta_manuals

help q_base

use https://www.stata-press.com/data/r18/apple.dta

71 Choose which category to omit in a regression
To choose which category to omit in a regression, use the ibX. prefix, where X is the value
you wish to omit:
sysuse auto2, clear
*view values of rep78
fre rep78
*default:
reg mpg i.rep78
*omit category 3 (average)
reg mpg ib3.rep78

72 fillin
Use fillin to fill in the dataset so that all combinations of the variables are present:
clear all
input year str1 state value
1 "A" 6
2 "A" 3
4 "A" 5
3 "B" 1
end
fillin year state

73 levelsof
To get all values of a variable, use levelsof (and the local() option to put them into a local):
sysuse auto2, clear
levelsof mpg
levelsof trunk
*put it into a local
levelsof trunk, local(trunklevels)
di "`trunklevels'"

You can do this and then loop over the values:


sysuse auto2, clear
levelsof rep78, missing local(rep78_levels)
foreach i of local rep78_levels {
    sum price if rep78==`i'
}

For strings:
levelsof make
levelsof make, clean

74 Increment local
Use ++ to increment a local:
local b = 1
local b = `b' + 1
di "`b'"
*Equivalent:
local a = 1
local ++a
di "`a'"

75 labelbook
To show all value labels, use labelbook:
sysuse auto2, clear
labelbook

76 twoway
To graph multiple things at once, use twoway:
sysuse auto2, clear
twoway scatter mpg length || ///
scatter mpg displacement, ///
scheme(s1color) legend(label(1 "Length") label(2 "Displacement"))

77 rowtotal
Use rowtotal to sum across columns. Note that rowtotal treats missings as 0s.
sysuse auto2, clear
gen sum1 = price + mpg + rep78
egen sum2 = rowtotal(price mpg rep78)
egen sum3 = rowtotal(price-rep78)
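
To see the difference in how missings are handled, here is a short sketch on the same data. It also uses rowtotal's missing option, which returns a missing total only when every variable in the list is missing. The name sum4 is just for illustration.

```stata
sysuse auto2, clear
gen sum1 = price + mpg + rep78
egen sum2 = rowtotal(price mpg rep78)
* with the missing option, the total is missing only if every variable is missing
egen sum4 = rowtotal(price mpg rep78), missing
* sum1 is missing wherever rep78 is missing; sum2 never is
count if missing(sum1)
count if missing(sum2)
```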

78 Random variables
To create a uniform random variable, use runiform().
sysuse auto, clear
gen r=runiform()

There are other functions, such as rexponential. To see them, type in help runiform and then
click on the View complete PDF manual entry link.
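
As a quick sketch of a few of these other functions (the variable names and parameter values are arbitrary choices for illustration):

```stata
sysuse auto, clear
* standard normal
gen z = rnormal()
* normal with mean 10 and standard deviation 2
gen n = rnormal(10, 2)
* random integer between 1 and 6, inclusive
gen die = runiformint(1, 6)
* exponential with scale parameter 3
gen e = rexponential(3)
```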

79 set seed
To ensure reproducibility, use set seed. This makes the generated random numbers the same
each time you run the code.
sysuse auto, clear
set seed 42
gen r=runiform()

*this will produce exactly the same numbers:


sysuse auto, clear
set seed 42
gen r=runiform()

80 Length of string
To count the number of words in a string, use the word count macro function. To count the
number of characters in a string, use strlen.
local loc "How long is this?"
local loc_words : word count `loc'
di "`loc_words'"

local loc_chars = strlen("`loc'")
di "`loc_chars'"

81 clonevar
To make an exact copy of another variable, use clonevar:
sysuse auto2, clear
*this copies the values but not the variable label, value label, or display format:
gen foreign2 = foreign
*this does:
clonevar foreign3 = foreign

82 Label values based on value labels of another variable
To label the values of one variable using the value labels of another, use describe on the
second variable to find the name of its value label, then apply it to the first with label values.
sysuse auto2, clear
set seed 1
gen new = round(runiform()*4+1)
describe rep78
label values new repair
ssc install fre
fre new

83 Locate .do files
Find the .do files in a given directory on your computer that contain a particular word or
phrase:
ssc install find
ssc install rcd
rcd "/Users/Todd/Google Drive": find *.do, match(sysuse auto2) show

84 Include commas in large numbers
To display large numbers with commas, add c to the end of the display format (for example,
%15.0fc). This can be useful in graph labeling.
sysuse voter, clear
format pop %15.0fc
scatter pop frac, scheme(s1mono)

85 Highlight selected bars in bar chart
To highlight selected bars in a bar chart, use separate:
sysuse auto2, clear
keep if price>=10000

separate price, by(make=="Linc. Continental")

graph hbar (asis) price0 price1, nofill over(make, sort(price) desc) ///
legend(off) scheme(s1color)

86 keeporder
Use keeporder to keep and order variables in one line:
capture ssc install keeporder

*old way
sysuse auto2, clear
keep foreign rep78 make
order foreign rep78 make

*new way
sysuse auto2, clear
keeporder foreign rep78 make

87 Create your own function/program
In this example, we’ll create our own function (called a “program”) that creates a new
variable that is the sum of two other variables.
capture program drop s_pr
program s_pr, rclass
args x y
*access args as locals
gen sum = `x' + `y', after(`y')
end

Run the program. We will enter the arguments length (for x) and turn (for y).
sysuse auto2, clear
s_pr length turn

88 gsort
Use gsort to sort in descending order:
sysuse auto2, clear
*sort ascending
gsort mpg
*sort descending
gsort -mpg

89 Sort descending when using bysort
You can’t directly sort in descending order when using bysort. Here’s a workaround:
sysuse auto2, clear
*doesn’t work:
bys foreign (-turn): gen n=_n
*instead:
gsort foreign -turn
by foreign: gen n = _n

If the sorting variable is non-string, you can do:


sysuse auto2, clear
gen turn_rev = -turn
bys foreign (turn_rev): gen n=_n
drop turn_rev

Note, though, that exact ties might be handled differently depending on what you do.
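
One way to make the handling of ties reproducible is to add a tie-breaking variable as a final sort key. This is a sketch that relies on make uniquely identifying observations, as it does in auto2:

```stata
sysuse auto2, clear
* make is unique, so adding it as a final sort key breaks ties in turn
gsort foreign -turn make
by foreign: gen n = _n
```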

90 moreobs
Use moreobs to add additional observations to your data:
ssc install moreobs
sysuse auto2, clear
moreobs 10
sort make

91 coefplot
Use coefplot to quickly plot coefficients and confidence intervals.
sysuse auto2, clear
reg price mpg length displacement weight trunk
coefplot, drop(_cons) vertical

92 expand
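To make n copies of each observation, use expand n. Each observation is replaced with n
copies of itself, so expand 2 adds one duplicate of every observation.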
sysuse auto2, clear
expand 2
sort make
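
If you need to tell the originals from the copies, expand's generate() option creates an indicator for that. In this sketch, the variable name dup is an arbitrary choice:

```stata
sysuse auto2, clear
* dup is 0 for original observations and 1 for the added copies
expand 2, generate(dup)
sort make dup
tab dup
```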

93 nvals
To create a variable with the number of unique values of another variable, use the egen
function nvals, which is part of the egenmore package on SSC:
capture ssc install egenmore
sysuse auto2, clear
egen headroom_unique_values = nvals(headroom)
sum headroom_unique_values

94 regsave
To save the coefficients and standard errors from your regression, use regsave:
capture ssc install regsave
tempfile coefficients
sysuse auto2, clear
reg price mpg headroom turn length gear_ratio
regsave using `coefficients', replace

use `coefficients', clear

95 Access certain rows of a variable
If you are creating a new variable and want to assign it the value of another observation
x rows away, use [_n+x] or [_n-x].
sysuse auto2, clear
*lagged one observation
gen lag_length = length[_n-1], after(length)
*lead one observation
gen lead_length = length[_n+1], after(lag_length)
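
The same subscripting works within groups when combined with bysort. This is a minimal sketch; the grouping by foreign, the ordering by make, and the name lag_length_by are illustrative choices:

```stata
sysuse auto2, clear
* lag within each level of foreign, ordering by make inside each group
bysort foreign (make): gen lag_length_by = length[_n-1]
* the first observation of each group has no previous row, so it is missing
list foreign make length lag_length_by if missing(lag_length_by)
```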

96 Refer to observations by row number
You can refer to observations by their row number with in:
sysuse auto2, clear
*a single observation
replace trunk = 1 in 1
*multiple observations
replace trunk = 0 in 2/5

97 colorpalette
Use colorpalette to color your graph:
capture ssc inst mscatter
capture ssc inst palettes
sysuse sp500, clear
foreach i in Zissou1 cividis icefire Blues {
mscatter change close if inrange(change, -30, 30), msymbol(O) ///
msize(7) sch(s1mono) over(change) colorpalette(`i')
}
