PANEL
To work with time series and panel data, you first have to declare the type of data in Stata, so that Stata
knows what kind of data it is dealing with. That is what we are going to learn today.
A time series is a sequence of data-points which are ordered in time. Examples of time series are the
quarterly GDP of a country, the monthly unemployment rate, or the daily closing value of the Dow Jones
index.
Time series are distinct from other forms of data because they have a natural temporal ordering. For
example, the graph below shows a time series of quarterly frequency: the unemployment rate in the USA
from the 1st quarter of 1970 to the 2nd quarter of 2010.
You can declare the current data set to be a time series by using the command tsset. Once your data
set has been tsset, you can use Stata’s time series operators and functions.
Stata Brush-up course September 2022, BSE
Elia Benveniste & Alejandro Rábano
Important: the unitoption of tsset specifies the units of the time variable. Possible options are: daily,
weekly, monthly, quarterly, halfyearly, or yearly.
Requirements:
1. To use tsset you need one variable in the data set that represents the date or time dimension.
2. This date variable must be in the Stata-date format.
Example
Open the dataset emp_quarterly.dta
The data set contains the number of employed, the number of unemployed, and the GDP of the USA on a
quarterly basis from the 1st quarter of 1970 to the 2nd quarter of 2010. The variable date is the date
variable. The frequency of the data is quarterly. Therefore, in order to declare the data to be time series
data, we have to use the command
tsset date, quarterly
Stata now recognizes the data as time series data with quarterly frequency. As mentioned above, the
advantage is that we can now use Stata’s powerful time-series operators and functions to analyze our
data.
In most cases your dataset will either come without a date variable or the date variable will not be in the
Stata “elapsed time”-format.
→ Often you will have to construct a Stata date variable from two or more numerical components.
Example:
Open the dataset emp_quarterly2.dta. This dataset is equivalent to the one we saw before. Here, however,
the date of an observation is given by two numbers: the variables year and quarter. If you want to declare
this dataset as time series data with the command tsset, you first need to construct a Stata date variable.
In this case this is done with
generate newdate = yq(year, quarter)
You can now declare the dataset to be of the time series type with
tsset newdate, quarterly
Possibilities to construct a Stata date variable from two or more numerical components:
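As a sketch, the constructor functions for the different frequencies could be used as follows. The component variables year, half, quarter, month, week, and day are hypothetical names, and the new variable names are chosen only for illustration. For yearly data the year itself already serves as the date variable.

```stata
* Assumption: numeric component variables named year, half, quarter, month, week, day
* daily date from month, day, and year
generate date_d = mdy(month, day, year)
* weekly date from year and week
generate date_w = yw(year, week)
* monthly date from year and month
generate date_m = ym(year, month)
* quarterly date from year and quarter
generate date_q = yq(year, quarter)
* half-yearly date from year and half-year
generate date_h = yh(year, half)
```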
Imagine you have a time series dataset without any date variables at all. The only thing you know is the
frequency of the time series and the point in time at which the series starts or ends. In order to use the
tsset command on this dataset you need to create a Stata date variable first.
Example:
Open the dataset emp_quarterly3.dta. Again, this is the same dataset as before, but this time there is no
date variable at all. Suppose we know that the first observation is for the first quarter of 1970 and that the
frequency of the time series is quarterly. In order to construct a Stata-date variable date we can write
generate date = yq(1970,1) + _n-1
Here _n refers to the number of the observation in the dataset. Therefore, the code implies that the first
observation in the dataset is declared to be the 1st quarter of 1970, the second observation is the 2nd
quarter of 1970, and so on.
Now you can again run
tsset date, quarterly
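If only the end point of the series is known, the same idea works backwards from the last observation. The following is a minimal sketch, assuming the last observation is the 2nd quarter of 2010 and using the hypothetical variable name date_end as an alternative to date:

```stata
* Assumption: the last observation in the dataset is the 2nd quarter of 2010
* _N is the total number of observations, so the last row gets yq(2010, 2)
generate date_end = yq(2010, 2) - (_N - _n)
* display the values as quarters instead of raw numbers
format date_end %tq
tsset date_end, quarterly
```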
It also often happens that the date variable in the dataset comes as a string variable. (A string variable
stores its values as text, even if the text consists of digits.) Stata allows you to convert this variable
into a Stata-date variable.
Example:
Open the dataset emp_quarterly4.dta. This dataset is again the one we have already seen above. Now,
however, the date variable datestr comes as a string. The format of this string is “quarter year”, that is,
the number of the quarter is followed by a blank and then the year. For example, the 2nd quarter of 1990
would be “2 1990”. The following command converts this string into a Stata-date variable
gen date=quarterly(datestr,"QY")
Here “QY” describes the format of the string. If year and quarter were reversed in the variable datestr,
that is, if the 2nd quarter of 1990 were represented by “1990 2”, then the command would be
gen date=quarterly(datestr,"YQ")
The other commands to convert dates that come as strings into Stata-date variables are
generate date=date(datestr,"MDY") → daily data
generate date=weekly(datestr,"WY") → weekly data
generate date=monthly(datestr,"MY") → monthly data
generate date=quarterly(datestr,"QY") → quarterly data
generate date=halfyearly(datestr,"HY") → half-yearly data
generate date=yearly(datestr,"Y") → yearly data
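To check a conversion, it can help to look at the result for a single string and to apply a display format to the new variable. The lines below are a small sketch that assumes the variable date has been created from datestr as in the quarterly example above:

```stata
* quarterly dates count quarters since 1960q1, so 1990q2 is stored as the number 121
display quarterly("2 1990", "QY")
* the same value shown in the quarterly date format
display %tq quarterly("2 1990", "QY")
* show the converted variable as quarters rather than raw numbers
format date %tq
list datestr date in 1/4
```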
Lag               l.variable
Lead              f.variable
Difference        d.variable
2 lags            l2.variable
2 leads           f2.variable
2nd difference    d2.variable
Example:
Open the dataset emp_quarterly.dta. You can now create variables that hold the number of employed one
and two quarters ago.
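A possible way to do this is sketched below. It assumes that the data have already been tsset and that the employment variable is named employed; the names of the new variables are chosen only for illustration.

```stata
* Assumption: the number of employed is stored in a variable named employed
* number of employed one quarter ago
generate employed_l1 = l.employed
* number of employed two quarters ago
generate employed_l2 = l2.employed
```

The first observation of employed_l1 and the first two observations of employed_l2 are missing, because no earlier quarters exist in the data.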
You can use the time series operators in most Stata-commands. For example, you can run a regression of
the number of unemployed in one quarter on the number of unemployed the quarter before.
reg unemployed l.unemployed
Example:
corrgram unemployed, lags(10)
calculates the autocorrelations of the variable unemployed for up to 10 lags. Unsurprisingly,
unemployment is highly autocorrelated: the correlation coefficient is very high at one lag and then
gradually declines to about 0.08 at 10 lags.
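The same information can also be inspected graphically. As a sketch, again assuming the data have been tsset:

```stata
* plot the autocorrelations of unemployed for up to 10 lags
ac unemployed, lags(10)
* plot the partial autocorrelations for the same lags
pac unemployed, lags(10)
```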
A panel is a time series that also has a cross-sectional dimension. That is, the same n individuals/firms/
countries are observed at several points in time t. Panel data is the star among the data sets, because it
allows you to use a whole range of evaluation and estimation techniques that you would not be able to use
with cross-sections or time series.
Panel data can be saved in two different formats, long and wide.
a) long = the various observations per individual (or country) are coded as different observations (different
rows, smaller number of variables)
b) wide = all the information per individual is coded in one observation with series of variables indicating
the change in one particular aspect over time (one row, a lot of variables)
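The difference between the two formats can be seen in a toy example. The sketch below uses made-up values for two hypothetical countries A and B; it is meant only to illustrate the two layouts.

```stata
* Toy panel with made-up values, entered in the long format
clear
input str1 country int year double gdp
"A" 2000 1.0
"A" 2001 1.1
"B" 2000 2.0
"B" 2001 2.2
end
* long format: one row per country and year
list, sepby(country)
* wide format: one row per country, with the variables gdp2000 and gdp2001
reshape wide gdp, i(country) j(year)
list
```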
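To convert a dataset from the wide format to the long format you can use
reshape long variables, i(identifier) j(date)
Here variables are the stub names of the variables that you want to convert, identifier is the variable that
identifies individuals/firms/countries etc. in the wide format, and date is the name of the new variable
that holds the date in the long format.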
For example, open the panel of countries above in the wide format
use countries_small_wide.dta
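The conversion to the long format could then look like the following sketch. It assumes that the wide dataset stores GDP and population in variables whose names consist of the stubs gdp and pop followed by the year (for example gdp2000 and pop2000) and that countries are identified by the variable country.

```stata
* Assumption: wide variables are named gdp<year> and pop<year>, e.g. gdp2000, pop2000
reshape long gdp pop, i(country) j(year)
* each row is now one country in one year
list in 1/5
```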
If you want to convert the dataset from the long format to the wide format you can use
reshape wide gdp pop, i(country) j(year)
In order to use Stata’s panel data commands you need to declare the data to be panel data by using the
xtset command. For this the data needs to be in the long format.
Let’s try
xtset country year, yearly
In our example dataset countries_small_wide.dta the panel variable (panelvar) is country. However, country
is a string variable, and xtset requires a numeric panel variable, so the command above fails. We can easily
create a new numerical variable country_num out of the string variable country using the encode command.
encode country, generate(country_num)
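With the numeric identifier in place, the declaration can be repeated. A minimal sketch, assuming the data are in the long format with the time variable year:

```stata
* declare the panel structure with the numeric panel variable
xtset country_num year, yearly
* summarize the pattern of observations across countries and years
xtdescribe
```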
Once xtset has been run with country_num as the panel variable, Stata recognizes that the dataset is of
the panel type and you can use Stata’s powerful xt commands (commands for panel data). To learn more
about these commands type help xt.