PANEL


Stata Brush-up course September 2022, BSE

Elia Benveniste & Alejandro Rábano

Day 7: Time Series and Panel Data


Motivation for today’s content:
Much empirical research is done using cross-sectional data, i.e., samples of certain populations at one
point in time. But there are other formats as well, which repeatedly sample one or more observations over
several time periods. These are called time-series data (if we have repeated information on one
observation) and panel data (if we have repeated information on a whole cross-section).

To be able to work with them, one has to specify certain settings in Stata, so that Stata knows what type
of data it faces. That’s what we are going to learn today.

Part I: Time series

A time series is a sequence of data-points which are ordered in time. Examples of time series are the
quarterly GDP of a country, the monthly unemployment rate, or the daily closing value of the Dow Jones
index.

Time series are distinct from other forms of data because they have a natural temporal ordering. For
example, the graph below shows a time series of quarterly frequency: the unemployment rate in the USA
from the 1st quarter of 1970 to the 2nd quarter of 2010.

(1) tsset - Declare data to be time-series data


Stata does not automatically recognize that a given dataset is a time series.

You can declare the current data set to be a time series by using the command tsset. Once your data
set has been tsset, you can use Stata’s time series operators and functions.

tsset timevar [, options]

Important: among the options, the unit option specifies the units of the time variable. Possible unit
options are: daily, weekly, monthly, quarterly, halfyearly, or yearly.

Requirements:
1. To use tsset you need one variable in the data set that represents the date or time dimension.
2. This date variable must be in the Stata-date format.

Example
Open the dataset emp_quarterly.dta

The data set contains the number of employed, the number of unemployed, and the GDP of the USA on a
quarterly basis from the 1st quarter of 1970 to the 2nd quarter of 2010. The variable date is the date
variable. The frequency of the data is quarterly. Therefore, in order to declare the data to be time series
data, we have to use the command

tsset date, quarterly

Stata now recognizes the data as time series data with quarterly frequency. As mentioned above, the
advantage is that we can now use Stata’s powerful time-series operators and functions to analyze our
data.

(2) The Stata date format: “elapsed time”


The tsset command expects the date variable to be in the Stata-date format. Stata stores dates as the
number of time units that have elapsed since the start of 1960 (January 1, 1960 for daily data, the 1st
quarter of 1960 for quarterly data, January 1960 for monthly data). Yearly dates are the exception: they
are stored as the year itself. The following table gives some examples of how dates are represented in
the “elapsed-time”-format.

frequency                   elapsed time

yearly data
  1975                      1975
  1986                      1986
  2001                      2001
quarterly data
  1st quarter of 1975       60
  2nd quarter of 1986       105
  3rd quarter of 2001       166
monthly data
  January 1975              180
  February 1986             313
  March 2001                494
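
The quarterly and monthly values can be checked directly with Stata's date functions. The lines below are a minimal sketch that needs no dataset; the numbers in the comments are the values those functions return for the dates in question.

```stata
* Quarterly dates count quarters since 1960q1
display yq(1975, 1)
* returns 60
display yq(1986, 2)
* returns 105

* Monthly dates count months since 1960m1
display ym(2001, 3)
* returns 494

* Daily dates count days since 01jan1960
display mdy(1, 1, 1960)
* returns 0

* A display format turns the stored number back into a readable date
display %tq yq(1975, 1)
* shows 1975q1
```

The last line illustrates that the stored number and its readable form are two views of the same value: the format only changes how the date is displayed.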

(3) Construct a Stata date variable from numerical components

In most cases your dataset will either come without a date variable or the date variable will not be in the
Stata “elapsed time”-format.
→ Often you will have to construct a Stata date variable from two or more numerical components.

Example:
Open the dataset emp_quarterly2.dta. This dataset is equivalent to the one we saw before, however, here
the date of an observation is given by two numbers: the variables year and quarter. If you want to declare
this dataset as time series data with the command tsset, you first need to construct a Stata date variable.
In this case this is done with
generate newdate = yq(year, quarter)

You can now declare the dataset to be of the time series type with
tsset newdate, quarterly

Possibilities to construct a Stata date variable from two or more numerical components:

generate date=mdy(month, day, year)    → daily data
generate date=yw(year, week)           → weekly data
generate date=ym(year, month)          → monthly data
generate date=yq(year, quarter)        → quarterly data
generate date=yh(year, halfyear)       → half-yearly data

(4) Construct a Stata-date variable from scratch

Imagine you have a time series dataset without any date variables at all. The only thing you know is the
frequency of the time series and time point when the time series ends or starts. In order to use the tsset-
command on this dataset you need to create a Stata-date variable first.

Example:
Open the dataset emp_quarterly3.dta. Again, this is the same dataset as before, but this time there is no
date variable at all. Suppose we know that the first observation is for the first quarter of 1970 and that the
frequency of the time series is quarterly. In order to construct a Stata-date variable date we can write
generate date = yq(1970,1) + _n-1

Here _n refers to the number of the observation in the dataset. Therefore, the code implies that the first
observation in the dataset is declared to be the 1st quarter of 1970, the second observation is the 2nd
quarter of 1970, and so on.
Now you can again declare the data to be a time series with
tsset date, quarterly
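
The same steps can be tried without the course files. The following is a minimal sketch on a made-up dataset: the variable employed and its values are hypothetical stand-ins for the real series, and the format command is an addition that only changes how the date is displayed.

```stata
clear
set obs 8

* Hypothetical series standing in for the real data
generate employed = 100 + _n

* First observation is 1970q1, each further row is one quarter later
generate date = yq(1970, 1) + _n - 1

* Show the elapsed-time numbers as readable quarters (1970q1, 1970q2, ...)
format date %tq

tsset date, quarterly
list date employed in 1/3
```

With 8 observations the constructed dates run from the 1st quarter of 1970 to the 4th quarter of 1971.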

(5) Convert a string to a Stata-date variable

It also often happens that the date variable in the dataset comes as a string variable. (A string variable
stores its values as text, even if that text consists only of digits.) Stata allows you to convert this
variable into a Stata-date variable.

Example:
Open the dataset emp_quarterly4.dta. This dataset is again the one we have already seen above, however,
now the date variable datestr comes as a string. The format of this string is “quarter year” that is, the
number of the quarter is followed by a blank and then the year. For example, the 2nd quarter of 1990 would
be “2 1990”. The following command converts this string into a Stata-date variable
gen date=quarterly(datestr,"QY")

Here “QY” describes the format of the string. If year and quarter were reversed in the variable datestr,
that is, if the 2nd quarter of 1990 were represented by “1990 2”, then the command would be
gen date=quarterly(datestr,"YQ")

Now it is possible to use the tsset-command


tsset date, quarterly

The other commands to convert dates that come as strings into Stata-date variables are
generate date=date(datestr,"MDY") → daily data
generate date=weekly(datestr,"WY") → weekly data
generate date=monthly(datestr,"MY") → monthly data
generate date=quarterly(datestr,"QY") → quarterly data
generate date=halfyearly(datestr,"HY") → half-yearly data
generate date=yearly(datestr,"Y") → yearly data
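
As an illustration of the string conversion, here is a small sketch on a made-up dataset. The four strings are hypothetical and follow the “quarter year” layout described above; the check for missing values is an added safeguard, since strings that do not match the pattern are converted to missing.

```stata
clear
input str8 datestr
"1 1970"
"2 1970"
"3 1970"
"4 1970"
end

* Convert the string to a Stata quarterly date
generate date = quarterly(datestr, "QY")
format date %tq

* Strings that do not match the pattern "QY" would end up missing
count if missing(date)

tsset date, quarterly
list datestr date
```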

(6) Time series operators


Once the dataset has been declared to be time series with tsset, Stata time series operators can be used.
You can access the lags, leads, and differences of a variable by using

Lag               l.variable
Lead              f.variable
Difference        d.variable
2 lags            l2.variable
2 leads           f2.variable
2nd difference    d2.variable

Example:
Open the dataset emp_quarterly.dta. You can now create variables that hold the number of employed one
and two quarters ago.

tsset date, quarterly

gen l1employed = l1.employed
gen l2employed = l2.employed

Now let’s compare the variables using the data browser.


browse employed l1employed l2employed

You can use the time series operators in most Stata-commands. For example, you can run a regression of
the number of unemployed in one quarter on the number of unemployed the quarter before.
reg unemployed l.unemployed
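
To see exactly what the operators return, it can help to try them on a tiny series where the answers are easy to check by hand. This is a sketch on hypothetical data (x takes the values 1, 4, 9, 16, 25), not on the course dataset.

```stata
clear
set obs 5
generate date = yq(2000, 1) + _n - 1
generate x = _n^2
tsset date, quarterly

* Lag, lead, and first difference of x
generate lag_x  = l.x
generate lead_x = f.x
generate diff_x = d.x

list date x lag_x lead_x diff_x
```

The first observation has no lag and the last has no lead, so those cells are missing. The difference d.x is x minus l.x, which is why it is also missing in the first row.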

(7) Example of a time series function: calculate autocorrelations


corrgram varname [if] [in] [, corrgram_options]
→ Calculates the autocorrelations of a variable for a given number of lags. (An autocorrelation is the
correlation of a variable with its previous values.)

ac varname [if] [in] [, ac_options]
→ Visualizes the autocorrelations in a graph.

Example:
corrgram unemployed, lags(10)
calculates the autocorrelations of the variable unemployed for up to 10 lags. It is not surprising to see
that unemployment is highly autocorrelated. The correlation coefficient is very high at one lag and
then gradually declines to about 0.08 at 10 lags.

The autocorrelations can also be visualized by using the ac-command.


ac unemployed, lags(10)
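
If the course dataset is not at hand, the autocorrelation commands can be tried on simulated data. The sketch below is an assumption-laden illustration: it builds a hypothetical series x in which each value is 0.8 times the previous value plus random noise, so that x is autocorrelated by construction. The seed and the coefficient 0.8 are arbitrary choices.

```stata
clear
set seed 12345
set obs 200
generate t = _n
tsset t

* Hypothetical persistent series: x_t = 0.8 * x_(t-1) + noise
generate x = rnormal() in 1
replace x = 0.8 * x[_n-1] + rnormal() in 2/l

* Table of autocorrelations for up to 10 lags
corrgram x, lags(10)

* corrgram leaves its results behind in r(), e.g. the lag-1 autocorrelation
display r(ac1)
```

The stored result r(ac1) can be used in later calculations instead of copying the number from the output table.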

Part II: Panel data

A panel is a time series that also has a cross-sectional dimension. That is, the same n individuals/firms/
countries are observed at several time points t. Panel data is the star among the data sets, because it
allows you to use a whole series of evaluation and estimation techniques that you wouldn’t be able to use
with cross-sections or time series.

Panel data can be saved in two different formats, long and wide.

a) long = the various observations per individual (or country) are coded as different observations (different
rows, smaller number of variables)
b) wide = all the information per individual is coded in one observation with series of variables indicating
the change in one particular aspect over time (one row, a lot of variables)

[Figure: GDP and population for a panel of countries in the long format, and the same data in the wide
format]

(1) The reshape-command


With Stata, it is more convenient to have data in the long format. Unfortunately, data very often comes in
the wide format. The reshape-command allows you to convert a dataset from the wide format to the
long format and vice versa.
reshape long stubnames, i(varlist) [options]
reshape wide stubnames, i(varlist) [options]

To convert from wide to long the syntax is


reshape long variables, i(identifier) j(date)

Here variables are the variables that you want to convert, identifier is the variable that identifies
individuals/firms/countries etc. in the wide format, and date is the name of the new variable that holds
the date in the long format.

For example, open the panel of countries above in the wide format
use countries_small_wide.dta

Now convert this dataset to the long format


reshape long gdp pop, i(country) j(year)

If you want to convert the dataset from the long format to the wide format you can use
reshape wide gdp pop, i(country) j(year)
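
The round trip between the two formats can be tried on a small made-up panel. In this sketch the country names and all numbers are hypothetical; the variable names follow the pattern stub + year (gdp2000, gdp2001, ...), which is what allows reshape to split them into the stubs gdp and pop and the new variable year.

```stata
clear
input str10 country gdp2000 gdp2001 pop2000 pop2001
"A" 10 11 100 101
"B" 20 22 200 202
end

* Wide to long: one row per country and year
reshape long gdp pop, i(country) j(year)
list, sepby(country)

* Long back to wide: one row per country
reshape wide gdp pop, i(country) j(year)
list
```

After the first reshape the data has four rows (two countries times two years); after the second it is back to two rows with one gdp and one pop variable per year.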

(2) xtset - Declare data to be panel data

In order to use Stata’s panel data commands you need to declare the data to be panel data by using the
xtset command. For this the data needs to be in the long format.

xtset panelvar datevar, frequency

panelvar is the variable that identifies the individuals/firms/countries etc.
panelvar cannot be a string variable.
datevar is the variable that refers to the date.

Let’s try
xtset country year, yearly

This fails: in our example dataset countries_small_wide.dta the panelvar is country, but country is a
string variable, which xtset does not accept. We can easily create a new numerical variable country_num
out of the string variable country using the encode command.
encode country, generate(country_num)

Then we can use xtset


xtset country_num year, yearly

Now Stata recognizes that the dataset is of the panel type and you can use Stata’s powerful xt-commands
(commands for panel data). To learn more about these commands type help xt.
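
The encode and xtset steps can be sketched on a small made-up panel in the long format. The country names and GDP values below are hypothetical. The example also shows one benefit of declaring the panel: the time series operators then work within each country.

```stata
clear
input str10 country year gdp
"A" 2000 10
"A" 2001 11
"B" 2000 20
"B" 2001 22
end

* xtset needs a numeric identifier, so encode the string variable first
encode country, generate(country_num)
xtset country_num year, yearly

* Lags are taken within each country, never across countries
generate lag_gdp = l.gdp
list country year gdp lag_gdp, sepby(country)
```

The first year of each country has a missing lag, because the previous row belongs to a different country or does not exist.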
