Working Paper Series
Department of Economics
University of Verona
Estimation of Unit Values in Household Expenditure
Surveys without Quantity Information
Martina Menon, Federico Perali, Nicola Tommasi
WP Number: 19
ISSN: 2036-2919 (paper), 2036-4679 (online)
October 2016
The Stata Journal (yyyy)
vv, Number ii, pp. 1–19
Estimation of Unit Values in Household
Expenditure Surveys without Quantity
Information
Martina Menon
University of Verona
Department of Economics
Verona, Italy
Federico Perali
University of Verona
Department of Economics and CHILD
Verona, Italy
Nicola Tommasi
University of Verona
Interdepartmental Centre of Economic Documentation (CIDE)
Verona, Italy
Abstract. This paper presents the Stata command pseudounit that estimates
Pseudo Unit Values in cross-sections of household expenditure surveys without
quantity information. Household surveys traditionally record expenditure information only. The lack of information about quantities purchased precludes the
possibility of deriving household specific unit values. We use a theoretical result
developed by Lewbel (1989) to construct “pseudo” unit values by first reproducing
cross-sectional price variation, and then adding this variability to the aggregate
price indexes published by national statistical institutes. We illustrate the method
with an example that uses a time-series of cross-sections of Italian household budgets.
Keywords: st0001, Unit values, Cross-section prices, Demand analysis, pseudounit.
1 Introduction
This paper presents the theory used to implement a Stata command that estimates unit
values in cross-sections of household expenditure surveys without quantity information
and describes how the command should be used. Empirical works on demand analysis
generally rely on the assumption of price invariance across households, supported by
the hypothesis that in cross-sectional data there are neither time nor spatial variations
in prices. According to this assumption each family pays the same prices for homogeneous goods. Micro-data with this characteristic allow researchers to estimate only
Engel curves without accounting for price effects, which are crucial for both behavioural
and welfare applications. Slesnick (1998:150) remarks that “the absence of price information in the surveys creates special problems for the measurement of social welfare,
inequality and poverty. ... Most empirical work links micro data with national price
series on different types of goods so cross sectional variation is ignored. Access to more
disaggregate information on prices will enhance our ability to measure social welfare,
although it remains to be seen whether fundamental conclusions concerning distributional issues will be affected.” In empirical work, this limitation is usually bypassed
by analysing time-series of cross-sections where price information comes from aggregate
time series data. Plausible estimates of price effects require a series of cross-sections
that is long enough and, if possible, aggregate price indexes that vary by month and
location, usually by region or province.
Household budget surveys of both developed and developing countries can be classified into two broad categories in increasing order of frequency of occurrence: 1) surveys
of expenditure and quantities purchased, and 2) surveys of expenditure data only. In the
first case, where quantities and expenditure are both observed, cross-sectional prices are
obtained as implicit prices, dividing expenditure by quantities, and are more properly
referred to as “unit values.” When dealing with these surveys it is important to remember that a proper use of unit values in econometric analyses must take into account
problems arising from the fact that unit values provide useful information about prices,
but differ from market prices in many respects. The ratio between expenditure and
quantities bought embeds information about the choice of quality (Deaton 1987, 1988a,
1988b, 1990, 1998, Perali 2003). The level of the unit value of a composite good depends
on the relative share of high-quality items and the composition of the aggregate good.
Unit values can be highly variable also for supposedly homogeneous goods because the
market offers many different grades and types.
On the other hand, when dealing with surveys that report only expenditure information, aggregate national price indexes are usually merged with household expenditure
to obtain estimates of price elasticities. Unfortunately, this approach requires a long
time-series of cross-sectional data to estimate a demand system with sufficient price
variation and relies on very restrictive assumptions (Frisch 1959), which often turn out
to be rejected in empirical applications. Aggregate price indexes are in general highly
correlated, may suffer from endogeneity problems (Lecocq and Robin 2015) and the
estimated elasticities are often not coherent with the theory (Atella, Menon and Perali
2003, Coondoo, Majumder and Ray 2004, Dagsvik and Brubakk 1998, Lahatte, Laisnay and Preston 1998). For these reasons, surveys gathering exclusively expenditure
data, such as the Italian household budget survey conducted by the National Statistical
Institute (ISTAT) used in our example, and the majority of existing household budget
surveys, have limited applicability in modern demand and welfare analysis. It is therefore
important to devise an appropriate procedure to compute “pseudo” unit values using
the information traditionally available in expenditure surveys, such as budget shares
and demographic characteristics, which help reproduce the distribution of the unit
value variability as closely as possible. The theoretical background for this undertaking
is provided in a study by Lewbel (1989).
The remainder of the paper is organized as follows. Section 2 presents the theory
and the method used to derive consumer price indexes and pseudo unit values when
the main objective is to implement demand analysis of household budget data without
quantity information. Section 3 provides the syntax and options of the pseudounit
Stata command. Section 4 illustrates the application.
2 The Estimation of Unit Values in Cross-Section Analysis of Household Budget Surveys
We introduce a method that recovers unit values when only expenditure information is
available using knowledge about aggregate price indexes available from national statistics.
First, as illustrated in Section 2.1, we need to collect the consumer price indexes
available from official statistics and to associate them with each household in the survey.
Then, in order to improve the precision of the estimated price elasticities as shown in
Atella, Menon and Perali (2003), we reproduce as closely as possible the price variation
of actual unit values, which could be obtained as the ratio between expenditure and
quantities if quantity information were available in the survey. The estimation of pseudo
unit values is described in Sections 2.2 and 2.3.
2.1 Consumer Price Indexes
Eurostat adopts the Classification Of Individual COnsumption by Purpose, abbreviated
as COICOP, a nomenclature developed by the United Nations Statistics Division
to classify and analyze individual consumption expenditures incurred by households,
non-profit institutions serving households, and general government according to their
purpose.1 National statistical institutes traditionally publish consumer price indexes
for each COICOP category on a monthly basis, which are collected at provincial level.
Let $P^{ij}_{rm}$ be the consumer price index for the $j$-th item of the $i$-th COICOP group, with
$i = 1, \ldots, n$, collected by national statistical institutes on a monthly basis $m = 1, \ldots, M$
for each territorial level $r = 1, \ldots, R$, such as a province or a region. These price indexes
are the same for all households living in the same region and interviewed in the same
month. If detailed price information disaggregated by territorial level or time is not
available, then we only have $P^i$, which is the same for all households. With this highly
limited price information, demand analysis cannot be implemented because the data
matrix is not invertible and pseudo unit values must be estimated.
The next task is to match the monthly price index specific to each territorial unit,
$P^{ij}_{rm}$, with all households living in province or region $r$ and interviewed at month $m$. Then
each $P^{ij}_{rm}$ is aggregated into a price index $P^{i}_{rm}$ for $i = 1, \ldots, K$ groups corresponding to
the goods selected for the empirical demand analysis. The aggregation uses Laspeyres
indexes

$$P^{i}_{rm} = \sum_{j=1}^{n_i} P^{ij}_{rm} w_{ij}, \tag{1}$$
1. The COICOP top level aggregation encompasses 12 categories: food and non-alcoholic beverages;
alcoholic beverages, tobacco and narcotics; clothing and footwear; housing, water, electricity, gas
and other fuels; furniture; health; transport; communication; recreation and culture; education;
restaurants and hotels; and miscellaneous goods and services.
where $j = 1, \ldots, n_i$, $n_i$ is the number of goods within group $i$, and $w_{ij}$ are the weights
provided by national statistical institutes for each item $j$ of group $i$.2 As an example,
we may suppose that a budget is divided into $i = 1, 2$ groups, such as food and non-food,
and that the food group is composed of $j = 1, 2, 3$ items, such as cereals, meat, and other
food.3
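As a rough illustration of the aggregation in equation (1), the following Python sketch combines the sub-group price indexes of one group into a group index. The function name, the example numbers, and the requirement that the weights sum to one within the group are assumptions made here for illustration only; they are not part of pseudounit.

```python
def laspeyres_group_index(sub_indexes, weights, tol=1e-9):
    """Weighted sum of sub-group indexes: P^i = sum_j P^{ij} * w_ij.

    Assumption for this sketch: the weights are normalized to sum to one
    within the group.
    """
    if len(sub_indexes) != len(weights):
        raise ValueError("one weight is needed for each sub-group index")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights are assumed to sum to one within the group")
    return sum(p * w for p, w in zip(sub_indexes, weights))


# Hypothetical food group with three items: cereals, meat, other food.
food_index = laspeyres_group_index([120.0, 130.0, 110.0], [0.3, 0.5, 0.2])
print(round(food_index, 3))
```

In a survey, the same calculation would be repeated for each region and month before matching the resulting group index to the households interviewed there.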
So far we have described how to prepare the data matrix containing information
about the available price indexes. Next, we present the background theory used to
pursue the objective to reconstruct the cross-sectional variability of unit values.
2.2 Demographically Varying Pseudo Unit Values: Theory
Lewbel (1989) proposes a method to estimate the cross-sectional variability of actual
unit values by exploiting the demographic information included in generalized “within-group” equivalence scales or, more generally, demographic functions.4 For a group $i$ of
goods, these are defined as the ratio of a sub-utility function of a reference household
to the corresponding sub-utility function of a given household estimated without
price variation in place of “between-group” price variation. The method relies on the
assumption that the original utility function is homothetically separable and “within-group” sub-utility functions are Cobb-Douglas.
Consider a separable utility function $U(u_1(q_1, d), \ldots, u_n(q_n, d))$ defined over the consumption of goods $q_i$ and a set of demographic characteristics $d$, where $U(u_1, \ldots, u_n)$
is the “between-group” utility function and $u_i(q_i, d)$ is the “within-group” sub-utility
function, with $i = 1, \ldots, n$ denoting the aggregate commodity groups. Demographic
characteristics, $d$, affect $U$ indirectly through their effects on the within-group sub-utility
functions. Define the group equivalence scale $M_i(q_i, d)$ as

$$M_i(q_i, d) = \frac{u_i(q_i, d^h)}{u_i(q_i, d)}, \tag{2}$$

where $d^h$ describes the demographic profile of a reference household. Define a quantity
index for group $i$ as $Q_i = u_i(q_i, d^h)$ and rewrite the between-group utility function as

$$U(u_1, \ldots, u_n) = U\left(\frac{Q_1}{M_1}, \ldots, \frac{Q_n}{M_n}\right), \tag{3}$$

which is formally analogous to Barten’s (1964) technique to introduce demographic
factors in the utility function. Define further the price index for group $i$ as $P_i = Y_i^h / Q_i$,
where $Y_i^h$ is expenditure on group $i$ by the reference household. To guarantee that group
demands are closed under unit scaling, a scaling factor $k_i$ must be applied to the
quantity index $Q_i$ that makes $P_i = 1$ for all $i$ when $p_{ij} = 1$ for all $i$ and $j$. This would
2. When not available, the sub-group budget shares can be used as aggregation weights.
3. If the interest is to build a time-series collection of cross-sections of household budget surveys, then
in the base year the indexes for all goods are equal to 100.
4. This section closely reproduces the procedure developed by Lewbel (1989).
occur, for example, in a base year when $p_{ij}$ are in index form. Thus $P_i = Y_i^h / (k_i Q_i)$.
Barten’s utility structure implies the following share demands for each household with
total expenditure $Y$

$$W_i = H_i(P_1 M_1, \ldots, P_n M_n, Y) \tag{4}$$

taking the form of $W_i^h = H_i^h(P_1, \ldots, P_n, Y^h)$ for the reference household with scales
$M_i = 1$ for all $i$. The further assumption of homothetic separability admits two-stage
budgeting (Deaton and Muellbauer 1980) and implies the existence of functions $V_i$ such
that $P_i = V_i(p_i, d^h)$ is the price index of group $i$ for the reference household with
demographics $d^h$, and $p_i = (p_{i1}, \ldots, p_{in_i})$ is the vector of prices, where $n_i$ is the number of
goods that comprise group $i$. By analogy with the definition of group equivalence scales
in utility space, it follows that

$$M_i = \frac{V_i(p_i, d)}{V_i(p_i, d^h)}, \tag{5}$$

where $V_i(p_i, d) = M_i P_i$. Therefore, when demands are homothetically separable, each
group scale depends only on relative prices within group $i$ and on $d$, as expected given
that homothetic separability implies strong separability. Maximization of $u_i(q_i, d)$ subject to the expenditure $p_i q_i = Y_i$ of group $i$ gives the budget share for an individual good
$w_{ij} = h_{ij}(p_i, d, Y_i)$. For homothetically separable demands, the budget shares do
not depend on expenditure, $w_{ij} = h_{ij}(p_i, d)$, and integrate back in a simple fashion to
$V_i = M_i P_i$. This information can be used at the between-group level in place of price
data to estimate $W_i = H_i(V_1, \ldots, V_n, Y)$. Under the assumption that the sub-group
utility functions are Cobb-Douglas with parameters specified as “shifting” functions of
demographic variables alone, we can specify the following relationship

$$F_i(q_i, d) = k_i \prod_{j=1}^{n_i} q_{ij}^{m_{ij}(d)}, \tag{6}$$

then the shares $w_{ij} = \partial \log V_i / \partial \log p_{ij}$ correspond to the demographic functions

$$w_{ij} = h_{ij}(p_i, d) = m_{ij}(d) \tag{7}$$

with

$$\sum_{j=1}^{n_i} w_{ij}(d) = \sum_{j=1}^{n_i} m_{ij}(d) = 1. \tag{8}$$

The implied price index is

$$V_i(p_i, d) = M_i P_i = \frac{1}{k_i} \prod_{j=1}^{n_i} \left(\frac{p_{ij}}{m_{ij}}\right)^{m_{ij}}, \tag{9}$$
with

$$k_i = \prod_{j=1}^{n_i} m_{ij}(d^h)^{-m_{ij}(d^h)}, \tag{10}$$

where $k_i$ is a scaling function depending only on the choice of the reference demographic
levels.
Note5 that assuming separable and homothetic preferences within groups and letting
$q_{ij}$ denote scaled units such that corresponding prices $p_{ij}$ are unity in a base year,6 then
the group cost function for the reference household is

$$c_i(u_i, \tilde{p}_i, d^h) = k_i u_i(\tilde{q}_i, d^h) \frac{b(\tilde{p}_i, d^h)}{k_i} = k_i Q_i P_i, \tag{11}$$

where $b(\tilde{p}_i, d^h)$ is concave and linearly homogeneous in prices, and time subscripts are
omitted for simplicity. To ensure that the group price index is unity in the base year,
the scaling factor is $k_i = b(1, d^h)$. Thus, $P_i = b(\tilde{p}_i, d^h)/k_i = V_i(\tilde{p}_i, d^h)$ and the price per
equivalent capita is

$$M_i P_i = \frac{b(\tilde{p}_i, d^h)}{k_i} \frac{b(\tilde{p}_i, d)}{b(\tilde{p}_i, d^h)} = \frac{b(\tilde{p}_i, d)}{k_i} = V_i(\tilde{p}_i, d). \tag{12}$$

When the sub-utility functions are Cobb-Douglas, $b(\tilde{p}_i, d) = \prod_{j=1}^{n_i} (\tilde{p}_{ij}/m_{ij})^{m_{ij}}$ and
it is easy to see that the scaling factor is $k_i = \prod_{j=1}^{n_i} (1/m_{ij}^h)^{m_{ij}^h}$, where the parameters
$m_{ij} = m_{ij}(d)$ and $m_{ij}^h = m_{ij}(d^h)$.
It is important to note that the Cobb-Douglas assumption places restrictions only
at the within-group level while leaving the between-group demand equations free to
be arbitrarily flexible. An approximation to $M_i P_i = M_i$, which holds in the base year
where $P_i = 1$, can be obtained by using the observed within-group budget shares. These
results support a simple procedure to estimate price variation in survey data without
quantity information.
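To make the role of the scaling factor concrete, consider a small worked example under assumed values that do not come from the data: a group with two goods, a reference household with shares $m^h = (0.5, 0.5)$, a second household with shares $m = (0.8, 0.2)$, and base-year prices $\tilde{p}_{ij} = 1$.

```latex
\begin{align*}
k_i &= \prod_{j=1}^{2} \left(m_{ij}^h\right)^{-m_{ij}^h}
     = 0.5^{-0.5} \times 0.5^{-0.5} = 2, \\
V_i(\mathbf{1}, d^h) &= \frac{1}{k_i} \prod_{j=1}^{2} \left(\frac{1}{m_{ij}^h}\right)^{m_{ij}^h}
     = \frac{2}{2} = 1, \\
V_i(\mathbf{1}, d) &= \frac{1}{k_i} \prod_{j=1}^{2} \left(\frac{1}{m_{ij}}\right)^{m_{ij}}
     = \frac{0.8^{-0.8} \times 0.2^{-0.2}}{2}
     \approx \frac{1.649}{2} \approx 0.825 .
\end{align*}
```

The reference household obtains an index of one by construction, while the household with more concentrated shares obtains a value below one, which is the kind of household-specific variation the procedure exploits.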
2.3 Demographically Varying Pseudo Unit Values: Practice
Given this theoretical setup, we now describe how pseudo unit values can be obtained in
practice. The description corresponds to the implementation of the pseudounit Stata
command.
5. We would like to thank an anonymous reviewer for suggesting that we report how the expressions
for $M_i$, $P_i$ and $k_i$ are derived.
6. If $p_{ijt}$ is the price in year $t$ and $p_{ij0}$ the price in base year 0, then $\tilde{p}_{ijt} = p_{ijt}/p_{ij0}$ and $\tilde{q}_{ijt} = q_{ijt} p_{ij0}$.

Definition 1 (Pseudo Unit Values - PUV($\hat{P}_D^i$))

$$\hat{P}_D^i = M_i P_i = M_i = \frac{1}{k_i} \prod_{j=1}^{n_i} w_{ij}^{-w_{ij}}, \tag{13}$$

where $k_i$ is computed from the average sub-group budget shares of the $i$-th group.
The index $\hat{P}_D^i$ summarizes the cross-sectional variability of prices, which can be added
to spatially varying price indexes to resemble unit values expressed in index form, as
stated in Definition 2. In general, this technique allows the recovery of the household-specific price
variability that can be found in unit values. The pseudo unit value is an index that can
be compared to actual unit values after normalization, choosing the value of a specific
household as a numeraire.

Definition 2 (Pseudo Unit Values in Index Form - PUV($\hat{P}_{DI}^i$))

$$\hat{P}_{DI}^i = \hat{P}_D^i P_{rm}^i, \tag{14}$$

where $P_{rm}^i$ are the group-specific price indexes derived in equation (1).
To look like actual unit values, pseudo unit values in index form have to be
transformed into levels. The transformation in nominal terms is fundamental to properly
capture complementarity and substitution effects, as shown in Atella, Menon and Perali
(2003). Cross-effects would otherwise be the expression of the differential speed of
change of the good-specific price indexes through time only. Note that $\hat{P}_D^i = M_i P_i$
holds for the base year only, where $\tilde{p}_{ij} = 1$ for all regions, and time subscripts are
omitted for simplicity. In subsequent time periods

$$M_i P_i = \frac{1}{k_i} \prod_{j=1}^{n_i} w_{ij}^{-w_{ij}} \prod_{j=1}^{n_i} \tilde{p}_{ij}^{w_{ij}} = \hat{P}_D^i \prod_{j=1}^{n_i} \tilde{p}_{ij}^{w_{ij}}, \tag{15}$$

which is represented by the pseudo unit value in index form $\hat{P}_{DI}^i = \hat{P}_D^i P_{rm}^i$. Further,
$\hat{P}_{DI}^i$ will be an approximation to $M_i P_i$, unless $P_{rm}^i$ is equivalent to $\prod_{j=1}^{n_i} \tilde{p}_{ij}^{w_{ij}}$, which resembles
a Stone price index.
Definition 3 (Pseudo Unit Values in Levels - PUV($\hat{P}_{DIL}^i$))

$$\hat{P}_{DIL}^i = \hat{P}_{DI}^i \bar{y}_i, \tag{16}$$

where $\bar{y}_i$ is the average expenditure of group $i$ in the base year.
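The three definitions can be strung together in a few lines. The Python sketch below is only an illustration under stated assumptions, not the code of pseudounit: the function names and the example values are hypothetical, the scaling factor is computed from a set of reference shares as in equation (10), and the price index is assumed to be rescaled so that the base period equals 1 (for instance 1.25 rather than 125).

```python
import math


def share_product(shares):
    """Return prod_j w_j ** (-w_j) for within-group budget shares."""
    if any(w <= 0.0 or w > 1.0 for w in shares):
        raise ValueError("shares are assumed to lie in (0, 1]")
    return math.prod(w ** (-w) for w in shares)


def pseudo_unit_values(shares, reference_shares, price_index, base_mean_expenditure):
    """Compute PUV(D), PUV(DI) and PUV(DIL) for one household and one group."""
    k = share_product(reference_shares)        # scaling factor, equation (10)
    puv_d = share_product(shares) / k          # equation (13)
    puv_di = puv_d * price_index               # equation (14)
    puv_dil = puv_di * base_mean_expenditure   # equation (16)
    return puv_d, puv_di, puv_dil


# Hypothetical household: two goods, index rescaled to base = 1.
d, di, dil = pseudo_unit_values([0.8, 0.2], [0.5, 0.5], 1.25, 400.0)
print(round(d, 4), round(di, 4), round(dil, 2))
```

A household whose shares coincide with the reference shares obtains PUV(D) equal to one, so all of its variation in PUV(DI) comes from the regional and monthly price index.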
Early experiments with pseudo unit values, using Italian household budget data (Perali 1999 and 2000, Atella, Menon and Perali 2003, Menon and Perali 2010) and other data sets (Hoderlein
and Mihaleva 2008, Berges, Pace Guerrero and Echeverría 2012),
have provided comforting indications about the possibility of estimating regular preferences. Atella, Menon and Perali (2003) describe the effects on the matrix of cross-price
elasticities associated with several price definitions and find that the matrix of compensated elasticities is negative definite only if pseudo unit values are used. Nominal
pseudo unit values, which more closely reproduce actual unit values, give a set of own- and cross-price effects that is more economically plausible. The derived demand systems are regular and suitable for sound welfare and tax analysis. The authors conclude
that the adoption of pseudo unit values does no harm because Lewbel’s method simply
consists in adding cross-sectional price variability to aggregate price data. Therefore,
Lewbel’s method for constructing demographically varying prices is potentially of great
practical utility.
Because goods may differ in quality from one household to another, and their associated unit values may reflect these differences in quality as well as measurement errors
and endogenous expenditure information, the estimated unit values are likely to be
correlated with the equation errors and the resulting estimators will be both biased
and inconsistent. The demand estimation technique should therefore account for price
endogeneity by using instrumental-variable methods.
We now proceed with the description of the pseudounit Stata command.
3 The pseudounit command
3.1 Syntax
The syntax of pseudounit is as follows:
pseudounit expenditures [if] [in], generate(varname) pindex(varname)
    [impvars(varlist) seed(#) add(#) coll_rule(mean|median) expby(varname)
    pdi(varname) year(varname) saving(filename[, suboptions])]
where expenditures is the list of expenditure variables of interest. The list must be
specified as follows: the group expenditure first, then all the sub-expenditures of the
group. The pseudounit command verifies that the sub-expenditures sum to the
group expenditure and that each expenditure is positive or zero.
3.2 Options
generate(varname) specifies the name of the variable to be generated with unit values à la Lewbel. generate(varname)
is required.
pindex(varname) specifies the variable containing the price index relative to the group
expenditure. pindex(varname) is required.
impvars(varlist) specifies the variables to be used for the imputation of the zero expenditure shares. At least two sub-expenditures are required. The imputation uses the
command mi impute truncreg, where the dependent variable is the expenditure
share and the independent variables are the variables specified in impvars(varlist).
If the imputation with mi impute truncreg fails, the command switches to mi
impute pmm using a number of k nearest neighbors equal to 5% of the positive
observations of the within-group shares. It is also possible to use categorical variables with the appropriate syntax (see [U] Factor variables). Because imputed
shares must lie between 0 and 1, the program checks for imputed values that are negative or greater than 1 and replaces them with the value of 1. Because the procedure uses
a product over the sub-groups, this guarantees that the sub-group expenditure does not contribute
to the group price for that specific household.
seed(#) sets the random-number seed. This option is used to reproduce results. Default is seed(159753).
add(#) specifies the number of imputations to add to the mi data. The total number
of imputations cannot exceed 1,000. Default is 20.
coll_rule(mean|median) specifies how the imputations are collapsed. mi impute truncreg adds n replicas to the data with
n imputations of the missing data, where n corresponds to the value reported in
the add(#) option. Each dataset is identified with values of the variable _mi_id.
These n imputations are then collapsed into one dataset using the mean or the
median. pseudounit executes the following command: collapse (mean|median)
share_var, by(_mi_id). The default statistic is the mean.
expby(varname) specifies the average group expenditure for the base year.7 Without
the option expby(varname), the variable in option generate(varname) is equal
to the pseudo unit value in index form PUV(DI).
pdi(varname) specifies the name of the variable to be generated with pseudo unit values in index form (PUV(DI)).
year(varname) specifies the name of the year variable when estimating unit values for
several years.
saving(filename[, suboptions]) saves the graph to disk. This option saves a kernel density
graph of the Pseudo Unit Values in Levels, PUV($\hat{P}_{DIL}^i$), and specifies
the name of the disk file to be created or replaced. If the filename is specified without an extension, .gph is assumed. The option is disabled, and no graph is produced, if year(varname) is specified.
The option year() can be used when a time series of cross-sections is available so
that it is possible to compute the mean expenditure shares wij by each year.
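The collapse and replacement rules described for coll_rule() and impvars() can be sketched in Python as follows. This is only an illustration of the logic under stated assumptions: the function names and the example draws are hypothetical, and the code is not the implementation used by pseudounit.

```python
from statistics import mean, median


def collapse_imputations(imputed_draws, rule="mean"):
    """Collapse the imputed draws of one household's share into a single value."""
    if rule == "mean":
        return mean(imputed_draws)
    if rule == "median":
        return median(imputed_draws)
    raise ValueError("rule must be 'mean' or 'median'")


def neutralize_out_of_range(share):
    """Replace an imputed share that is negative or greater than 1 by 1.

    A share of 1 contributes 1 ** (-1) = 1 to the product in the pseudo unit
    value, so that sub-group leaves the household's group price unchanged.
    """
    if share < 0.0 or share > 1.0:
        return 1.0
    return share


# Hypothetical draws for one household and one sub-group share.
draws = [0.1, 0.6, 0.2]
print(collapse_imputations(draws, "median"))
print(neutralize_out_of_range(-0.05))
```

The median is less sensitive than the mean to an occasional extreme draw, which is one reason a user might prefer coll_rule(median) when the imputed shares are skewed.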
4 The pseudounit command: examples
To become familiar with what the command does, the user may be interested in running the
following examples, which use the dataset pseudounit_cmd.dta provided with the package.
7. In the case of using one cross-section only, we can choose a given month (for instance January) as
the base year both for the price indexes and the group expenditures.
4.1 Data
For our example, expenditure data comes from a series of repeated cross-sectional national household budget surveys conducted yearly by the Italian Statistical Institute
(ISTAT). Within each cross-section, households are interviewed at different times during the year, on a monthly basis. The ISTAT budget survey is representative at the
regional level.
The samples of household budgets for the years 2007 and 2008 used in this example
comprise more than 23,000 households per year. In order to reduce the estimation
burden of the present application, we have drawn a random sample of 4,935 households
for the year 2007 and 4,916 for the year 2008. Household expenditures in the provided
dataset have been aggregated into six groups and then transformed into budget shares:
Food, Clothing, Housing, Transport and Communication, Education, and Other goods
and services.
ISTAT collects consumer price indexes based on the consumption
habits of the whole population. They are available on a monthly basis for each of the 106 Italian
provinces at the COICOP level of disaggregation. We have chosen January 1997 as the
base year. Price indexes have been matched to the two samples taking into account the
period of the year in which the household was interviewed, so that, for example, households
interviewed in March have been matched with prices collected in the same month. After
determining the expenditure groups, we constructed the corresponding consumer price
indexes starting from the COICOP categories available by territorial disaggregation
and month, and matched them to all households living in the same region and
interviewed in the same month.
Table 1 reports the descriptive statistics of the price index $P_{rm}^i$ of the pseudounit
procedure for the six groups of goods and services. Note that if the user already has
price information from external sources organized as in Table 1, she or he can call the
pseudounit procedure without following the bottom-up approach outlined above.
Table 1: Descriptive Statistics of $P_{rm}^i$ by Year

year     idx_aggr1  idx_aggr2  idx_aggr3  idx_aggr4  idx_aggr5  idx_aggr6
2007       124.776    119.710    126.238    122.144    124.891    118.798
           (4.192)    (5.426)    (2.751)    (2.310)    (2.677)    (2.916)
2008       124.832    119.721    126.183    122.097    124.846    118.733
           (4.199)    (5.467)    (2.747)    (2.301)    (2.687)    (2.916)
Total      124.804    119.715    126.211    122.120    124.869    118.766
           (4.195)    (5.446)    (2.749)    (2.306)    (2.682)    (2.916)

Note: Standard errors are in parentheses.
Table 2 reports the levels of the average indexes $P_{rm}^i$ by macro region, selecting two
households (HH1 and HH2) interviewed in period 1 or 2 in each macro region, in order
to illustrate how the levels of price indexes may vary within each region by time of
interview of the household.
Table 2: Average Levels of $P_{rm}^i$ by Macro Region and Households HH1 or
HH2 Interviewed in Period 1 or 2

Macro           idx_aggr1  idx_aggr2  idx_aggr3  idx_aggr4  idx_aggr5  idx_aggr6
NW (HH1)            119.9      115.9     123.66      119.4      124.3      117.4
NW (HH2)            124.6      116.4      122.8      120.5      123.3      116.9
NE (HH1)            124.8      115.3      129.4      123.2      123.4      119.2
NE (HH2)            121.8      116.4      131.2      123.5      124.7      120.6
Centre (HH1)        125.6      123.2      121.0      121.4      130.8      116.1
Centre (HH2)        122.1      117.8      128.0      123.7      124.7      116.4
South (HH1)         123.9      109.5      123.4      113.9      114.2      117.6
South (HH2)         133.6      130.9      121.6      120.5      127.2      115.8
Islands (HH1)       122.5      116.6      125.1      123.3     124.53      121.0
Islands (HH2)       123.5      109.9      126.9      120.3      120.6      119.4
The composition of the group expenditures in our dataset is as follows.
Group expenditure 1: Food (ag6sp_1)
• Bread, Cereals and Pasta (ag6sp_1_1)
• Meat, Fish and Milk Derivatives (ag6sp_1_2)
• Fruit and Vegetables (ag6sp_1_3)
• Fats and Oils, Sugar, Alcoholic and Nonalcoholic Drinks and Beverages, Tobacco
(ag6sp_1_4)
Group expenditure 2: Clothing (ag6sp_2)
• Non Assignable Clothing (ag6sp_2_1)
• Clothing and Footwear: Men (ag6sp_2_2)
• Clothing and Footwear: Women (ag6sp_2_3)
• Clothing and Footwear: Children (ag6sp_2_4)
Group expenditure 3: Housing (ag6sp_3)
• Rents and Condominium Fees (ag6sp_3_1)
• Water, Energy and Heating (ag6sp_3_2)
• Home Repairs and Large Electrical Appliances (ag6sp_3_3)
• Small Electrical Appliances and Flatware (ag6sp_3_4)
Group expenditure 4: Transport and Communications (ag6sp_4)
• Private Transportation (fuels, repairs) (ag6sp_4_1)
• Public Transportation (ag6sp_4_2)
• Telephone (ag6sp_4_3)
• Purchase of Means of Transportation and Telephone (ag6sp_4_4)
Group expenditure 5: Leisure and Education (ag6sp_5)
• Education Expenditures (ag6sp_5_1)
• Leisure (ag6sp_5_2)
• Computer, Music, Televisions (ag6sp_5_3)
• Other (ag6sp_5_4)
Group expenditure 6: Health and Other Non-Food (ag6sp_6)
• Medical Examinations, Medicines (ag6sp_6_1)
• Insurance, Expenditures for Medical Assistance, Other (ag6sp_6_2)
The dataset comprises the price indexes associated with each expenditure (idx_aggr1
- idx_aggr6) and the mean expenditures evaluated at the base year (1997) for each of
the 6 selected expenditure categories conditioned by region, number of household members and month (mu_ag6sp_1 - mu_ag6sp_6). Other variables are residential location
(urban or rural; location), number of household members (nc), macroarea (ripgeo),
age of the household head (etacf), education of the household head (titstucf), and the
logarithm of the household total annual expenditure (lnx).
Note that in the base year, average expenditures are computed by region and month
in order to preserve the maximum territorial and time variation.
4.2 Examples
We now implement the pseudounit command to estimate unit values for food, clothing,
housing and transport and communication in order to illustrate how to use the options
available in the command.8
8. Our results are obtained using Stata 14; possible marginal differences may be due to previous
versions of Stata adopting a different pseudorandom number generator.
Estimation of unit values for the food expenditure group: food (ag6sp_1) (sub-group
expenditures are: bread, cereals and pasta (ag6sp_1_1), meat, fish, milk and other protein
(ag6sp_1_2), fruits and vegetables (ag6sp_1_3), fats and oils, sugar, beverages and tobacco
(ag6sp_1_4)).
The variables used for the imputation of zero expenditures are residential location
location, macroarea ripgeo, number of household members nc, age of the household
head etacf, education of the household head titstucf, and the logarithm of the total
annual household expenditure lnx. The multiple imputation of the zero expenditure
shares generates 30 datasets that are then summarized using the median, as requested with
coll_rule(median) (the default is the mean). The
price index is idx_aggr1 and the mean expenditure computed at the base
year for food is mu_ag6sp_1.
The variable associated with the unit values of the food expenditure is lwbp_aggr1:
. use pseudounit_cmd.dta, clear
.
. pseudounit ag6sp_1 ag6sp_1_1 ag6sp_1_2 ag6sp_1_3 ag6sp_1_4 if year==2007, ///
> impvars(location i.ripgeo nc etacf i.titstucf lnx) ///
> pindex(idx_aggr1) expby(mu_ag6sp_1) gen(lwbp_aggr1) ///
> add(30) seed(889922) coll_rule(median)
DESCRIPTIVES STATISTICS
Variable     |   Obs.       Mean     Median  Std. Dev.        Min        Max
-----------------------------------------------------------------------------
PUV(D)       |   4971   .9405628   .9557249   .0908108   .3564937    1.09878
PUV(DI)      |   4971   1.173586   1.189731   .1196185   .4406378   1.478698
lwbp_aggr1   |   4971   516.1335   517.3575   151.9128   127.4489   1015.579
-----------------------------------------------------------------------------
Note: lwbp_aggr1 is Pseudo Unit Values in Levels PUV(DIL)
In this case, there are no imputations because there are no zero share expenditures.
Estimation of unit values for the clothing expenditure group: Clothing (sub-group
expenditures are: Non Assignable Clothing; Clothing and Footwear for Men; Clothing
and Footwear for Women; Clothing and Footwear for Children).
. pseudounit ag6sp_2 ag6sp_2_? if year==2007, ///
> impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr2) ///
> expby(mu_ag6sp_2) gen(lwbp_aggr2) ///
> add(30) seed(889922) coll_rule(median)
**** EXPENDITURE ag6sp_2_2 ****
644 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 4971
Complete observations: 4327
Missing observations: 644
Imputed observations: 644
0 values for expenditure ag6sp_2_2 converted to 1
**** EXPENDITURE ag6sp_2_3 ****
355 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 4971
Complete observations: 4616
Missing observations: 355
Imputed observations: 355
0 values for expenditure ag6sp_2_3 converted to 1
**** EXPENDITURE ag6sp_2_4 ****
3149 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 4971
Complete observations: 1822
Missing observations: 3149
Imputed observations: 3149
0 values for expenditure ag6sp_2_4 converted to 1
DESCRIPTIVES STATISTICS
    Variable |       Obs.       Mean     Median  Std. Dev.        Min        Max
--------------------------------------------------------------------------------
      PUV(D) |       4971   1.049126   1.051407   .0675399   .4734136   1.181265
     PUV(DI) |       4971    1.25613   1.259243   .1018521   .6206452   1.556212
  lwbp_aggr2 |       4971   174.4161   177.3384   67.20049   24.84464   555.8301
--------------------------------------------------------------------------------
Note: lwbp_aggr2 is Pseudo Unit Values in Levels PUV(DIL)
In this case, there are no imputations for the sub-group expenditure ag6sp_2_1, but
644, 355, and 3149 observations are imputed for the sub-group expenditures ag6sp_2_2,
ag6sp_2_3, and ag6sp_2_4, respectively.
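Before running pseudounit, it can be useful to see how many observations will require imputation. The sketch below is not part of the command. It uses a small invented dataset that only borrows the variable names of the example, and it assumes that a missing purchase is recorded as a zero expenditure.

```stata
* Hypothetical toy data: names follow the example, values are invented
clear
input year ag6sp_2_1 ag6sp_2_2 ag6sp_2_3 ag6sp_2_4
2007 120  0 35  0
2007  80 40  0  0
2007  60  0 25 15
2008  90 10 20  0
end

* Count zero sub-group expenditures in the estimation sample (year==2007)
foreach v of varlist ag6sp_2_? {
    quietly count if `v' == 0 & year == 2007
    display as text "`v': " as result r(N) as text " zero expenditures"
}
```

On the survey data, the same loop would be run after loading the dataset instead of typing the toy observations.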
The following example estimates unit values for the Housing expenditure group. Its
sub-group expenditures are Rents and Condo Expenses; Water, Energy and Heating; Home
Repairs and Large Electrical Appliances; and Small Electrical Appliances and Flatware.
The variable lwbp_aggr3 is created for each year of the variable year.
. pseudounit ag6sp_3 ag6sp_3_?, ///
> impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr3) ///
> expby(mu_ag6sp_3) gen(lwbp_aggr3) year(year)
**** EXPENDITURE ag6sp_3_3 ****
1668 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 9859
Complete observations: 8191
Missing observations: 1668
Imputed observations: 1668
0 values for expenditure ag6sp_3_3 converted to 1
DESCRIPTIVES STATISTICS
    Variable |       Obs.       Mean     Median  Std. Dev.        Min        Max
--------------------------------------------------------------------------------
        2007 |
      PUV(D) |       4971   .9480497    .931581   .1956049   .4917437   1.509897
     PUV(DI) |       4971   1.196825   1.177358   .2483218   .6092809   1.940222
  lwbp_aggr3 |       4971   729.4016   698.4557   248.5652    160.168   1894.283
--------------------------------------------------------------------------------
        2008 |
      PUV(D) |       4888   .9482172   .9302261   .1977599   .4697024   1.527056
     PUV(DI) |       4888   1.196471   1.173393   .2506707   .5757318   1.938348
  lwbp_aggr3 |       4888   723.3252   692.9372   246.5752   187.4321   1780.587
--------------------------------------------------------------------------------
Note: lwbp_aggr3 is Pseudo Unit Values in Levels PUV(DIL)
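Once the variable has been generated, a by-year summary similar to the table above can be obtained with standard Stata commands. The sketch below is only illustrative: the five observations are invented and merely stand in for the generated variable lwbp_aggr3.

```stata
* Hypothetical toy data standing in for the generated variable
clear
input year lwbp_aggr3
2007 700
2007 650
2007 810
2008 720
2008 690
end

* Descriptive statistics of the pseudo unit values, separately by year
tabstat lwbp_aggr3, by(year) statistics(n mean median sd min max)
```

With the survey data in memory, only the tabstat line is needed.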
The following example estimates unit values for the Transport and Communications
expenditure group and produces the associated graph. Its sub-group expenditures are
Private and Public Transportation, Telephone, and Purchase of Transportation Means.
. pseudounit ag6sp_4 ag6sp_4_? if year==2008, ///
> impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr4) ///
> expby(mu_ag6sp_4) gen(lwbp_aggr4) seed(889922) coll_rule(median) ///
> sav(kd_sp4, replace)
**** EXPENDITURE ag6sp_4_1 ****
907 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 4888
Complete observations: 3981
Missing observations: 907
Imputed observations: 907
0 values for expenditure ag6sp_4_1 converted to 1
**** EXPENDITURE ag6sp_4_2 ****
379 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Method: truncreg regression
Limit: lower = 0
upper = 1
Total Observations: 4888
Complete observations: 4509
Missing observations: 379
Imputed observations: 379
0 values for expenditure ag6sp_4_2 converted to 1
**** EXPENDITURE ag6sp_4_3 ****
60 observations to impute
MULTIPLE IMPUTATION OVERVIEW
Pay attention: imputation method switched to pmm
Method: pmm regression
Total Observations: 4888
Complete observations: 4828
Missing observations: 60
Imputed observations: 60
Number of k nearest neighbors: 241
0 values for expenditure ag6sp_4_3 converted to 1
DESCRIPTIVES STATISTICS
    Variable |       Obs.       Mean     Median  Std. Dev.        Min        Max
--------------------------------------------------------------------------------
      PUV(D) |       4888   .8136841   .8143076   .1388408   .4111537   1.281451
     PUV(DI) |       4888   .9933574   .9960147    .169638   .5019528   1.561017
  lwbp_aggr4 |       4888   360.8287   319.7612   223.4753   10.74425   2049.023
--------------------------------------------------------------------------------
Note: lwbp_aggr4 is Pseudo Unit Values in Levels PUV(DIL)
(file kd_sp4.gph saved)
Note that multiple imputation with the truncreg method failed for the sub-group
expenditure ag6sp_4_3, so the program switched to the predictive mean matching
method (pmm).
Figure 1: Kernel density estimate of lwbp_aggr4
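A density plot of this kind can also be drawn directly from the generated variable with Stata's kdensity command. The sketch below is an illustration under stated assumptions, not the routine used inside pseudounit: the data are simulated right-skewed values that stand in for lwbp_aggr4, and the names puv_grid and puv_dens are hypothetical.

```stata
* Hypothetical right-skewed values standing in for lwbp_aggr4
clear
set obs 500
set seed 889922
generate lwbp_aggr4 = exp(rnormal(5.7, 0.6))

* Kernel density estimate on 100 grid points, stored as variables
kdensity lwbp_aggr4, generate(puv_grid puv_dens) n(100) nograph

* Plot the stored estimate
twoway line puv_dens puv_grid, xtitle("lwbp_aggr4") ytitle("Density")
```

The graph file written by the sav() option in the example (kd_sp4.gph) can be redisplayed with graph use kd_sp4.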
5
Conclusions
The main objective of the pseudounit Stata module presented here is to make household
budget surveys that collect only expenditure information suitable for demand and welfare
analysis as well. With the pseudounit command, the lack of quantity information no longer
prevents researchers from deriving household-specific prices (unit values) and from
estimating complete demand systems suitable for welfare analysis.
6
Acknowledgments
The authors wish to thank an anonymous referee and Lucia Echeverría for helpful
comments and suggestions. Any errors and omissions are the sole responsibility of the
authors.
7
References
Atella, V., M. Menon, and F. Perali (2003): “Estimation of Unit Values in Cross Sections
without Quantity Information,” In Household Behaviour, Equivalence Scales, Welfare
and Poverty, edited by C. Dagum and G. Ferrari. New York: Physica-Verlag.
Barten, A. P. (1964): “Family Composition, Prices and Expenditure Patterns,” in
Econometric Analysisfor National Economic Planning: 16th Symposium of the Colston
Society, eds. P. Hart, G. Mills, and J. K. Whitaker, London: Butterworth.
Berges, M., Pace Guerrero, I., and L. Echeverrı̀a (2012): “La Utilizaciòn de Precios
Implı̀citos o de Pseudo Precios Implı̀citos en la Estimaciòn de un Sistema de Demandas QUAIDS para Alimentos,” Nülan. Deposited Documents 1675, Centro de Documentaciòn, Facultad de Ciencias Econòmicas y Sociales, Universidad Nacional de Mar
del Plata.
Coondoo, D., A. Majumder, and R. Ray (2004): “On a Method of Calculating Regional
Price Differentials with Illustrative Evidence from India,” Review of Income and Wealth,
50(1): 51–68.
Dagsvik, J. K., and L. Brubakk (1998): “Price Indexes for Elementary Aggregates
Derived from Behavioral Assumptions,” Discussion Papers No. 234, Statistics Norway,
Research Department.
Deaton, A. (1987): “Estimation of Own- and Cross-price Elasticities from Household
Survey Data,” Journal of Econometrics, 36(1-2): 7–30.
Deaton, A. (1988a): “Household Survey Data and Pricing Policies in Developing Countries,” World Bank Economic Review, 3(2): 183–210.
18
Estimation of Unit Values
Deaton, A. (1988b): “Quality, Quantity, and Spatial Variation of Prices,” American
Economic Review, 78(3): 418–30.
Deaton, A. (1990): “Prices Elasticities from Survey Data - Extensions and Indonesian
Results,” Journal of Econometrics, 44(3): 218–309.
Deaton, A. (1998): “Getting Prices Right: What Should Be Done?,” Journal of Economic Perspectives, 12(1): 37–46.
Deaton, A., and J. Muellbauer (1980): “Economics and Consumer Behavior,” Cambridge University Press.
Frisch, R. (1959): “A Complete Scheme for Computing All Direct and Cross Demand
Elasticities in a Model with Many Sectors,” Econometrica, 27(2): 177–196.
Hoderlein, S. and S. Mihaleva (2008): “Increasing the Price Variation in a Repeated
Cross Section,” Journal of Econometrics 147(2): 316–325.
Lahatte, A., R. Miquel, F. Laisney, and I. Preston (1998): “Demand Systems with Unit
Values: A Comparison of Two Specifications,” Economics Letters, 58(8): 281–290.
Lecocq, S. and J.M. Robin (2015): “Estimating Almost-Ideal Demand Systems with
Endogenous Regressors,” Stata Journal, 15(2): 554–573.
Lewbel, A. (1989): “Identification and Estimation of Equivalence Scales under Weak
Separability,” Review of Economic Studies, 56(2): 311–316.
Menon, M. and F. Perali (2010): “Econometric Identification of the Cost of Maintaining
a Child,” Research on Economic Inequality, 18: 219–256.
Perali, F. (1999): “Stima delle Scale di Equivalenza utilizzando i Bilanci Familiari
ISTAT 1985-1994,” Rivista Internazionale di Studi Sociali, 67–129.
Perali, F. (2000): “Analisi di Tassazione Ottimale Applicata al Consumo di Bevande,”
in F. Perali (ed.), Microeconomia Applicata. Volume I, Carocci Editore, Roma.
Perali, F. (2003): The Behavioral and Welfare Analysis of Consumption, Kluwer Academic Publishers, Amsterdam.
Slesnick, D. (1998): “Empirical Approaches to the Measurement of Welfare,” Journal
of Economic Literature, 36(4): 2108–2165.
About the authors
Martina Menon is an assistant professor of economics at the University of Verona in Italy. Her
main research interests are economics of the family, consumption analysis, welfare analysis,
and applied econometrics.
Federico Perali is a full professor of economics at the University of Verona in Italy. His main
research interests are economics of the family, consumption analysis, welfare analysis, and
applied econometrics.
Nicola Tommasi is a research assistant at the University of Verona in Italy. His main research
interests are applied econometrics and statistical programming.