
(online) The Stata Journal (yyyy) vv, Number ii

2016

This paper presents the Stata command pseudounit that estimates Pseudo Unit Values in cross-sections of household expenditure surveys without quantity information. Household surveys traditionally record expenditure information only. The lack of information about quantities purchased precludes the possibility of deriving household-specific unit values. We use a theoretical result developed by Lewbel (1989) to construct "pseudo" unit values by first reproducing cross-sectional price variation, and then adding this variability to the aggregate price indexes published by national statistical institutes. We illustrate the method with an example that uses a time-series of cross-sections of Italian household budgets.

Working Paper Series
Department of Economics, University of Verona

Estimation of Unit Values in Household Expenditure Surveys without Quantity Information
Martina Menon, Federico Perali, Nicola Tommasi
WP Number: 19, October 2016
ISSN: 2036-2919 (paper), 2036-4679 (online)

The Stata Journal (yyyy) vv, Number ii, pp. 1–19

Estimation of Unit Values in Household Expenditure Surveys without Quantity Information

Martina Menon
University of Verona, Department of Economics
Verona, Italy

Federico Perali
University of Verona, Department of Economics and CHILD
Verona, Italy

Nicola Tommasi
University of Verona, Interdepartmental Centre of Economic Documentation (CIDE)
Verona, Italy

Abstract. This paper presents the Stata command pseudounit that estimates Pseudo Unit Values in cross-sections of household expenditure surveys without quantity information. Household surveys traditionally record expenditure information only. The lack of information about quantities purchased precludes the possibility of deriving household specific unit values. We use a theoretical result developed by Lewbel (1989) to construct “pseudo” unit values by first reproducing cross-sectional price variation, and then adding this variability to the aggregate price indexes published by national statistical institutes. We illustrate the method with an example that uses a time-series of cross-sections of Italian household budgets.

Keywords: st0001, Unit values, Cross-section prices, Demand analysis, pseudounit.

1 Introduction

This paper presents the theory used to implement a Stata command that estimates unit values in cross-sections of household expenditure surveys without quantity information and describes how the command should be used. Empirical works on demand analysis generally rely on the assumption of price invariance across households, supported by the hypothesis that in cross-sectional data there are neither time nor spatial variations in prices.
According to this assumption, each family pays the same prices for homogeneous goods. Micro-data with this characteristic allow researchers to estimate only Engel curves without accounting for price effects, which are crucial for both behavioural and welfare applications. Slesnick (1998:150) remarks that “the absence of price information in the surveys creates special problems for the measurement of social welfare, inequality and poverty. ... Most empirical work links micro data with national price series on different types of goods so cross sectional variation is ignored. Access to more disaggregate information on prices will enhance our ability to measure social welfare, although it remains to be seen whether fundamental conclusions concerning distributional issues will be affected.”

In empirical work this limitation is usually bypassed by analysing time-series of cross-sections, where price information comes from aggregate time series data. Plausible estimates of price effects require a series of cross-sections that is long enough and, if possible, aggregate price indexes that vary by month and location, usually by region or province.

Household budget surveys of both developed and developing countries can be classified into two broad categories, in increasing order of frequency of occurrence: 1) surveys of expenditure and quantities purchased, and 2) surveys of expenditure data only. In the first case, where quantities and expenditure are both observed, cross-sectional prices are obtained as implicit prices, dividing expenditure by quantities, and are more properly referred to as “unit values.” When dealing with these surveys, it is important to remember that a proper use of unit values in econometric analyses must take into account that unit values provide useful information about prices but differ from market prices in many respects.
The ratio between expenditure and quantities bought embeds information about the choice of quality (Deaton 1987, 1988a, 1988b, 1990, 1998, Perali 2003). The level of the unit value of a composite good depends on the relative share of high-quality items and on the composition of the aggregate good. Unit values can be highly variable even for supposedly homogeneous goods because the market offers many different grades and types.

On the other hand, when dealing with surveys that report only expenditure information, aggregate national price indexes are usually merged with household expenditure to obtain estimates of price elasticities. Unfortunately, this approach requires a long time-series of cross-sectional data to estimate a demand system with sufficient price variation and relies on very restrictive assumptions (Frisch 1959), which often turn out to be rejected in empirical applications. Aggregate price indexes are in general highly correlated and may suffer from endogeneity problems (Lecocq and Robin 2015), and the estimated elasticities are often not coherent with the theory (Atella, Menon and Perali 2003, Coondoo, Majumder and Ray 2004, Dagsvik and Brubakk 1998, Lahatte, Laisnay and Preston 1998). For these reasons, surveys gathering exclusively expenditure data, such as the Italian household budget survey conducted by the National Statistical Institute (ISTAT) used in our example, and the majority of existing household budget surveys, have limited applicability in modern demand and welfare analysis.

It is then important to devise an appropriate procedure to compute “pseudo” unit values using the information traditionally available in expenditure surveys, such as budget shares and demographic characteristics, which help reproduce the distribution of the unit value variability as closely as possible. The theoretical background for this undertaking is provided in a study by Lewbel (1989).

The remainder of the paper is organized as follows.
Section 2 presents the theory and the method used to derive consumer price indexes and pseudo unit values when the main objective is to implement demand analysis of household budget data without quantity information. Section 3 provides the syntax and options of the pseudounit Stata command. Section 4 illustrates the application.

2 The Estimation of Unit Values in Cross-Section Analysis of Household Budget Surveys

We introduce a method that recovers unit values when only expenditure information is available, using knowledge about aggregate price indexes available from national statistics. First, as illustrated in Section 2.1, we need to collect the consumer price indexes available from official statistics and to associate them with each household in the survey. Then, in order to improve the precision of the estimated price elasticities as shown in Atella, Menon and Perali (2003), we reproduce as closely as we can the price variation of actual unit values, which could be obtained as the ratio between expenditure and quantities if quantity information were available in the survey. The estimation of pseudo unit values is described in Sections 2.2 and 2.3.

2.1 Consumer Price Indexes

Eurostat adopts the Classification Of Individual COnsumption by Purpose, abbreviated as COICOP, a nomenclature developed by the United Nations Statistics Division to classify and analyze individual consumption expenditures incurred by households, non-profit institutions serving households, and general government according to their purpose.¹ National statistical institutes traditionally publish consumer price indexes for each COICOP category on a monthly basis, which are collected at provincial level.

Let $P^{ij}_{rm}$ be the consumer price index for the j-th item of the i-th COICOP group, with i = 1, ..., n, collected by national statistical institutes on a monthly basis m = 1, ..., M for each territorial level r = 1, ..., R, such as a province or a region.
These price indexes are the same for all households living in the same region and interviewed in the same month. If detailed price information disaggregated by territorial level or time is not available, then we only have $P^i$, which is the same for all households. With this highly limited price information, demand analysis cannot be implemented because the data matrix is not invertible and pseudo unit values must be estimated.

1. The COICOP top level aggregation encompasses 12 categories: food and non-alcoholic beverages; alcoholic beverages, tobacco and narcotics; clothing and footwear; housing, water, electricity, gas and other fuels; furniture; health; transport; communication; recreation and culture; education; restaurants and hotels; and miscellaneous goods and services.

The next task is to match the monthly price index specific to each territorial unit, $P^{ij}_{rm}$, with all households living in province or region r and interviewed at month m. Then each $P^{ij}_{rm}$ is aggregated into a price index $P^{i}_{rm}$ for i = 1, ..., K groups corresponding to the goods selected for the empirical demand analysis. The aggregation uses Laspeyres indexes

\[ P^{i}_{rm} = \sum_{j=1}^{n_i} P^{ij}_{rm} \, w_{ij}, \tag{1} \]

where j = 1, ..., $n_i$, $n_i$ is the number of goods within group i, and $w_{ij}$ are the weights provided by national statistical institutes for each item j of group i.² As an example, we may suppose that a budget is divided into i = 1, 2 groups such as food and non-food, and that the sub-group food is composed of j = 1, 2, 3 items such as cereals, meat, and other food.³

So far we have described how to prepare the data matrix containing information about the available price indexes. Next, we present the background theory used to reconstruct the cross-sectional variability of unit values.
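The aggregation in equation (1) can be sketched in Stata as follows. This is only a minimal illustration and not part of pseudounit: the variable names (p_cereals, p_meat, p_other, idx_food), the two region-month cells, and the weights 0.3, 0.5 and 0.2 are hypothetical, and the weights are assumed to sum to one within the group.

```stata
* Hypothetical item-level indexes for the food group in two region-month cells
clear
input byte(region month) double(p_cereals p_meat p_other)
1 1 100 100 100
1 2 102 105 101
end

* Hypothetical item weights w_ij (assumed to sum to one within the group)
scalar w_cereals = 0.3
scalar w_meat    = 0.5
scalar w_other   = 0.2

* Equation (1): weighted sum of the item indexes within the group
generate double idx_food = scalar(w_cereals)*p_cereals ///
    + scalar(w_meat)*p_meat + scalar(w_other)*p_other
list region month idx_food
```

In the base cell all item indexes equal 100, so the group index is 100 as well; in the second cell the group index is the weighted average of the three item indexes.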
2. When not available, the sub-group budget shares can be used as weights of aggregation.

3. If the interest is to build a time-series collection of cross-sections of household budget surveys, then in the base year the indexes for all goods are equal to 100.

2.2 Demographically Varying Pseudo Unit Values: Theory

Lewbel (1989) proposes a method to estimate the cross-sectional variability of actual unit values by exploiting the demographic information included in generalized “within-group” equivalence scales or, more generally, demographic functions.⁴ For a group i of goods, these are defined as the ratio of a sub-utility function of a reference household to the corresponding sub-utility function of a given household. They can be estimated without price variation and used in place of “between-group” price variation. The method relies on the assumption that the original utility function is homothetically separable and that the “within-group” sub-utility functions are Cobb-Douglas.

4. This section closely reproduces the procedure developed by Lewbel (1989).

Consider a separable utility function $U(u_1(q_1, d), ..., u_n(q_n, d))$ defined over the consumption of goods $q_i$ and a set of demographic characteristics d, where $U(u_1, ..., u_n)$ is the “between-group” utility function, $u_i(q_i, d)$ is the “within-group” sub-utility function, and i = 1, ..., n denotes the aggregate commodity groups. Demographic characteristics, d, affect U indirectly through their effects on the within-group sub-utility functions. Define the group equivalence scale $M_i(q_i, d)$ as

\[ M_i(q_i, d) = \frac{u_i(q_i, d^h)}{u_i(q_i, d)}, \tag{2} \]

where $d^h$ describes the demographic profile of a reference household. Define a quantity index for group i as $Q_i = u_i(q_i, d^h)$ and rewrite the between-group utility function as

\[ U(u_1, ..., u_n) = U\left( \frac{Q_1}{M_1}, ..., \frac{Q_n}{M_n} \right), \tag{3} \]

which is formally analogous to Barten’s (1964) technique to introduce demographic factors in the utility function. Define further the price index for group i as $P_i = Y_i^h / Q_i$, where $Y_i^h$ is expenditure on group i by the reference household. To guarantee that group demands are closed under unit scaling, a scaling factor $k_i$ must be applied to the quantity index $Q_i$ that makes $P_i = 1$ for all i when $p_{ij} = 1$ for all i and j. This would occur, for example, in a base year when the $p_{ij}$ are in index form. Thus $P_i = Y_i^h / (k_i Q_i)$.

Barten’s utility structure implies the following share demands for each household with total expenditure Y

\[ W_i = H_i(P_1 M_1, ..., P_n M_n, Y), \tag{4} \]

taking the form $W_i^h = H_i^h(P_1, ..., P_n, Y^h)$ for the reference household with scales $M_i = 1$ for all i. The further assumption of homothetic separability admits two-stage budgeting (Deaton and Muellbauer 1980) and implies the existence of functions $V_i$ such that $P_i = V_i(p_i, d^h)$ is the price index of group i for the reference household with demographics $d^h$, where $p_i = (p_{i1}, ..., p_{in_i})$ is the vector of prices and $n_i$ is the number of goods that comprise group i. By analogy with the definition of group equivalence scales in utility space, it follows that

\[ M_i = \frac{V_i(p_i, d)}{V_i(p_i, d^h)}, \tag{5} \]

where $V_i(p_i, d) = M_i P_i$. Therefore, when demands are homothetically separable, each group scale depends only on relative prices within group i and on d, as expected given that homothetic separability implies strong separability. Maximization of $u_i(q_i, d)$ subject to the expenditure $p_i q_i = Y_i$ of group i gives the budget share for an individual good, $w_{ij} = h_{ij}(p_i, d, Y_i)$. For homothetically separable demands, the budget shares do not depend on expenditure, $w_{ij} = h_{ij}(p_i, d)$, and integrate back in a simple fashion to $V_i = M_i P_i$. This information can be used at the between-group level in place of price data to estimate $W_i = H_i(V_1, ..., V_n, Y)$.
Under the assumption that the sub-group utility functions are Cobb-Douglas with parameters specified as “shifting” functions of demographic variables alone, we can specify the following relationship

\[ F_i(q_i, d) = k_i \prod_{j=1}^{n_i} q_{ij}^{m_{ij}(d)}, \tag{6} \]

then the shares $w_{ij} = \partial \log V_i / \partial \log p_{ij}$ correspond to the demographic functions

\[ w_{ij} = h_{ij}(p_i, d) = m_{ij}(d) \tag{7} \]

with

\[ \sum_{j=1}^{n_i} w_{ij}(d) = \sum_{j=1}^{n_i} m_{ij}(d) = 1. \tag{8} \]

The implied price index is

\[ V_i(p_i, d) = M_i P_i = \frac{1}{k_i} \prod_{j=1}^{n_i} \left( \frac{p_{ij}}{m_{ij}} \right)^{m_{ij}}, \tag{9} \]

with

\[ k_i = \prod_{j=1}^{n_i} m_{ij}(d^h)^{-m_{ij}(d^h)}, \tag{10} \]

where $k_i$ is a scaling function depending only on the choice of the reference demographic levels.

Note⁵ that, assuming separable and homothetic preferences within groups and letting $q_{ij}$ denote scaled units such that the corresponding prices $p_{ij}$ are unity in a base year,⁶ the group cost function for the reference household is

\[ c_i(u_i, \tilde{p}_i, d^h) = k_i \, u_i(\tilde{q}_i, d^h) \, \frac{b(\tilde{p}_i, d^h)}{k_i} = k_i Q_i P_i, \tag{11} \]

where $b(\tilde{p}_i, d^h)$ is concave and linearly homogeneous in prices, and time subscripts are omitted for simplicity. To ensure that the group price index is unity in the base year, the scaling factor is $k_i = b(1, d^h)$. Thus, $P_i = b(\tilde{p}_i, d^h) / k_i = V_i(\tilde{p}_i, d^h)$ and the price per equivalent capita is

\[ M_i P_i = \frac{b(\tilde{p}_i, d)}{b(\tilde{p}_i, d^h)} \, \frac{b(\tilde{p}_i, d^h)}{k_i} = \frac{b(\tilde{p}_i, d)}{k_i} = V_i(\tilde{p}_i, d). \tag{12} \]

When the sub-utility functions are Cobb-Douglas, $b(\tilde{p}_i, d) = \prod_{j=1}^{n_i} (\tilde{p}_{ij} / m_{ij})^{m_{ij}}$ and it is easy to see that the scaling factor is $k_i = \prod_{j=1}^{n_i} (1 / m^h_{ij})^{m^h_{ij}}$, where the parameters are $m_{ij} = m_{ij}(d)$ and $m^h_{ij} = m_{ij}(d^h)$. It is important to note that the Cobb-Douglas assumption places restrictions only at the within-group level while leaving the between-group demand equations free to be arbitrarily flexible. An approximation to $M_i P_i = M_i$ can be obtained by using the observed within-group budget shares. These results support a simple procedure to estimate price variation in survey data without quantity information.
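A small numerical case may help to see how the scaling factor and the Cobb-Douglas index interact. The figures below are purely illustrative assumptions (a group with two goods, reference shares of 0.5 each, and a household with shares 0.8 and 0.2); they are not taken from the paper or its data.

```latex
% Illustrative two-good example, base year with all relative prices equal to one
\begin{align*}
k_i &= \prod_{j=1}^{2} m_{ij}(d^h)^{-m_{ij}(d^h)} = 0.5^{-0.5} \cdot 0.5^{-0.5} = 2, \\
V_i(1, d^h) &= \frac{1}{k_i} \prod_{j=1}^{2} \left( \frac{1}{0.5} \right)^{0.5} = \frac{2}{2} = 1, \\
V_i(1, d) &= \frac{1}{k_i} \, 0.8^{-0.8} \cdot 0.2^{-0.2} \approx \frac{1.6494}{2} \approx 0.825 .
\end{align*}
```

The reference household's index equals one in the base year, as the scaling factor is designed to guarantee, while a household whose spending is more concentrated on one good obtains an index below one.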
5. We would like to thank an anonymous reviewer for suggesting that we report how the expressions for $M_i$, $P_i$ and $k_i$ are derived.

6. If $p_{ijt}$ is the price in year t and $p_{ij0}$ the price in base year 0, then $\tilde{p}_{ijt} = p_{ijt} / p_{ij0}$ and $\tilde{q}_{ijt} = q_{ijt} \, p_{ij0}$.

2.3 Demographically Varying Pseudo Unit Values: Practice

Given this theoretical setup, we now describe how pseudo unit values can be obtained in practice. The description corresponds to the implementation of the pseudounit Stata command.

Definition 1 (Pseudo Unit Values - PUV($\hat{P}^i_D$))

\[ \hat{P}^i_D = M_i P_i = M_i = \frac{1}{k_i} \prod_{j=1}^{n_i} w_{ij}^{-w_{ij}}, \tag{13} \]

where $k_i$ is the average of the sub-group expenditure for the i-th group budget share.

The index $\hat{P}^i_D$ summarizes the cross-section variability of prices that can be added to spatially varying price indexes to resemble unit values expressed in index form, as in Definition 2. In general, this technique allows the recovery of the household-specific price variability that can be found in unit values. The pseudo unit value is an index that can be compared to actual unit values after normalization, choosing the value of a specific household as a numeraire.

Definition 2 (Pseudo Unit Values in Index Form - PUV($\hat{P}^i_{DI}$))

\[ \hat{P}^i_{DI} = \hat{P}^i_D \, P^i_{rm}, \tag{14} \]

where $P^i_{rm}$ are the group specific price indexes derived in equation (1).

To look like actual unit values, pseudo unit values in index form have to be transformed into levels. The transformation in nominal terms is fundamental to properly capture complementarity and substitution effects, as shown in Atella, Menon and Perali (2003). Cross-effects would otherwise be the expression of the differential speed of change of the good-specific price indexes through time only. Note that $\hat{P}^i_D = M_i P_i$ holds for the base year only, where $\tilde{p}_{ij} = 1$ for all regions, and time subscripts are omitted for simplicity.
In subsequent time-periods

\[ M_i P_i = \left[ \frac{1}{k_i} \prod_{j=1}^{n_i} w_{ij}^{-w_{ij}} \right] \left[ \prod_{j=1}^{n_i} \tilde{p}_{ij}^{\,w_{ij}} \right] = \hat{P}^i_D \left[ \prod_{j=1}^{n_i} \tilde{p}_{ij}^{\,w_{ij}} \right], \tag{15} \]

which is represented by the pseudo unit value in index form $\hat{P}^i_{DI} = \hat{P}^i_D P^i_{rm}$. Further, $\hat{P}^i_{DI}$ will be an approximation to $M_i P_i$ unless $P^i_{rm}$ is equivalent to $\prod_{j=1}^{n_i} \tilde{p}_{ij}^{\,w_{ij}}$, which resembles a Stone price index.

Definition 3 (Pseudo Unit Values in Levels - PUV($\hat{P}^i_{DIL}$))

\[ \hat{P}^i_{DIL} = \hat{P}^i_{DI} \, \bar{y}_i, \tag{16} \]

where $\bar{y}_i$ is the average expenditure of group i in the base year.

Early experiments with pseudo unit values, both with Italian household budget data (Perali 1999 and 2000, Atella, Menon and Perali 2003, Menon and Perali 2010) and with other data sets (Hoderlein and Mihaleva 2008, Berges, Pace Guerrero and Echeverría 2012), have provided comforting indications about the possibility of estimating regular preferences. Atella, Menon and Perali (2003) describe the effects on the matrix of cross-price elasticities associated with several price definitions and find that the matrix of compensated elasticities is negative definite only if pseudo unit values are used. Nominal pseudo unit values, which more closely reproduce actual unit values, give a set of own- and cross-price effects that is more economically plausible. The derived demand systems are regular and suitable for sound welfare and tax analysis. The authors conclude that the adoption of pseudo unit values does no harm because Lewbel’s method simply consists in adding cross-sectional price variability to aggregate price data. Therefore, Lewbel’s method for constructing demographically varying prices is potentially of great practical utility.
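Definitions 1 to 3 can be traced by hand on toy data. The sketch below is not the internal code of pseudounit. It assumes that the scaling factor k_i is evaluated at the sample-mean within-group shares (one possible reading of the description of k_i), that the price index is expressed in base 100 and is therefore divided by 100, and that no share is zero. All variable names and numbers are hypothetical.

```stata
* Toy data: group expenditure y, three sub-expenditures (all positive),
* a group price index in base 100 and the base-year mean group expenditure
clear
input double(y y1 y2 y3 idx ybar)
100 50 30  20 120 90
200 20 20 160 110 90
150 50 50  50 120 90
end

* Within-group budget shares w_ij
forvalues j = 1/3 {
    generate double w`j' = y`j'/y
}

* Product of w_ij^(-w_ij) over the sub-groups (numerator of equation 13)
generate double num = 1
forvalues j = 1/3 {
    replace num = num * w`j'^(-w`j')
}

* Scaling factor k_i, ASSUMED here to be evaluated at the sample-mean shares
scalar k = 1
forvalues j = 1/3 {
    quietly summarize w`j', meanonly
    scalar k = scalar(k) * r(mean)^(-r(mean))
}

* Definition 1: pseudo unit value
generate double puv_d = num / scalar(k)
* Definition 2: index form (price index rescaled from base 100, an assumption)
generate double puv_di = puv_d * idx/100
* Definition 3: levels
generate double puv_dil = puv_di * ybar
list puv_d puv_di puv_dil
```

The third household spends equal amounts on the three sub-groups, so its product term reaches the maximum value of 3; households with more concentrated spending obtain lower pseudo unit values.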
Because goods may differ in quality from one household to another, and their associated unit values may reflect these differences in quality, measurement errors and endogenous expenditure information, the estimated unit values are likely to be correlated with the equation errors and the resulting estimators will be both biased and inconsistent. The demand estimation technique should therefore account for price endogeneity by using instrumental-variable methods. We now proceed with the description of the pseudounit Stata command.

3 The pseudounit command

3.1 Syntax

The syntax of pseudounit is as follows:

    pseudounit expenditures [if] [in], generate(varname) pindex(varname)
        [impvars(varlist) seed(#) add(#) coll_rule(mean|median) expby(varname)
        pdi(varname) year(varname) saving(filename[, suboptions])]

where expenditures is the list of expenditure variables of interest. The list must be specified as follows: the group expenditure first, then all the sub-expenditures of the group. The pseudounit command verifies that the sub-expenditures sum to the group expenditure and that each expenditure is zero or positive.

3.2 Options

generate(varname) specifies the variable generated with unit values à la Lewbel. generate(varname) is required.

pindex(varname) specifies the variable containing the price index relative to the group expenditure. pindex(varname) is required.

impvars(varlist) specifies the variables to be used for the imputation of the zero expenditure shares. There must be at least two sub-expenditures. The imputation uses the command mi impute truncreg, where the dependent variable is the expenditure share and the independent variables are the variables specified in impvars(varlist). If the imputation with mi impute truncreg fails, the command switches to mi impute pmm using a number of k nearest neighbors equal to 5% of the positive observations of the within-group shares. It is also possible to use categorical variables with the appropriate syntax (see [U] Factor variables). Because imputed shares must be positive, the program checks for imputed values that are negative or greater than 1 and substitutes them with the value of 1. Because the procedure uses a product over the sub-groups and a share of 1 contributes a factor of $1^{-1} = 1$, this guarantees that the sub-group expenditure does not contribute to the group price for that specific household.

seed(#) sets the random-number seed. This option is used to reproduce results. The default is seed(159753).

add(#) specifies the number of imputations to add to the mi data. The total number of imputations cannot exceed 1,000. The default is add(20).

coll_rule(mean|median) specifies how the imputations are combined. mi impute truncreg adds n replicas to the data with n imputations of the missing data, where n corresponds to the value reported in the add(#) option. Observations are identified across the replicas by the variable _mi_id. These n imputations are then collapsed into one dataset using the mean or the median. pseudounit executes the following command: collapse (mean|median) share_var, by(_mi_id). The default statistic is the mean.

expby(varname) specifies the average group expenditure for the base year.⁷ Without the option expby(varname), the variable in option generate(varname) is equal to the pseudo unit value in index form PUV(DI).

pdi(varname) specifies the variable generated with pseudo unit values in index form (PUV(DI)).

year(varname) specifies the name of the year variable when estimating unit values for several years.

saving(filename[, suboptions]) saves a kernel density graph of Pseudo Unit Values in Levels, PUV($\hat{P}^i_{DIL}$), to disk. filename is the name of the disk file to be created or replaced. If the filename is specified without an extension, .gph is assumed. No graph is produced, and the option is disabled, if year(varname) is specified.
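The requirement stated in the Syntax subsection (group expenditure first, sub-expenditures adding up to it, no negative values) can be checked before calling the command. The sketch below is not the check performed inside pseudounit; the variable names and the tolerance of 1e-6 are hypothetical choices made only for illustration.

```stata
* Toy data: group expenditure y followed by its sub-expenditures
clear
input double(y y1 y2 y3)
100 50 30  20
200  0 40 160
end

* Each expenditure must be zero or positive (missing values are rejected too)
foreach v of varlist y y1 y2 y3 {
    assert `v' >= 0 & !missing(`v')
}

* The sub-expenditures must add up to the group expenditure
egen double sub_total = rowtotal(y1 y2 y3)
assert reldif(sub_total, y) < 1e-6
```

A relative difference is used rather than exact equality because expenditure totals built from rounded components may differ by small amounts.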
The option year() can be used when a time series of cross-sections is available so that it is possible to compute the mean expenditure shares $w_{ij}$ for each year.

7. When using one cross-section only, we can choose a given month (for instance January) as the base year both for the price indexes and the group expenditures.

4 The pseudounit command: examples

To become familiar with what the command does, the user may be interested in running the following examples using the dataset pseudounit_cmd.dta provided with the package.

4.1 Data

For our example, expenditure data come from a series of repeated cross-sectional national household budget surveys conducted yearly by the Italian Statistical Institute (ISTAT). Within each cross-section, households are interviewed at different times during the year, on a monthly basis. The ISTAT budget survey is representative at the regional level. The samples of household budgets for the years 2007 and 2008 used in this example comprise more than 23,000 households per year. In order to reduce the estimation burden of the present application, we have drawn a random sample of 4,935 households for the year 2007 and 4,916 for the year 2008. Household expenditures in the provided dataset have been aggregated into six groups and then transformed into budget shares: Food, Clothing, Housing, Transport and Communication, Education, and Other goods and services.

ISTAT collects information about consumer price indexes based on the consumption habits of the whole population, available on a monthly basis for each of the 106 Italian provinces at the COICOP level of disaggregation. We have chosen January 1997 as the base year. Price indexes have been matched to the two samples taking into account the period of the year in which the household was interviewed. This means that households interviewed in March have been matched with prices collected in the same month.
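The matching of price indexes to households by territorial unit and month of interview can be sketched with a many-to-one merge. The example is self-contained and uses hypothetical variable names (hhid, region, month, idx_food) and made-up index values; it only illustrates the matching logic and does not reproduce the ISTAT data preparation.

```stata
* Hypothetical price file: one index per region-month cell
clear
input byte(region month) double idx_food
1 1 100.0
1 3 101.5
2 1 100.0
2 3 102.2
end
tempfile prices
save `prices'

* Hypothetical household file with region and month of interview
clear
input int hhid byte(region month)
1 1 3
2 2 1
3 2 3
end

* Many households share the same region-month price cell
merge m:1 region month using `prices', keep(master match)

* Every household should have found its price index
assert _merge == 3
drop _merge
sort hhid
list hhid region month idx_food
```

Households interviewed in the same region and month receive the same index, which is why the household-specific variation has to come from the pseudo unit values.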
After determining the expenditure groups, we constructed the corresponding consumer price indexes starting from the COICOP categories available by territorial disaggregation and month. These indexes have been matched to all households living in the same region and interviewed in the same month.

Table 1 reports the descriptive statistics of the price index $P^i_{rm}$ used by the pseudounit procedure for the six groups of goods and services. Note that if the user already has price information from external sources organized as in Table 1, the pseudounit procedure can be called without following the bottom-up approach outlined above.

Table 1: Descriptive Statistics of $P^i_{rm}$ by Year

    Index        2007               2008               Total
    idx_aggr1    124.776 (4.192)    124.832 (4.199)    124.804 (4.195)
    idx_aggr2    119.710 (5.426)    119.721 (5.467)    119.715 (5.446)
    idx_aggr3    126.238 (2.751)    126.183 (2.747)    126.211 (2.749)
    idx_aggr4    122.144 (2.310)    122.097 (2.301)    122.120 (2.306)
    idx_aggr5    124.891 (2.677)    124.846 (2.687)    124.869 (2.682)
    idx_aggr6    118.798 (2.916)    118.733 (2.916)    118.766 (2.916)

Note: Standard errors are in parentheses.

Table 2 reports the levels of the average indexes $P^i_{rm}$ by macro region, selecting two households (HH1 and HH2) interviewed in period 1 or 2 in each macro region, in order to illustrate how the levels of the price indexes may vary within each region by time of interview of the household.
Table 2: Average Levels of $P^i_{rm}$ by Macro Region and Households HH1 or HH2 Interviewed in Period 1 or 2

    Macro region     idx_aggr1  idx_aggr2  idx_aggr3  idx_aggr4  idx_aggr5  idx_aggr6
    NW (HH1)         119.9      115.9      123.66     119.4      124.3      117.4
    NW (HH2)         124.6      116.4      122.8      120.5      123.3      116.9
    NE (HH1)         124.8      115.3      129.4      123.2      123.4      119.2
    NE (HH2)         121.8      116.4      131.2      123.5      124.7      120.6
    Centre (HH1)     125.6      123.2      121.0      121.4      130.8      116.1
    Centre (HH2)     122.1      117.8      128.0      123.7      124.7      116.4
    South (HH1)      123.9      109.5      123.4      113.9      114.2      117.6
    South (HH2)      133.6      130.9      121.6      120.5      127.2      115.8
    Islands (HH1)    122.5      116.6      125.1      123.3      124.53     121.0
    Islands (HH2)    123.5      109.9      126.9      120.3      120.6      119.4

The composition of the group expenditures in our dataset is as follows.

Group expenditure 1: Food (ag6sp_1)
• Bread, Cereals and Pasta (ag6sp_1_1)
• Meat, Fish and Milk Derivatives (ag6sp_1_2)
• Fruit and Vegetables (ag6sp_1_3)
• Fats and Oils, Sugar, Alcoholic and Nonalcoholic Drinks and Beverages, Tobacco (ag6sp_1_4)

Group expenditure 2: Clothing (ag6sp_2)
• Non Assignable Clothing (ag6sp_2_1)
• Clothing and footwear: man (ag6sp_2_2)
• Clothing and footwear: woman (ag6sp_2_3)
• Clothing and footwear: children (ag6sp_2_4)

Group expenditure 3: Housing (ag6sp_3)
• Rents and Condominium Fees (ag6sp_3_1)
• Water, Energy and Heating (ag6sp_3_2)
• Home Repairs and Large Electrical Appliances (ag6sp_3_3)
• Small Electrical Appliances and Flatware (ag6sp_3_4)

Group expenditure 4: Transport and Communications (ag6sp_4)
• Private Transportation (fuels, repairs) (ag6sp_4_1)
• Public Transportation (ag6sp_4_2)
• Telephone (ag6sp_4_3)
• Purchase of Means of Transportation and Telephone (ag6sp_4_4)

Group expenditure 5: Leisure and Education (ag6sp_5)
• Education Expenditures (ag6sp_5_1)
• Leisure (ag6sp_5_2)
• Computer, Music, Televisions (ag6sp_5_3)
• Other (ag6sp_5_4)

Group expenditure 6: Health and Other Non-Food (ag6sp_6)
• Medical Examinations, Medicines (ag6sp_6_1)
• Insurance, Expenditures for Medical Assistance, Other (ag6sp_6_2)

The dataset comprises the price indexes associated with each expenditure (idx_aggr1 - idx_aggr6) and the mean expenditures evaluated at the base year (1997) for each of the 6 selected expenditure categories, conditioned by region, number of household members and month (mu_ag6sp_1 - mu_ag6sp_6). Other variables are residential location, urban or rural (location), number of household components (nc), macroarea (ripgeo), age of the household head (etacf), education of the household head (titstucf), and the logarithm of the household total annual expenditure (lnx). Note that in the base year, average expenditures are computed by region and month in order to preserve the maximum territorial and time variation.

4.2 Examples

We now use the pseudounit command to estimate unit values for food, clothing, housing, and transport and communication in order to illustrate how to use the options available in the command.⁸

8. Our results are obtained using Stata 14; possible marginal differences may be due to previous versions of Stata adopting a different pseudorandom number generator.

Estimation of unit values for the food expenditure group: food (ag6sp_1). The sub-group expenditures are: bread, cereals and pasta (ag6sp_1_1); meat, fish, milk and other protein (ag6sp_1_2); fruits and vegetables (ag6sp_1_3); fats and oils, sugar, beverage and tobacco (ag6sp_1_4). The variables used for the imputation of zero expenditures are residential location (location), macroarea (ripgeo), number of household components (nc), age of the household head (etacf), education of the household head (titstucf), and the logarithm of the total annual household expenditure (lnx). The multiple imputation of the zero expenditure shares generates 30 datasets that are then summarized using the median, as requested with coll_rule(median); the default is the mean. The price index is idx_aggr1 and the mean expenditure computed at the base year for food is mu_ag6sp_1.
The variable associated with the unit values of the food expenditure is lwbp_aggr1:

    . use pseudounit_cmd.dta, clear
    .
    . pseudounit ag6sp_1 ag6sp_1_1 ag6sp_1_2 ag6sp_1_3 ag6sp_1_4 if year==2007, ///
    >     impvars(location i.ripgeo nc etacf i.titstucf lnx) ///
    >     pindex(idx_aggr1) expby(mu_ag6sp_1) gen(lwbp_aggr1) ///
    >     add(30) seed(889922) coll_rule(median)

    DESCRIPTIVES STATISTICS
    Variable   |   Obs.       Mean     Median  Std. Dev.        Min        Max
    --------------------------------------------------------------------------
    PUV(D)     |   4971   .9405628   .9557249   .0908108   .3564937    1.09878
    PUV(DI)    |   4971   1.173586   1.189731   .1196185   .4406378   1.478698
    lwbp_aggr1 |   4971   516.1335   517.3575   151.9128   127.4489   1015.579
    --------------------------------------------------------------------------
    Note: lwbp_aggr1 is Pseudo Unit Values in Levels PUV(DIL)

In this case, there are no imputations because no sub-group has zero expenditure shares.

Estimation of unit values for the clothing expenditure group: Clothing (sub-group expenditures are: Non Assignable Clothing; Clothing and Footwear for Men; Clothing and Footwear for Women; Clothing and Footwear for Children).

    . pseudounit ag6sp_2 ag6sp_2_? if year==2007, ///
    >     impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr2) ///
    >     expby(mu_ag6sp_2) gen(lwbp_aggr2) ///
    >     add(30) seed(889922) coll_rule(median)

    **** EXPENDITURE ag6sp_2_2 ****
    644 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 4971
    Complete observations: 4327
    Missing observations: 644
    Imputed observations: 644
    0 values for expenditure ag6sp_2_2 converted to 1

    **** EXPENDITURE ag6sp_2_3 ****
    355 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 4971
    Complete observations: 4616
    Missing observations: 355
    Imputed observations: 355
    0 values for expenditure ag6sp_2_3 converted to 1

    **** EXPENDITURE ag6sp_2_4 ****
    3149 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 4971
    Complete observations: 1822
    Missing observations: 3149
    Imputed observations: 3149
    0 values for expenditure ag6sp_2_4 converted to 1

    DESCRIPTIVES STATISTICS
    Variable   |   Obs.       Mean     Median  Std. Dev.        Min        Max
    --------------------------------------------------------------------------
    PUV(D)     |   4971   1.049126   1.051407   .0675399   .4734136   1.181265
    PUV(DI)    |   4971    1.25613   1.259243   .1018521   .6206452   1.556212
    lwbp_aggr2 |   4971   174.4161   177.3384   67.20049   24.84464   555.8301
    --------------------------------------------------------------------------
    Note: lwbp_aggr2 is Pseudo Unit Values in Levels PUV(DIL)

In this case, there are no imputations for the sub-group expenditure ag6sp_2_1, but there are imputations for the sub-group expenditures ag6sp_2_2, ag6sp_2_3 and ag6sp_2_4.
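Because the command imputes the zero sub-group expenditures, it can be useful to see in advance how many there are. The following is a minimal sketch, assuming the example dataset pseudounit_cmd.dta and the variable names used above; it is a plain check written with standard Stata commands and is not part of pseudounit itself.

```stata
* Sketch: count zero sub-group expenditures before calling pseudounit.
* Variable names are those of pseudounit_cmd.dta used in the text.
use pseudounit_cmd.dta, clear

foreach v of varlist ag6sp_2_? {
    * Number of households with zero expenditure on this sub-group in 2007
    quietly count if `v' == 0 & year == 2007
    local nzero = r(N)

    * Total number of households observed in 2007
    quietly count if year == 2007
    display as text "`v': " as result `nzero' ///
        as text " zeros out of " as result r(N)
}
```

The counts can be compared with the "observations to impute" lines in the log. A sub-group with a large proportion of zeros, such as clothing for children here, relies heavily on the imputation model.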
Estimation of unit values for the housing expenditure group: Housing (sub-group expenditures are: Rents and Condo Expenses; Water, Energy and Heating; Home Repairs and Large Electrical Appliances; Small Electrical Appliances and Flatware). The variable lwbp_aggr3 is created separately for each value of the variable year, as requested with the option year(year).

    . pseudounit ag6sp_3 ag6sp_3_?, ///
    >     impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr3) ///
    >     expby(mu_ag6sp_3) gen(lwbp_aggr3) year(year)

    **** EXPENDITURE ag6sp_3_3 ****
    1668 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 9859
    Complete observations: 8191
    Missing observations: 1668
    Imputed observations: 1668
    0 values for expenditure ag6sp_3_3 converted to 1

    DESCRIPTIVES STATISTICS
    Variable   |   Obs.       Mean     Median  Std. Dev.        Min        Max
    --------------------------------------------------------------------------
    2007       |
    PUV(D)     |   4971   .9480497    .931581   .1956049   .4917437   1.509897
    PUV(DI)    |   4971   1.196825   1.177358   .2483218   .6092809   1.940222
    lwbp_aggr3 |   4971   729.4016   698.4557   248.5652    160.168   1894.283
    --------------------------------------------------------------------------
    2008       |
    PUV(D)     |   4888   .9482172   .9302261   .1977599   .4697024   1.527056
    PUV(DI)    |   4888   1.196471   1.173393   .2506707   .5757318   1.938348
    lwbp_aggr3 |   4888   723.3252   692.9372   246.5752   187.4321   1780.587
    --------------------------------------------------------------------------
    Note: lwbp_aggr3 is Pseudo Unit Values in Levels PUV(DIL)

Estimation of unit values for the transport and communications expenditure group: Transport and Communications (sub-group expenditures are: Private and Public Transportation, Telephone and Purchase of Transportation Means) and associated graph.

    . pseudounit ag6sp_4 ag6sp_4_? if year==2008, ///
    >     impvars(location i.ripgeo nc etacf i.titstucf lnx) pindex(idx_aggr4) ///
    >     expby(mu_ag6sp_4) gen(lwbp_aggr4) seed(889922) coll_rule(median) ///
    >     sav(kd_sp4, replace)

    **** EXPENDITURE ag6sp_4_1 ****
    907 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 4888
    Complete observations: 3981
    Missing observations: 907
    Imputed observations: 907
    0 values for expenditure ag6sp_4_1 converted to 1

    **** EXPENDITURE ag6sp_4_2 ****
    379 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Method: truncreg regression
    Limit: lower = 0 upper = 1
    Total Observations: 4888
    Complete observations: 4509
    Missing observations: 379
    Imputed observations: 379
    0 values for expenditure ag6sp_4_2 converted to 1

    **** EXPENDITURE ag6sp_4_3 ****
    60 observations to impute
    MULTIPLE IMPUTATION OVERVIEW
    Pay attention: imputation method switched to pmm
    Method: pmm regression
    Total Observations: 4888
    Complete observations: 4828
    Missing observations: 60
    Imputed observations: 60
    Number of k nearest neighbors: 241
    0 values for expenditure ag6sp_4_3 converted to 1

    DESCRIPTIVES STATISTICS
    Variable   |   Obs.       Mean     Median  Std. Dev.        Min        Max
    --------------------------------------------------------------------------
    PUV(D)     |   4888   .8136841   .8143076   .1388408   .4111537   1.281451
    PUV(DI)    |   4888   .9933574   .9960147    .169638   .5019528   1.561017
    lwbp_aggr4 |   4888   360.8287   319.7612   223.4753   10.74425   2049.023
    --------------------------------------------------------------------------
    Note: lwbp_aggr4 is Pseudo Unit Values in Levels PUV(DIL)
    (file kd_sp4.gph saved)

Note that the multiple imputation procedure using the truncreg method failed for the sub-group expenditure ag6sp_4_3, and the program switched to the predictive mean matching method (pmm).
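The graph saved by the sav() option and the generated variable can be inspected afterwards with standard Stata graphics commands. The lines below are a minimal sketch, assuming that the transport and communications example above has just been run in the current working directory; the title and axis label are illustrative choices, not output of pseudounit.

```stata
* Sketch: inspect the pseudo unit values generated in the example above.

* Redisplay the graph stored on disk by sav(kd_sp4, replace)
graph use kd_sp4

* Alternatively, draw a kernel density directly from the generated variable
kdensity lwbp_aggr4 if year == 2008, ///
    title("Pseudo unit values: transport and communications") ///
    xtitle("lwbp_aggr4")
```

Drawing the density directly makes it easy to change the kernel or bandwidth through the usual kdensity options when the default smoothing hides features such as the long right tail suggested by the descriptive statistics.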
Figure 1: lwbp_aggr4 Kernel Density Estimation

5 Conclusions

The main objective of the pseudounit Stata module presented here is to make household budget surveys that collect only information about expenditures also suitable for demand and welfare analysis. Thanks to the pseudounit command, the lack of information about quantities no longer precludes the possibility of deriving household-specific prices (unit values) and of estimating complete demand systems suitable for welfare analysis.

6 Acknowledgments

The authors wish to thank an anonymous referee and Lucia Echeverría for helpful comments and suggestions. Any errors and omissions are the sole responsibility of the authors.

7 References

Atella, V., M. Menon, and F. Perali (2003): “Estimation of Unit Values in Cross Sections without Quantity Information,” in Household Behaviour, Equivalence Scales, Welfare and Poverty, edited by C. Dagum and G. Ferrari. New York: Physica-Verlag.

Barten, A. P. (1964): “Family Composition, Prices and Expenditure Patterns,” in Econometric Analysis for National Economic Planning: 16th Symposium of the Colston Society, eds. P. Hart, G. Mills, and J. K. Whitaker, London: Butterworth.

Berges, M., Pace Guerrero, I., and L. Echeverría (2012): “La Utilización de Precios Implícitos o de Pseudo Precios Implícitos en la Estimación de un Sistema de Demandas QUAIDS para Alimentos,” Nülan. Deposited Documents 1675, Centro de Documentación, Facultad de Ciencias Económicas y Sociales, Universidad Nacional de Mar del Plata.

Coondoo, D., A. Majumder, and R. Ray (2004): “On a Method of Calculating Regional Price Differentials with Illustrative Evidence from India,” Review of Income and Wealth, 50(1): 51–68.

Dagsvik, J. K., and L. Brubakk (1998): “Price Indexes for Elementary Aggregates Derived from Behavioral Assumptions,” Discussion Papers No. 234, Statistics Norway, Research Department.

Deaton, A. (1987): “Estimation of Own- and Cross-price Elasticities from Household Survey Data,” Journal of Econometrics, 36(1-2): 7–30.

Deaton, A. (1988a): “Household Survey Data and Pricing Policies in Developing Countries,” World Bank Economic Review, 3(2): 183–210.

Deaton, A. (1988b): “Quality, Quantity, and Spatial Variation of Prices,” American Economic Review, 78(3): 418–30.

Deaton, A. (1990): “Prices Elasticities from Survey Data - Extensions and Indonesian Results,” Journal of Econometrics, 44(3): 218–309.

Deaton, A. (1998): “Getting Prices Right: What Should Be Done?,” Journal of Economic Perspectives, 12(1): 37–46.

Deaton, A., and J. Muellbauer (1980): “Economics and Consumer Behavior,” Cambridge University Press.

Frisch, R. (1959): “A Complete Scheme for Computing All Direct and Cross Demand Elasticities in a Model with Many Sectors,” Econometrica, 27(2): 177–196.

Hoderlein, S. and S. Mihaleva (2008): “Increasing the Price Variation in a Repeated Cross Section,” Journal of Econometrics, 147(2): 316–325.

Lahatte, A., R. Miquel, F. Laisney, and I. Preston (1998): “Demand Systems with Unit Values: A Comparison of Two Specifications,” Economics Letters, 58(8): 281–290.

Lecocq, S. and J.M. Robin (2015): “Estimating Almost-Ideal Demand Systems with Endogenous Regressors,” Stata Journal, 15(2): 554–573.

Lewbel, A. (1989): “Identification and Estimation of Equivalence Scales under Weak Separability,” Review of Economic Studies, 56(2): 311–316.

Menon, M. and F. Perali (2010): “Econometric Identification of the Cost of Maintaining a Child,” Research on Economic Inequality, 18: 219–256.

Perali, F. (1999): “Stima delle Scale di Equivalenza utilizzando i Bilanci Familiari ISTAT 1985-1994,” Rivista Internazionale di Studi Sociali, 67–129.

Perali, F. (2000): “Analisi di Tassazione Ottimale Applicata al Consumo di Bevande,” in F. Perali (ed.), Microeconomia Applicata. Volume I, Carocci Editore, Roma.

Perali, F. (2003): The Behavioral and Welfare Analysis of Consumption, Kluwer Academic Publishers, Amsterdam.

Slesnick, D. (1998): “Empirical Approaches to the Measurement of Welfare,” Journal of Economic Literature, 36(4): 2108–2165.

About the authors

Martina Menon is an assistant professor of economics at the University of Verona in Italy. Her main research interests are economics of the family, consumption analysis, welfare analysis, and applied econometrics.

Federico Perali is a full professor of economics at the University of Verona in Italy. His main research interests are economics of the family, consumption analysis, welfare analysis, and applied econometrics.

Nicola Tommasi is a research assistant at the University of Verona in Italy. His main research interests are applied econometrics and statistical programming.