Assessing Alternative Poverty Proxy Methods in Rural Vietnam

2011, Oxford Development Studies

This paper compares and contrasts the use of four 'short-cut' methods for identifying poor households: (i) the poverty probability method; (ii) OLS regressions; (iii) principal components analysis; and (iv) quantile regressions. After evaluating these four methods using two alternative criteria (total and balanced poverty accuracy) and representative household survey data from rural Vietnam, we conclude that the poverty probability method, which can correctly identify around four-fifths of poor and non-poor households, is the most accurate 'short-cut' method for measuring poverty for specific sub-populations, or in years when household surveys are not available. We then test the performance of the poverty probability method with different poverty lines and using an alternative household survey, and find it to be robust.

URL: http:/mc.manuscriptcentral.com/cods Email: [email protected]

20 December 2010

I. Introduction

In most developing countries, it is only feasible to conduct detailed household surveys every few years, using relatively small samples of households. The results of these surveys can usually only be disaggregated to the regional or provincial level, and cannot be disaggregated for many population groups that are of interest to policy makers (for example, specific occupations or ethnic groupings).
However, government and donor agencies often require that poverty be monitored on an annual basis for specific administrative or project areas, or require projects to demonstrate their impact on specific groups or occupations. Poverty measurement using household surveys is also difficult, expensive and time consuming, requiring that detailed information be collected on all the different components of household expenditures and/or incomes. Short-cut methods for measuring monetary poverty in specific areas or sub-populations have therefore been devised for around 30 developing countries, most notably by the Grameen Foundation and the USAID Poverty Assessment Tools project.[1] Typically these methods use 10 to 20 easily verifiable indicators to obtain an index or score that is highly correlated with household poverty status. Using these short-cut methods, non-specialists can collect data for each household in the field in ten to fifteen minutes which, when combined with the coefficients from models estimated with nationally representative household survey data, can provide a reasonably accurate prediction of a household’s poverty status. However, there have been few attempts to systematically compare such methods (especially using out-of-sample predictions).

This paper compares and contrasts the use of four ‘short-cut’ methods for measuring monetary poverty in rural Vietnam. These four methods, which we shall hereafter describe collectively as poverty proxy methods, are: (i) the poverty probability method; (ii) OLS regressions; (iii) principal components analysis; and (iv) quantile regression. Each of these poverty proxy methods has been used in the past in Vietnam using different datasets and poverty lines (see Section II), but to date there has been no study which compares the accuracy of these different methods using the same data set, and few which have compared their out-of-sample predictive power using different data sets.
Accordingly, this study uses the 2006 Vietnamese Household Living Standards Survey (VHLSS 2006) to test these four methods for rural households using a common international poverty line ($1.25/day in 2005 PPP terms). After evaluating these four methods using two alternative criteria (total and balanced poverty accuracy, which are explained below), we also test the models’ performance with different poverty lines and their out-of-sample performance using an alternative household survey (the VHLSS of 2004). We conclude that the poverty probability method is the most accurate ‘short-cut’ method for measuring poverty for specific sub-populations of interest, or in years when representative household surveys are not available.

II. Literature Review

This section provides a brief overview of six previous applications of poverty proxy methods in Vietnam in approximate chronological order.[2] While two of these studies were developed independently by Vietnam-based researchers, the remaining four are part of larger cross-country efforts to develop ‘short-cut’ poverty assessments for various development organisations.

[1] See www.microfinance.org/#Poverty_Scoring and www.povertytools.org
[2] This section draws on Chen and Schreiner (2009).

2.1. Baulch (2002)

In the earliest known application of poverty proxy methods in Vietnam, Baulch (2002) constructed two composite poverty indices using the national poverty line of 4,904 Dong/person/day and the Vietnam Living Standards Survey (VLSS) 1997-98.
Baulch used a combination of the Receiver Operating Characteristic (ROC) curve technique to assess, and stepwise probits to build, his poverty indices, which contain six indicators for urban areas and twelve indicators for rural areas. He assessed the poverty accuracy of this method but did not validate his results using a different dataset.

2.2. Sahn and Stifel (2003)

As part of a larger cross-country study involving LSMS-type data from ten developing countries, Sahn and Stifel (2003) used factor analysis and the 1992/3 and 1997/8 VLSS to construct an “asset index” for Vietnam. The indicators used include ownership of consumer durables, residence quality and education of the household head. Sahn and Stifel (2003) did not test their asset index on other datasets. Moreover, their study did not indicate its poverty accuracy, i.e. its accuracy in correctly identifying and targeting the poor.

2.3. Gwatkin et al. (2007)

Gwatkin et al. (2007) used principal components analysis to create a “wealth index” for the 7,048 households in the 2002 Vietnam Demographic and Health Survey. This was part of a wider World Bank sponsored project to produce wealth indices for 56 developing and transition economies. In all these studies, poverty is defined in relative, rather than absolute, terms. Gwatkin et al. construct the wealth index for Vietnam using 18 indicators. Principal components analysis (PCA) is used to generate a weight for each household item with available information. The wealth index score is then calculated for each household by weighting the response for each item pertaining to that household by the coefficient of the first principal component and summing the results. Their wealth index is standardized in relation to a standard normal distribution with a mean of zero and a standard deviation of one.

While the wealth index is powerful and relatively easy to calculate, it is difficult to use it to estimate poverty rates at the household or individual level.
Furthermore, its accuracy was not tested in Gwatkin et al. (2007), nor did the authors validate their wealth index using a different dataset.

2.4. IRIS Center (2007)

USAID commissioned the IRIS Center at the University of Maryland (IRIS 2007) to build a poverty scorecard for Vietnam, along with 28 other developing countries, as part of its Poverty Assessment Tools project (www.povertytools.org). IRIS (2007) considers only USAID’s “extreme” poverty line (equivalent to VND 3,818/person/day in January 1999 prices) and used VLSS 1997/8 data for its analysis. IRIS use 17 indicators, including household size, household head’s age and ownership of a motorcycle. From these variables, IRIS calculated poverty scores using four different methods (OLS, quantile regression, linear probability and probit) and used the “Balanced Poverty Accuracy Criterion” (BPAC), which USAID have since adopted and which is explained below, to evaluate these methods. After comparing these four models, IRIS recommend the use of quantile regressions for determining the poverty status of households in Vietnam. Using the USAID “extreme” line and the 1997/8 VLSS, the IRIS method produces a BPAC of 61.7. The IRIS Center also did not validate their results using a different dataset.

2.5. Linh Nguyen (2007)

In a paper for the Asian Development Bank, Linh Nguyen (2007) uses multiple regression techniques to assess poverty using the VHLSS 2002 data. This technique detects variables or predictors that are correlated with a household’s consumption expenditure and, consequently, its poverty status.
She used bivariate and multivariate analysis to narrow down the number of variables from an initial list of 60 variables to 22 indicators in rural and 15 indicators in urban areas. Linh Nguyen (2007) validated her results using the VLSS 1998 data and a subset of the VHLSS 2002 (for Thanh Hoa and Nghe An provinces).

2.6. Chen and Schreiner (2009)

Schreiner and colleagues have developed poverty scorecards for the Grameen Foundation in 28 developing countries (www.microfinance.com/#Poverty_Scoring). Chen and Schreiner (2009) develop a simple poverty “scorecard” for Vietnam with 10 indicators selected from an initial list of 150 indicators drawn from the VHLSS 2006. Each indicator is first screened with an entropy-based “uncertainty coefficient” that measures how well each indicator predicts poverty on its own. Their final indicator selection uses both judgement and statistics (a forward stepwise logit). The final scorecard is built using a PPP $1.75/day poverty line and a logit regression.[3] One advantage of the Chen and Schreiner (2009) method is its validation of the scorecard using the VHLSS 2004. However, its performance is not compared with that of other methods.

Appendix A1 summarises and compares the different indicators that were used to predict poverty in each of these studies, and compares them with those proposed in this paper.

It should be noted that four of these six poverty proxy methods have an explicit focus on monetary poverty (identified according to whether a household’s per capita expenditure is above or below a pre-determined absolute poverty line) while the other two methods concern asset poverty. None of the methods consider the wider non-monetary dimensions of poverty that are considered in, for example, the UNDP’s Multidimensional Poverty Index (Alkire and Santos, 2010). While focusing on monetary poverty is obviously restrictive, it does reflect the principal way in which poverty is measured in Vietnam (and many other countries).
III. Data and Methods

We used data from the VHLSS 2006, the most recent available national income and expenditure survey in Vietnam. The data cover over 45,000 households in rural and urban areas and include information on household income, assets, expenditure[4] and other socio-economic dimensions. Using the VHLSS 2006 data, we compare the results of four poverty proxy approaches. In addition, we used the VHLSS 2004 and the Thanh Hoa Resurvey data for validation of estimates of poverty rates.

[3] Chen and Schreiner justify the use of a PPP $1.75/day poverty line by saying that it is close to the national poverty line.
[4] The expenditure data are collected from a subsample of just over 9,000 households.

There are two “official” poverty lines in Vietnam. The General Statistics Office (GSO) defines a food poverty line based on the expenditure required to obtain 2,100 calories per person per day. The national poverty line is then defined as the food poverty line plus non-food expenditure by a reference group with food expenditure close to the food poverty line. The GSO’s poverty line is equivalent to VND 7,011/person/day at January 2006 prices. The GSO’s poverty line is, however, based on a food basket which was first estimated in 1993, and has only been updated by inflating its food and non-food components by the relevant price indices. An alternative set of poverty lines is set by the Ministry of Labour, Invalids, and Social Affairs (MOLISA) for 2006–2010 at VND 6,575/person/day for rural areas and VND 8,548/person/day for urban areas (Chen and Schreiner 2009).
The MOLISA poverty lines are administratively determined and updated periodically to reflect changes in both the cost of living and living standards. In contrast to those of the General Statistics Office, MOLISA’s poverty lines are based on per capita incomes. At the present time, there is an ongoing debate about the updating of the MOLISA poverty lines for the 2011 to 2015 period.

Because of the dated nature of both the GSO and MOLISA poverty lines, the poverty lines used in our analysis are the international poverty lines of PPP $1.25 and $2.00 per person per day. These lines were calculated by the World Bank using household survey data from 116 countries together with the results of the 2005 International Comparisons Project (Ravallion et al., 2008). In Vietnamese Dong, the $1.25/day line is equivalent to VND 242,250/person/month while the $2/day line is VND 387,600/person/month, in January 2006 prices. These are the poverty lines which most international and bilateral donors use for monitoring the MDGs. Those with incomes (or expenditures) of less than PPP $1.25/day are usually regarded as extremely poor, while those living on between PPP $1.25 and $2/day are regarded as moderately poor.

We use two criteria to assess accuracy in predicting poverty. The first criterion is total accuracy, i.e. the weighted average of poverty accuracy and non-poverty accuracy. It is calculated by the following formula:

Total accuracy = Headcount index × Poverty accuracy + (1 − Headcount index) × Non-poor accuracy    (1)

Thus total accuracy, which will always vary between 0 and 100, shows the percentage of people correctly identified as poor and non-poor. The second criterion is the BPAC index, adopted by USAID in its poverty assessment.
The BPAC index is calculated by the following formula:

BPAC = (Inclusion − |Under-coverage − Leakage|) × [100 ÷ (Inclusion + Under-coverage)]    (2)

in which Under-coverage = the “true” poor incorrectly predicted as non-poor, expressed as a percentage of total “true” poor; Leakage = the “true” non-poor incorrectly predicted as poor, expressed as a percentage of total “true” poor; and Inclusion = the “true” poor correctly predicted as poor, expressed as a percentage of total “true” poor. In other words, BPAC is the poverty accuracy minus the absolute difference between under-coverage and leakage, each expressed as a percentage of the total “true” poor. Note that, unlike Total Accuracy, BPAC can take negative values when the absolute difference between under-coverage and leakage exceeds poverty accuracy.

In line with Prosperity Initiative’s goal of reducing poverty at scale (that is, having systemic impacts on poverty reduction that extend beyond the communities in which the organisation is working), our preferred criterion is the BPAC. As Total Accuracy combines accurate identification of both poor and non-poor, this measure is only useful if one is interested in an aggregate assessment of poverty status without wanting to target the poor specifically. Indeed, in some cases, a proxy method with high Total Accuracy can give a highly inaccurate identification of poor people. For example, in Table 5, at the cut-off point of 0.5, Total Accuracy is at its highest (82.74) but only 38.1 percent of the poor are correctly identified. For this reason, we focus on the BPAC in assessing different poverty proxy models.

We also employ Receiver Operating Characteristic (ROC) curves to show the accuracy of different poverty proxy methods.
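The two accuracy criteria in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the household counts are hypothetical and are not taken from the VHLSS data. The second example is chosen to show how BPAC turns negative when leakage greatly exceeds under-coverage.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Return (total accuracy, BPAC), both in percent, from a 2x2 classification.

    tp: poor predicted as poor          fn: poor predicted as non-poor
    fp: non-poor predicted as poor      tn: non-poor predicted as non-poor
    """
    poor = tp + fn
    nonpoor = fp + tn
    headcount = poor / (poor + nonpoor)
    poverty_accuracy = 100.0 * tp / poor
    nonpoor_accuracy = 100.0 * tn / nonpoor

    # Equation (1): headcount-weighted average of the two accuracies.
    total = headcount * poverty_accuracy + (1 - headcount) * nonpoor_accuracy

    # Equation (2): every component is a percentage of the "true" poor.
    inclusion = 100.0 * tp / poor
    under_coverage = 100.0 * fn / poor
    leakage = 100.0 * fp / poor
    bpac = (inclusion - abs(under_coverage - leakage)) * (
        100.0 / (inclusion + under_coverage)
    )
    return total, bpac


# Hypothetical sample of 1,000 households, 200 of them poor.
balanced = accuracy_measures(tp=120, fn=80, fp=100, tn=700)
# Hypothetical low cut-off: almost all poor are covered, but leakage is large.
leaky = accuracy_measures(tp=195, fn=5, fp=500, tn=300)

print(balanced)
print(leaky)
```

In the first case total accuracy is 82 and BPAC is 50; in the second, total accuracy falls to 49.5 and BPAC to -150, even though 97.5 percent of the poor are correctly identified.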
ROC curves are diagrams which portray the ability of different non-parametric diagnostic tests to distinguish between the two states of a binary outcome, and were originally developed for use in electrical engineering and signal processing (Baulch, 2002; Wodon, 1997). In poverty analysis, ROC curves plot the probability of a test correctly identifying a poor person as poor (which is called the test’s “sensitivity”) on the vertical axis against one minus the probability of the same test correctly classifying a non-poor person as non-poor (the latter probability is called the test’s “specificity”) on the horizontal axis. Typically, ROC curves are concave and embody a trade-off between coverage of the poor and inclusion of the non-poor (see Figures 1 to 3 below). As long as an indicator or index increases in value as the likelihood of poverty increases, then the area under an ROC curve, which will always vary between zero and one, can be used for ranking the relative efficacy of indicators as poverty proxies. An ROC curve with an area of less than 0.5 will lie mostly below the leading diagonal.

IV. Constructing poverty proxies for rural Vietnam

1. Poverty indicators

In order to assess poverty, we use three alternative poverty proxy methods: the poverty probability method (probit), OLS regression, and principal components analysis (PCA). As shown in Section II, these are the three most commonly used methods in poverty proxy studies in Vietnam (as well as in other developing countries). After comparing the accuracy of these methods in identifying the poor and non-poor in rural Vietnam, we then select our preferred model.

As a first step, we collect 48 potential poverty indicators at the household level[5] in the following categories:

- Household characteristics (such as household size, share of female members, share of children).
- Education indicators (such as household head’s education level, spouse’s education level).
- Housing indicators (such as type of the main residence, type of toilet).
- Asset indicators (ownership of durable goods such as motorcycle, bicycle, radio).
- Agriculture and land variables (such as whether the household grows crops, annual crop areas, total area, irrigated area).

The list of candidate indicators is presented in Table 1, categorized by poverty status (based on the absolute international poverty line of PPP $1.25).

[5] We do not use commune or village level information as our aim is to construct a quick-and-easy method for predicting a household’s poverty status.

Table 1: Mean values of Candidate Poverty Indicators

| Indicator | Type | Poor | Non-poor |
|---|---|---|---|
| Housing | | | |
| Living area | Continuous | 50.19 | 62.41 |
| Own house | Binary | 0.97 | 0.98 |
| Villa or house with private bathroom/kitchen | Binary | 0 | 0.04 |
| House with shared bathroom or kitchen | Binary | 0.06 | 0.14 |
| Garden | Binary | 0.2 | 0.26 |
| Semi-permanent house | Binary | 0.62 | 0.64 |
| Drinking water from private tap | Binary | 0.03 | 0.08 |
| Flush toilet | Binary | 0.06 | 0.27 |
| Double-vault toilet | Binary | 0.3 | 0.39 |
| Electricity | Binary | 0.87 | 0.95 |
| Daily water from private tap | Binary | 0.04 | 0.08 |
| Daily water from well | Binary | 0.63 | 0.72 |
| Have land for agricultural purpose | Binary | 0.92 | 0.85 |
| Irrigated area | Continuous | 0.27 | 0.46 |
| Annual crop area | Continuous | 0.51 | 0.47 |
| Household size | Continuous | 4.77 | 4.22 |
| Total land area | Continuous | 0.84 | 0.89 |
| Head's age | Continuous | 48.43 | 49.32 |
| Share of under 15-year old members | Continuous | 0.30 | 0.21 |
| Share of female members | Continuous | 0.54 | 0.51 |
| Share of members aged 15-59 years | Continuous | 0.53 | 0.66 |
| Head is illiterate | Binary | 0.02 | 0.02 |
| Head finishing primary school | Binary | 0.26 | 0.27 |
| Head finishing secondary school | Binary | 0.19 | 0.3 |
| Head finishing high school and above | Binary | 0.04 | 0.12 |
| Spouse finishing primary school | Binary | 0.20 | 0.24 |
| Spouse finishing secondary school | Binary | 0.15 | 0.23 |
| Spouse finishing high school and above | Binary | 0.02 | 0.08 |
| Minority | Binary | 0.39 | 0.13 |
| Crop cultivation | Binary | 0.89 | 0.8 |
| Number of wage earners | Integer | 0.78 | 0.99 |
| Number of household members with farm jobs | Integer | 2.39 | 1.9 |
| Number of household members with non-farm self-employment | Integer | 0.25 | 0.55 |
| Ownership of assets and durable goods | | | |
| Computer | Binary | 0 | 0.03 |
| Radio | Binary | 0.09 | 0.12 |
| Television | Binary | 0.6 | 0.86 |
| Video cassette | Binary | 0.19 | 0.44 |
| Stereo | Binary | 0.04 | 0.14 |
| Refrigerator/freezer | Binary | 0.01 | 0.13 |
| Laundry machine | Binary | 0 | 0.03 |
| Electric fan | Binary | 0.61 | 0.82 |
| Gas cooker | Binary | 0.04 | 0.3 |
| Rice cooker | Binary | 0.24 | 0.59 |
| Wardrobe | Binary | 0.51 | 0.82 |
| Bicycle | Binary | 0.56 | 0.67 |
| Motorbike | Binary | 0.25 | 0.52 |
| Fixed telephone | Binary | 0.02 | 0.21 |
| Mobile telephone | Binary | 0.01 | 0.1 |
| Pump | Binary | 0.12 | 0.29 |
| Cattle | Binary | 0.54 | 0.29 |
| Breeding facilities | Binary | 0.43 | 0.51 |

2. Method 1: Poverty probability method

This method uses a probit model to identify the probability of a household being poor. First, a stepwise probit is run to remove six variables out of the 48 candidate variables that do not predict poverty well. The remaining 42 variables are then ranked according to their accuracy in identifying the poor alone using the area under the Receiver Operating Characteristic (ROC) curve. The greater the area under a ROC curve, the better the indicator is at identifying poverty.

Using this list of 42 variables ranked by ROC area, we estimate two models: one more expansive and the other more parsimonious. See Appendices A2 and A3 for the poverty proxy checklists that would be used to apply the two models.

Model 1

From the list of 42 variables, we selected 34 variables based on both our judgment[6] and the ROC area. We then re-ran the probit model, taking account of the clustering and stratification in the VHLSS survey design, to calculate coefficient standard errors. This allowed six variables that have low coefficients in the probit model to be removed. Our final list includes 25 indicators (excluding regional dummies). These comprise 11 household (HH) characteristics indicators, five housing characteristics indicators and nine types of assets.

[6] For practical purposes, we drop those indicators (such as irrigated land area and crop land area) on which it would be difficult to collect information in a short interview, or which are susceptible to measurement errors.

Table 2 presents the accuracy of these indicators in identifying the poor in rural Vietnam in terms of the area under the ROC curve for each variable. Recall that the higher the area under an ROC curve, the better the underlying variable is at distinguishing between the poor and non-poor. Recall also that the maximum value which the area under an ROC curve can take is 1, and that curves with areas of less than 0.5 will generally lie below the leading diagonal. Indicators with areas under the ROC curve that are significantly greater than 0.5 can be viewed as useful poverty proxies, while areas substantially less than 0.5 may be regarded as indicators of non-poverty.
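As a rough illustration of how the area under the ROC curve for a single indicator can be obtained, the sketch below uses the standard interpretation of the area as the probability that a randomly chosen poor household has a higher indicator value than a randomly chosen non-poor household, with ties counted as one half. The function names and the two 100-household samples are hypothetical; the ownership shares (0.25 among the poor and 0.52 among the non-poor) are the motorbike means reported in Table 1. Survey weights are ignored, which is an assumption of this sketch rather than a description of how Table 2 was produced.

```python
from itertools import product


def roc_area(poor_values, nonpoor_values):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    wins = 0.0
    for p, n in product(poor_values, nonpoor_values):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(poor_values) * len(nonpoor_values))


def roc_area_binary(share_poor, share_nonpoor):
    """Shortcut for a 0/1 indicator: 0.5 plus half the difference in shares."""
    return 0.5 + 0.5 * (share_poor - share_nonpoor)


# Hypothetical samples matching the Table 1 motorbike ownership shares.
poor = [1] * 25 + [0] * 75
nonpoor = [1] * 52 + [0] * 48

pairwise_area = roc_area(poor, nonpoor)
shortcut_area = roc_area_binary(0.25, 0.52)
print(round(pairwise_area, 3), round(shortcut_area, 3))
```

Both calculations give an area of about 0.365, close to the 0.366 reported for Motorbike in Table 2. Because the area is below 0.5, motorbike ownership acts as an indicator of non-poverty rather than of poverty.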

Table 2: Accuracy of different indicators in identifying the poor in Vietnam

| Indicators | Type | Area under ROC curve |
|---|---|---|
| Household size | HH characteristics | 0.605 |
| Share of children | HH characteristics | 0.642 |
| Share of working | HH characteristics | 0.363 |
| Share of female | HH characteristics | 0.536 |
| Head finishing primary school | HH characteristics | 0.499 |
| Head finishing secondary school | HH characteristics | 0.457 |
| Head finishing high school and above | HH characteristics | 0.459 |
| Minority | HH characteristics | 0.635 |
| Wage job | HH characteristics | 0.453 |
| Non-farm self-employment | HH characteristics | 0.401 |
| Semi-permanent house | Housing | 0.496 |
| House with private bathroom/kitchen | Housing | 0.480 |
| Electricity | Housing | 0.463 |
| Flush toilet | Housing | 0.391 |
| Double-vault toilet | Housing | 0.461 |
| House with shared bathroom or kitchen | Housing | 0.458 |
| Radio | Assets | 0.484 |
| Mobile telephone | Assets | 0.447 |
| Refrigerator/freezer | Assets | 0.434 |
| Pump | Assets | 0.416 |
| Fixed telephone | Assets | 0.401 |
| Electric fan | Assets | 0.398 |
| Television | Assets | 0.380 |
| Video cassette | Assets | 0.372 |
| Motorbike | Assets | 0.366 |

Note on Indicators:
Share of children: proportion of household members less than 15 years of age.
Minority: 0= all ethnic groups except Kinh and Hoa; 1= Kinh or Hoa
Housing indicators: binary variables indicating if the household has these durables/facilities.

The results of the probit model are presented in Table 3. Larger household size, a higher share of women or children, and a lower share of working members are all associated with a higher probability of poverty. In contrast, households with non-farm wages or non-farm self-employment have a lower probability of being poor. As expected, households whose heads belong to one of the ethnic minorities have a higher probability of being poor, while the head’s education level has the opposite effect. Finally, better house type, better toilet type and the ownership of consumer durables and fixed assets are associated with lower probabilities of being poor.

Table 3: Probit model for the composite poverty indicator (model 1)

| Variables | Coef. | Std. Err. | t-statistic |
|---|---|---|---|
| Household size | 0.17 | 0.01 | 21.30 |
| Share of children | 0.74 | 0.06 | 12.85 |
| Share of women | 0.23 | 0.05 | 4.19 |
| Share of working people | -0.24 | 0.05 | -4.92 |
| Non-farm self-employment | -0.25 | 0.02 | -12.64 |
| Wage jobs | -0.18 | 0.01 | -14.43 |
| Minority | 0.31 | 0.04 | 7.68 |
| Head finishing primary school | -0.18 | 0.03 | -6.55 |
| Head finishing secondary school | -0.27 | 0.03 | -8.96 |
| Head finishing high school and above | -0.43 | 0.05 | -9.46 |
| House with private bathroom/kitchen | -0.57 | 0.05 | -12.11 |
| House with shared bathroom or kitchen | -0.68 | 0.11 | -6.14 |
| Semi-permanent house | -0.33 | 0.03 | -10.59 |
| Electricity | 0.29 | 0.06 | 4.85 |
| Radio | -0.14 | 0.04 | -3.94 |
| Flush toilet | -0.26 | 0.04 | -6.60 |
| Double-vault toilet | -0.10 | 0.03 | -3.61 |
| Mobile telephone | -0.56 | 0.08 | -6.68 |
| Refrigerator/freezer | -0.37 | 0.06 | -5.92 |
| Pump | -0.15 | 0.03 | -4.87 |
| Fixed phone | -0.35 | 0.05 | -7.45 |
| Electric fan | -0.20 | 0.03 | -6.65 |
| Television | -0.35 | 0.03 | -13.51 |
| Video cassette | -0.23 | 0.03 | -8.73 |
| Motorbike | -0.40 | 0.03 | -15.99 |
| North East | -0.24 | 0.04 | -5.43 |
| Central Highlands | -0.32 | 0.07 | -4.81 |
| South East | -0.58 | 0.06 | -9.08 |
| Mekong River Delta | -0.75 | 0.04 | -16.93 |
| Constant | -0.27 | 0.08 | -3.34 |
| Number of obs | 33745 | | |
| F(29, 2201) | 121.74 | | |
| Prob > F | 0 | | |

Note: Some regions are removed from the model because of the stepwise probit process.

Figure 1 shows the ROC curve for the composite poverty indicator. As the cut-off used to distinguish the poor from the non-poor is lowered, the proportion of the poor correctly identified as poor increases, along with the proportion of the non-poor who are incorrectly identified as poor. Thus the concavity of the ROC curve displays the usual trade-off between coverage of the poor and exclusion of the non-poor in rural areas. The area under the ROC curve is 0.8403. In general, the more accurate a method is in identifying the poor, the less accurate it will be in identifying the non-poor (and vice versa).

Figure 1: ROC curve for model 1. [Vertical axis: Coverage of Poor (Sensitivity); horizontal axis: Inclusion of Non-Poor (1 - Specificity); area under ROC curve = 0.8403]

Model 2

In Model 2, we chose a more parsimonious list of 11 household-level indicators based on several criteria, including their ease of collection, their ROC area, and their coefficients and statistical significance in explaining absolute income poverty. The final list includes four household characteristics (share of children, minority, household size, head finishing high school), three accommodation characteristics (house with private bathroom/kitchen, house with shared bathroom or kitchen, flush toilet) and four durable ownership variables (mobile phone, electric fan, television and motorbike).

Table 4: Probit model for the composite poverty indicator (model 2)

| Variables | Coef. | Std. Err. | t-statistic |
|---|---|---|---|
| Share of children | 1.05 | 0.05 | 21.30 |
| Minority | 0.44 | 0.04 | 11.06 |
| Household size | 0.10 | 0.01 | 14.77 |
| Head finishing high school and above | -0.32 | 0.04 | -7.94 |
| House with private bathroom/kitchen | -0.49 | 0.10 | -4.85 |
| House with shared bathroom or kitchen | -0.36 | 0.04 | -9.82 |
| Flush toilet | -0.40 | 0.04 | -11.19 |
| Mobile phone | -0.83 | 0.08 | -10.32 |
| Electric fan | -0.25 | 0.03 | -8.85 |
| Television | -0.50 | 0.03 | -19.15 |
| Motorbike | -0.50 | 0.02 | -20.54 |
| North East | -0.20 | 0.04 | -4.48 |
| Central Highlands | -0.24 | 0.06 | -3.74 |
| South East | -0.52 | 0.06 | -8.83 |
| Mekong River Delta | -0.62 | 0.04 | -16.35 |
| Constant | -0.51 | 0.04 | -12.04 |
| Number of obs | 33745 | | |
| F(15, 2215) | 190.26 | | |
| Prob > F | 0 | | |

Figure 2 shows the ROC curve for Model 2. The ROC area is 0.8116, less than the ROC area for Model 1 (0.8403). Thus, Model 1 performs better than Model 2 in terms of ROC area.
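To illustrate how the coefficients in Table 4 might be applied in the field, the following sketch scores a single household with the Model 2 probit: the predicted poverty probability is the standard normal CDF of the constant plus the coefficient-weighted sum of the household's indicators. The household shown is hypothetical, the dictionary keys are names invented for this example, and the sketch assumes that indicators are coded as in Table 1 (household size as a count, share of children as a proportion, the rest as 0/1 dummies). The 0.35 cut-off, and the rule that a household at or above it is classed as poor, are likewise assumptions of the sketch. Because the published coefficients are rounded to two decimals, the result is only approximate.

```python
import math

# Coefficients transcribed from Table 4 (Model 2), rounded to two decimals.
MODEL2 = {
    "share_children": 1.05,
    "minority": 0.44,
    "household_size": 0.10,
    "head_high_school": -0.32,
    "house_private_bath_kitchen": -0.49,
    "house_shared_bath_kitchen": -0.36,
    "flush_toilet": -0.40,
    "mobile_phone": -0.83,
    "electric_fan": -0.25,
    "television": -0.50,
    "motorbike": -0.50,
    "north_east": -0.20,
    "central_highlands": -0.24,
    "south_east": -0.52,
    "mekong_river_delta": -0.62,
}
CONSTANT = -0.51
CUT_OFF = 0.35  # assumed classification threshold for this example


def probit_index(household):
    """Constant plus the coefficient-weighted sum of the listed indicators.

    Indicators not listed for a household are treated as zero.
    """
    return CONSTANT + sum(MODEL2[name] * value for name, value in household.items())


def poverty_probability(household):
    """Standard normal CDF of the probit index."""
    z = probit_index(household)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical household: five members, 40% children, ethnic minority,
# owns an electric fan and a television, lives in the North East.
household = {
    "household_size": 5,
    "share_children": 0.4,
    "minority": 1,
    "electric_fan": 1,
    "television": 1,
    "north_east": 1,
}

index = probit_index(household)
probability = poverty_probability(household)
predicted_poor = probability >= CUT_OFF
print(round(index, 2), round(probability, 3), predicted_poor)
```

For this hypothetical household the index is about -0.10 and the predicted probability about 0.46, so the household would be classed as poor at a cut-off of 0.35.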
Figure 2: ROC area for model 2. [Axes: Coverage of Poor (Sensitivity) against Inclusion of Non-Poor (1 - Specificity). Area under ROC curve = 0.8116]

Table 5 shows the trade-off between correct coverage of the poor and exclusion of the non-poor in rural areas at different cut-off points. The cut-off points are the predicted probability scores from the probit models in Table 3 and Table 4. If a very low value for the cut-off (such as 0.05) is chosen, nearly all poor households (97.3%) would be correctly identified as poor in Model 1. However, at this cut-off, only 34.6% of the non-poor would be correctly identified as non-poor in Model 1. In contrast, if a very high value for the cut-off such as 0.95 is chosen, all non-poor households would be correctly identified as non-poor in Model 1, but only 1.11 percent of the poor households would be correctly identified. Thus, the choice of cut-off point depends on the relative importance that policy-makers attach to the two objectives: (a) coverage of the poor and (b) exclusion of the non-poor.

In Table 5, the optimal cut-off points based on total accuracy (that is, the proportion of all households who are correctly identified as poor or non-poor) are 0.40 for Model 1 and 0.45 for Model 2. At the cut-off point of 0.40, 52 percent of the poor and 90 percent of the non-poor are correctly identified in Model 1, and 45 percent of the poor and 91 percent of the non-poor are correctly identified in Model 2.

On the other hand, the optimal cut-off point based on BPAC (which gives more weight to accurate identification of the poor) is 0.35 for both models.
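The accuracy measures used to choose a cut-off can be sketched in a few lines. The sketch assumes the IRIS-style definition of the Balanced Poverty Accuracy Criterion, BPAC = poverty accuracy - |undercoverage - leakage|, with undercoverage and leakage both expressed as percentages of the actual poor; this definition is an assumption of the example, as are the function names and the hypothetical eight-household data.

```python
def accuracy_measures(probs, is_poor, cutoff):
    """Poverty, non-poverty and total accuracy plus BPAC (all in percent)."""
    tp = fp = fn = tn = 0
    for p, poor in zip(probs, is_poor):
        predicted_poor = p >= cutoff
        if poor and predicted_poor:
            tp += 1          # poor, correctly identified
        elif poor:
            fn += 1          # poor, missed (undercoverage)
        elif predicted_poor:
            fp += 1          # non-poor, wrongly included (leakage)
        else:
            tn += 1          # non-poor, correctly excluded
    n_poor, n_nonpoor = tp + fn, tn + fp
    poverty_accuracy = 100.0 * tp / n_poor
    undercoverage = 100.0 * fn / n_poor
    leakage = 100.0 * fp / n_poor  # assumed: relative to the number of poor
    return {
        "poverty_accuracy": poverty_accuracy,
        "nonpoverty_accuracy": 100.0 * tn / n_nonpoor,
        "total_accuracy": 100.0 * (tp + tn) / (n_poor + n_nonpoor),
        "bpac": poverty_accuracy - abs(undercoverage - leakage),
    }


def best_cutoff(probs, is_poor, cutoffs, criterion="bpac"):
    """Cut-off that maximises the chosen criterion (first one in case of ties)."""
    return max(cutoffs, key=lambda c: accuracy_measures(probs, is_poor, c)[criterion])


# Hypothetical data: predicted poverty probabilities and actual status (1 = poor).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_poor = [1, 1, 0, 1, 0, 1, 0, 0]
cutoffs = [i / 20 for i in range(1, 20)]
measures_at_half = accuracy_measures(probs, is_poor, 0.5)
optimal = best_cutoff(probs, is_poor, cutoffs)
```

Passing `criterion="total_accuracy"` to `best_cutoff` selects the cut-off on total accuracy instead, which mirrors the two selection rules compared in the text.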
At the cut-off point of 0.35, which is marked with an asterisk in Table 5, 80.8 percent and 79.7 percent of households are correctly identified in Models 1 and 2, respectively. In addition, 59.2 percent of the poor and 86.8 percent of the non-poor are correctly identified in Model 1. For Model 2, 53.1 percent of the poor and 87.1 percent of the non-poor are correctly identified.

Comparing the two models, it is clear that Model 1 performs better than Model 2 in terms of both poverty accuracy and total accuracy. Model 1 also performs better than Model 2 at almost all cut-off points in terms of BPAC. However, Model 2 has a higher BPAC than Model 1 at the optimal cut-off point. Model 2 is, however, more sensitive to the choice of cut-off point. For example, moving from the cut-off point of 0.40 to 0.45 reduces BPAC by 60.2 percent in Model 1 and by 77.7 percent in Model 2.

Table 5: Accuracy of the Poverty Probability Method

          ------------------ Model 1 ------------------    ------------------ Model 2 ------------------
Cut-off   Poverty    Non-poverty   Total                   Poverty    Non-poverty   Total
point     accuracy   accuracy      accuracy    BPAC        accuracy   accuracy      accuracy    BPAC
0.05       97.32      34.63         48.20     -136.53       97.54      26.68         42.02     -165.31
0.10       92.88      49.72         59.06      -81.93       92.99      43.52         54.23     -104.35
0.15       87.56      61.07         66.80      -40.87       85.96      57.30         63.50      -54.51
0.20       81.30      70.12         72.54       -8.10       77.28      68.36         70.29      -14.47
0.25       73.90      77.07         76.38       17.02       69.29      76.62         75.04       15.41
0.30       66.75      82.46         79.06       36.55       59.75      83.20         78.12       39.21
0.35*      59.15      86.81         80.82       52.29       53.11      87.07         79.71       53.02
0.40       52.01      90.28         81.99       39.21       44.71      91.21         81.14       21.23
0.45       44.86      92.85         82.46       15.61       40.13      93.23         81.74        4.74
0.50       38.06      95.09         82.74       -6.09       32.13      95.70         81.93      -20.18
0.55       32.17      96.56         82.61      -23.20       27.55      96.73         81.75      -33.06
0.60       27.02      97.69         82.39      -37.61       21.59      97.98         81.44      -49.51
0.65       22.06      98.43         81.89      -50.19       16.69      98.60         80.87      -61.56
0.70       17.82      98.99         81.42      -60.71       13.43      99.16         80.60      -70.12
0.75       13.61      99.39         80.82      -70.58        8.57      99.57         79.87      -81.30
0.80        9.70      99.75         80.25      -79.69        6.49      99.76         79.57      -86.17
0.85        5.94      99.91         79.56      -87.78        3.23      99.90         78.97      -93.19
0.90        3.07      99.98         78.99      -93.80        1.15      99.96         78.56      -97.54
0.95        1.11     100.00         78.59      -97.78        0.25     100.00         78.40      -99.51

* Optimal cut-off point based on BPAC.

2. Method 2: OLS regression

In this method, a stepwise OLS regression is run based on the list of candidate variables in Table 1. The dependent variable is the natural logarithm of per capita real household income in 2006 in rural Vietnam. After dropping 10 variables (including living area, total land area, and source of drinking water) whose coefficients were not statistically different from zero at the 10% level, the results from the OLS regression are presented in Table 6.

Table 6: OLS regression of real per capita income 2006

Variables                                 Coef.   Std. Err.   t-statistic
Household size                            -0.39     0.01       -29.03
Minority                                  -0.09     0.02        -5.42
Share of working members                   0.17     0.02         7.92
Share of children                         -0.20     0.03        -6.91
Share of women                            -0.12     0.02        -6.09
Non-farm self employment                   0.07     0.01        13.50
Wage job                                   0.04     0.00         9.80
Head finishing primary school              0.06     0.01         5.96
Head finishing secondary school            0.08     0.01         7.30
Head finishing high school and above       0.14     0.01         9.79
Head’s age (logarithm)                     0.06     0.02         3.46
House with private bathroom/kitchen        0.14     0.03         5.04
House with shared bathroom or kitchen      0.07     0.01         6.52
Flush toilet                               0.10     0.01         7.56
Double-vault toilet                        0.04     0.01         3.42
Gas cooker                                 0.16     0.01        13.46
Wardrobe                                   0.11     0.01        10.74
Fixed phone                                0.11     0.01         8.51
Television                                 0.10     0.01         8.52
Motorbike                                  0.14     0.01        16.22
Video cassette                             0.08     0.01         9.33
Rice cooker                                0.07     0.01         8.15
Electric fan                               0.04     0.01         3.68
Mobile phone                               0.21     0.01        13.98
Laundry                                    0.17     0.03         4.83
Refrigerator/freezer                       0.17     0.02        11.08
Pump                                       0.03     0.01         3.64
Cattle                                    -0.05     0.01        -5.33
North East                                 0.11     0.02         6.64
Central Highlands                          0.17     0.03         6.80
South East                                 0.13     0.02         6.55
Mekong River Delta                         0.28     0.02        17.52
Constant                                   8.15     0.08       101.67
Number of obs                             24815
F(32, 2186)                               295.9
Prob > F                                   0
R-squared                                  0.46

From the OLS regression, it is possible to predict household per capita income. Then, by comparing predicted per capita income with the poverty line, each household’s poverty status can be predicted. Table 7 shows the tabulation between predicted and actual poverty status using the OLS regression and an absolute poverty line of $1.25/day. At this poverty line, 36.7 percent of the poor and 95.7 percent of the non-poor are correctly identified.

Table 7: Predicted and actual poverty using absolute poverty line (OLS regression)

                    Predicted non-poor   Predicted poor
Actual non-poor           95.71               4.29
Actual poor               63.32              36.68

Poverty accuracy          36.68
Total accuracy            83.49
BPAC                      48.82

The BPAC for Method 2 is equal to 48.82, lower than the corresponding figure for Method 1. For further comparison between Method 1 and Method 2, we estimate the probability of households being poor from the OLS regression.
The probability of a household being poor is given as

$$P_i^* = \Phi\left(\frac{\ln z - X_i'\beta}{\sigma}\right)$$

where z is the poverty line ($1.25), Φ is the cumulative standard normal distribution and σ is the standard error of the residuals (Hentschel et al., 2000). Table 8 presents the accuracy in identifying poverty based on the poverty line of $1.25 and the estimated poverty probability. BPAC is maximized at the cut-off point of 0.35 (again marked with an asterisk). At that point, 58 percent of the poor and 87.6 percent of the non-poor are correctly identified.

Generally, the OLS method is quite good at identifying poverty. Another advantage of the OLS method over the probit models is that it can predict the incomes of particular households, thus enabling the calculation of income-based poverty statistics such as the poverty gap and poverty severity. However, the standard errors associated with such poverty measures at the household level are typically very large.

Table 8: Poverty identification accuracy - OLS method

Cut-off   Poverty    Non-poverty   Total
points    accuracy   accuracy      accuracy    BPAC
0.05       97.43      30.82         44.61     -165.07
0.10       93.83      47.07         56.75     -102.81
0.15       88.01      58.95         64.97      -57.28
0.20       81.04      69.41         71.82      -17.20
0.25       74.46      77.27         76.69       12.91
0.30       65.97      82.98         79.46       34.78
0.35*      57.95      87.64         81.49       52.63
0.40       50.00      91.19         82.66       33.75
0.45       43.38      93.76         83.34       10.66
0.50       36.68      95.71         83.50      -10.21
0.55       30.16      97.28         83.39      -29.25
0.60       24.09      98.33         82.96      -45.42
0.65       18.11      99.02         82.28      -60.04
0.70       13.21      99.47         81.62      -71.54
0.75        8.52      99.82         80.92      -82.26
0.80        5.38      99.89         80.33      -88.83
0.85        2.64      99.99         79.84      -94.66
0.90        0.79     100.00         79.47      -98.41
0.95        0.10     100.00         79.32      -99.81

* Optimal cut-off point based on BPAC.

3. Method 3: Principal Component Analysis

The third method we use is principal component analysis (PCA). Principal component analysis is a technique for reducing the information contained in a large set of variables to a smaller number. The first principal component is the linear index of the underlying variables that captures the most variation among them (Filmer and Pritchett, 2001). The method has been applied extensively in the education and health literature in other countries (Filmer and Pritchett, 2001; Sahn and Stifel, 2003; Rutstein and Johnson, 2004) and in several unpublished papers which estimate an “asset index” for Vietnamese households (Gwatkin et al., 2007; Chowdhuri and Baulch, 2010).

For the sake of simplicity, we use the same set of variables as in Method 1 (Model 1) for our PCA. Table 9 shows the factor scores associated with these variables. Generally, a variable with a positive factor score is associated with higher socio-economic status, while a variable with a negative factor score is associated with lower socio-economic status. Using the factor scores from the first principal component as the weights, we then construct an asset index for each household which has a mean equal to zero and a standard deviation equal to one. Table 10 shows the accuracy from this method, using percentiles of the asset index as cut-off points.

Table 9: Factor scores in principal component analysis (component 1)

Variable                                          Score
Minority                                         -0.194
Household size                                    0.032
Share of women                                   -0.054
Share of working members                          0.155
Share of children                                -0.074
Head finishing primary school                    -0.052
Head finishing secondary school                   0.093
Head finishing high school                        0.171
Wage job                                          0.019
Non-farm self-employment                          0.188
Semi-permanent houses                            -0.025
House with shared bathroom or kitchen             0.126
House with private bathroom/kitchen               0.202
Double-vault toilet                              -0.070
Flush toilet                                      0.333
Radio                                             0.017
Electricity                                       0.175
Mobile phone                                      0.267
Refrigerator/freezer                              0.317
Pump                                              0.239
Fixed phone                                       0.346
Electric fan                                      0.251
Television                                        0.283
Video cassette                                    0.290
Motorbike                                         0.272
Eigenvalue of the 1st component                   3.48
% of variation explained by the 1st component     13.9

Table 10 shows that the PCA method performs less well than both the probit and the OLS methods. The optimal cut-off point is 0.25, at which BPAC is 39 and total accuracy is 77 percent. One reason for the poor performance of PCA is that asset indices calculated by conventional PCA incorrectly treat categorical variables as if they were continuous variables (Kolenikov and Angeles, 2008). Conventional PCA also does not take account of the number of each asset which a household possesses or the ordered nature of some (e.g., housing) variables. An alternative, more satisfactory method of estimating asset indices is polychoric PCA (Kolenikov and Angeles, 2008), although this method is not yet widely used in practice.
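The construction of an asset index by conventional PCA, as described in this section, can be sketched as follows. The sketch standardises each indicator, finds the leading eigenvector of the correlation matrix by power iteration (a simple substitute for the eigen-decomposition routines in statistical packages), and rescales the resulting index to mean zero and standard deviation one. The three asset indicators and six households are hypothetical, and the sketch assumes that no indicator is constant across households.

```python
import math


def standardize(values):
    """Rescale a list to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component_weights(z_columns, iterations=500):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    k, n = len(z_columns), len(z_columns[0])
    corr = [[sum(a * b for a, b in zip(z_columns[i], z_columns[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    weights = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-zero start
    for _ in range(iterations):
        product = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in product))
        weights = [x / norm for x in product]
    return weights


def asset_index(columns):
    """Standardised first-principal-component score for each household."""
    z_columns = [standardize(col) for col in columns]
    weights = first_component_weights(z_columns)
    n = len(columns[0])
    raw = [sum(w * z[i] for w, z in zip(weights, z_columns)) for i in range(n)]
    return standardize(raw), weights


# Hypothetical ownership indicators for six households (1 = owns the asset).
television = [0, 0, 1, 1, 1, 1]
motorbike = [0, 0, 0, 1, 1, 1]
mobile_phone = [0, 1, 0, 0, 1, 1]
index, weights = asset_index([television, motorbike, mobile_phone])
```

Households can then be ranked by `index`, with percentiles of the index used as cut-off points. The sign of an eigenvector is arbitrary in general, so in practice it should be checked that wealthier-looking households receive higher scores.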
Table 10: Accuracy of the PCA method

Cut-off   Asset    Poverty    Non-poverty   Total
points    index    accuracy   accuracy      accuracy    BPAC
0.05      -2.55     14.39      97.59         79.58      -62.52
0.10      -2.02     26.58      94.58         79.86      -27.23
0.15      -1.66     37.11      91.11         79.41        6.39
0.20      -1.36     46.28      87.26         78.39       38.65
0.25      -1.11     54.56      83.16         76.96       39.06
0.30      -0.89     61.89      78.81         75.15       23.34
0.35      -0.69     68.10      74.14         72.83        6.45
0.40      -0.49     73.89      69.36         70.34      -10.85
0.45      -0.29     78.51      64.26         67.34      -29.32
0.50      -0.10     82.63      59.02         64.13      -48.29
0.55       0.11     86.40      53.68         60.76      -67.61
0.60       0.33     89.36      48.11         57.04      -87.75
0.65       0.58     92.17      42.51         53.26     -108.03
0.70       0.84     94.63      36.81         49.33     -128.65
0.75       1.17     96.31      30.88         45.05     -150.08
0.80       1.59     97.70      24.89         40.66     -171.76
0.85       2.12     98.77      18.80         36.12     -193.79
0.90       2.83     99.42      12.60         31.40     -216.24
0.95       3.83     99.88       6.35         26.60     -238.87

4. Method 4: Quantile regression

The fourth method we consider is quantile regression. This method is recommended by the IRIS Center (2008) as the most suitable method in Vietnam, using a poverty cut-off corresponding to the 50th percentile of the expenditure distribution. For comparability, we use the same set of variables in the quantile regressions as in Model 1 of the poverty probability model and the PCA. However, unlike the IRIS Center, we ran the regression with the quantile approximating the $1.25/day poverty line (0.22).[7] Table 11 reports results from the quantile regression at the 22nd percentile, while Table 12 shows the accuracy of the method.
[7] We thank an anonymous reviewer for this suggestion. Note that we have also run this regression with the quantile corresponding to the median, and the results are similar to those with the 22nd percentile.

Table 11: Quantile regression

Variables                                 Coef.   Std. Err.   t-statistic
Household size                            -0.08     0.00       -30.87
Share of children                         -0.27     0.02       -12.88
Share of women                            -0.06     0.02        -2.77
Share of working people                    0.14     0.02         7.93
Non-farm self-employment                   0.09     0.00        18.25
Wage jobs                                  0.08     0.00        21.22
Minority                                  -0.12     0.01        -9.53
Head finishing primary school              0.06     0.01         6.71
Head finishing secondary school            0.09     0.01         8.86
Head finishing high school and above       0.18     0.01        13.06
House with private bathroom/kitchen        0.27     0.02        11.52
House with shared bathroom or kitchen      0.20     0.01        14.01
Semi-permanent house                       0.14     0.01        13.52
Electricity                               -0.09     0.02        -5.33
Radio                                      0.06     0.01         5.17
Flush toilet                               0.13     0.01        11.06
Double-vault toilet                        0.05     0.01         5.12
Mobile telephone                           0.23     0.01        15.45
Refrigerator/freezer                       0.17     0.01        12.66
Pump                                       0.06     0.01         6.76
Fixed phone                                0.14     0.01        12.58
Electric fan                               0.08     0.01         7.94
Television                                 0.14     0.01        13.36
Video cassette                             0.09     0.01        10.80
Motorbike                                  0.16     0.01        19.39
North East                                 0.11     0.01         9.58
Central Highlands                          0.15     0.02         8.61
South East                                 0.19     0.01        13.79
Mekong River Delta                         0.29     0.01        26.99
Constant                                   7.74     0.03       286.26

Table 12 evaluates the accuracy of the quantile regression method. With a cut-off point of 0.25, the quantile regression method identifies 62 percent of the poor and 85 percent of the non-poor correctly, resulting in a total accuracy of 80 percent. The BPAC for the quantile regression method is 46.5, which is substantially lower than those for the poverty probability and OLS models.
Table 12: Accuracy of the quantile regression method

Cut-off   Poverty    Non-poverty   Total
points    accuracy   accuracy      accuracy    BPAC
0.05       18.83      98.82         81.50      -58.08
0.10       32.85      96.31         82.57      -20.96
0.15       44.01      93.01         82.40       13.30
0.20       53.74      89.32         81.62       46.10
0.25       61.94      85.21         80.17       46.47
0.30       69.09      80.80         78.26       30.53
0.35       75.13      76.09         75.88       13.48
0.40       80.20      71.10         73.07       -4.57
0.45       84.09      65.80         69.76      -23.74
0.50       87.89      60.47         66.41      -43.04
0.55       90.73      54.87         62.64      -63.28
0.60       93.17      49.17         58.69      -83.93
0.65       95.34      43.38         54.64     -104.85
0.70       96.64      37.36         50.20     -126.64
0.75       98.02      31.36         45.79     -148.35
0.80       98.92      25.23         41.18     -170.55
0.85       99.47      19.00         36.42     -193.08
0.90       99.72      12.69         31.53     -215.93
0.95       99.96       6.37         26.64     -238.77

To conclude this section, we present a tabular and graphical comparison of the four poverty proxy approaches. Table 13 compares these four approaches at their optimal cut-off points. The quantile regression approach has the highest poverty accuracy, while OLS has the highest non-poverty accuracy. However, judged in terms of total accuracy, the OLS approach has the best result, followed by probit Model 1. If BPAC, our preferred measure, is used, probit Model 1, probit Model 2 and OLS produce similar results, while those for the PCA and quantile regression approaches are substantially lower. The PCA approach has both the lowest total accuracy and the lowest BPAC.
Table 13: Comparing the accuracy of the four approaches

                                  Cut-off   Poverty    Non-poverty   Total
                                  points    accuracy   accuracy      accuracy   BPAC
Probit: Model 1 (enlarged)         0.35      59.15      86.81         80.82     52.29
Probit: Model 2 (parsimonious)     0.35      53.11      87.07         79.71     53.02
OLS                                0.35      57.95      87.64         81.49     52.63
PCA                                0.25      54.56      83.16         76.96     39.06
Quantile regression                0.25      61.94      85.21         80.17     46.47

Figure 3 summarizes the ROC areas under the four approaches, using 20 cut-off points for each model described above. Probit Model 1, the OLS regression and the quantile regression have very similar ROC areas, and their ROC curves are visually (and statistically) indistinguishable from one another. This confirms these three models’ performance using the BPAC. In contrast, probit Model 2 and the PCA method have lower ROC curves and areas, with PCA having the lowest area under the ROC curve. This confirms the PCA method’s poor performance according to the BPAC.

Finally, we report the poverty headcount ratios, as calculated by the four approaches at their optimal cut-off points. Poverty rates are defined as the percentage of all households that are classified as poor at the optimal cut-off points. The standard errors of the poverty rates are calculated by bootstrapping with 200 replications. The results are presented in Table 14. Table 14 shows that Model 1 slightly overestimates the true poverty rate while the other models underestimate it. The 95% confidence intervals show that the probit Model 1 and OLS estimates of the poverty headcount ratio are not statistically different from the “true” poverty headcount ratio estimated directly from the VHLSS06.
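The bootstrapped standard errors described above can be illustrated with a minimal sketch. It resamples households with replacement 200 times and reports the standard deviation of the resampled headcount ratios together with a simple percentile interval. The sketch assumes simple random sampling, so it ignores the survey weights and clustering that a full analysis of the VHLSS would need to respect; the function name, the seed and the data (22 predicted-poor households out of 100) are hypothetical.

```python
import random
import statistics


def bootstrap_headcount(predicted_poor, replications=200, seed=0):
    """Headcount ratio (%), bootstrap standard error and 95% percentile interval."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(predicted_poor)
    rates = []
    for _ in range(replications):
        resample = rng.choices(predicted_poor, k=n)  # draw n households with replacement
        rates.append(100.0 * sum(resample) / n)
    rates.sort()
    std_error = statistics.stdev(rates)
    lower = rates[int(0.025 * replications)]
    upper = rates[int(0.975 * replications) - 1]
    headcount = 100.0 * sum(predicted_poor) / n
    return headcount, std_error, (lower, upper)


# Hypothetical sample: 1 = household classified as poor at the chosen cut-off.
predicted_poor = [1] * 22 + [0] * 78
headcount, std_error, interval = bootstrap_headcount(predicted_poor)
```

An estimate can then be compared with a benchmark headcount ratio by checking whether the benchmark falls inside `interval`, which is the comparison made in the text.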
Table 14: Poverty headcount ratios and standard errors for the four approaches

                                  Poverty headcount   Bootstrapped       95% confidence
                                  ratio               standard errors    interval
Probit: Model 1                       23.14               0.50           22.28   24.00
Probit: Model 2                       21.63               0.41           20.85   22.31
OLS                                   21.80               0.50           20.88   22.72
PCA                                   20.00               0.27           22.14   23.10
Quantile regression                   20.00               0.28           19.45   20.55
"True" poverty headcount ratio        22.36

From this analysis, we choose the probit method with Model 1 as our preferred model, as it performs well in terms of total accuracy, the BPAC, the area under the ROC curve and in predicting the poverty headcount. In the next section, we validate this model by testing its robustness to different poverty lines and an alternative household dataset.

Figure 3: Areas under the ROC curve for the four approaches. [Axes: Coverage of Poor (Sensitivity) against Inclusion of Non-Poor (1 - Specificity). Areas: Probit Model 1: 0.8353; OLS: 0.8355; Quantile Regression: 0.8346; Probit Model 2: 0.8047; PCA: 0.7781]

5. Validating the Poverty Probability Method

To validate the use of the poverty probability method, we conduct three exercises: two using different poverty lines with the same dataset (VHLSS06), and one using an alternative household dataset (the VHLSS04) to test its robustness.
As Chen and Schreiner (2009) and others have pointed out, it is important to know about the out-of-sample predictive power of an approach, since an approach which identifies the poor very accurately with one dataset may perform poorly when applied to different data.

5.1. Validating using a moderate poverty line

We tested our preferred model (Model 1, probit) with the higher international income poverty line of $2 per capita per day, which is used to identify the moderately poor (Chen and Ravallion, 2008). The results in Table 15 show that the model is rather good at predicting both extreme and moderate poverty. At the cut-off point of 0.50, the model correctly identifies 75.6 percent of the poor and 73.1 percent of the non-poor. Overall, the poverty status of 74.4 percent of all households is correctly identified, while the BPAC is relatively high at 72.4.

Table 15: Accuracy of Poverty Probability Method with $2/day Poverty Line

Cut-off   Poverty    Non-poverty   Total
points    accuracy   accuracy      accuracy    BPAC
0.05       99.56      12.31         55.36        9.95
0.10       98.66      20.38         59.00       18.25
0.15       97.54      27.58         62.10       25.63
0.20       95.98      34.41         64.78       32.65
0.25       94.04      41.65         67.50       40.08
0.30       91.68      48.15         69.62       46.75
0.35       88.69      54.97         71.61       53.76
0.40       85.17      61.07         72.96       60.02
0.45       80.93      67.35         74.05       66.47
0.50       75.60      73.14         74.35       72.42
0.55       69.58      78.48         74.09       61.26
0.60       62.91      83.38         73.28       42.89
0.65       55.51      87.88         71.91       23.46
0.70       47.58      91.53         69.85        3.85
0.75       39.24      94.64         67.30      -16.01
0.80       31.26      96.79         64.46      -34.18
0.85       22.57      98.39         60.98      -53.20
0.90       14.81      99.28         57.61      -69.64
0.95        7.24      99.86         54.17      -85.38

5.2. Validating using a consumption-based poverty line

The next step is to use a different definition of poverty based on consumption expenditure. We use the ‘official’ poverty line of the General Statistics Office, which is the per capita expenditure needed to obtain 2,100 Kcal per person per day plus a modest allowance for non-food expenditures. Table 16 shows the results. At the optimal cut-off point of 0.40, the model can correctly specify the expenditure-based poverty status of 86.5 percent of all households, including 65.2 percent of the poor and 91.7 percent of the non-poor. Comparing Table 16 (poverty based on consumption) with Table 5 (poverty based on income), it appears that household assets and socio-economic status are more closely related to consumption than to income.

Table 16: Accuracy of Poverty Probability Method using expenditure-based poverty line

Cut-off   Poverty    Non-poverty   Total
points    accuracy   accuracy      accuracy    BPAC
0.05       97.60      55.71         63.96      -80.74
0.10       94.55      66.39         71.93      -37.16
0.15       89.88      73.93         77.07       -6.40
0.20       84.78      79.51         80.54       16.38
0.25       79.92      83.65         82.92       33.28
0.30       74.05      86.49         84.04       44.86
0.35       69.39      89.31         85.39       56.36
0.40       65.19      91.72         86.50       64.17
0.45       59.48      93.53         86.82       45.38
0.50       54.46      95.31         87.27       28.06
0.55       49.50      96.49         87.24       13.30
0.60       43.90      97.46         86.92       -1.83
0.65       38.69      98.26         86.53      -15.51
0.70       32.77      98.75         85.76      -29.35
0.75       28.13      99.33         85.32      -41.02
0.80       24.18      99.59         84.75      -49.97
0.85       18.55      99.74         83.76      -61.83
0.90       12.73      99.83         82.69      -73.83
0.95        7.92      99.92         81.81      -83.85

5.3. Validating using the VHLSS 2004

In the final step of validation, we test the poverty probability model using data for rural areas from the Vietnam Household Living Standards Survey (VHLSS) of 2004, a comparable nationally representative household survey. The VHLSS 2004 sample includes 46,000 households (of which expenditure data were collected for 9,300 households).
We used the coefficients obtained from estimating probit Model 1 using the VHLSS 2006 and “exported” these to the VHLSS 2004, where the same set of variables was available.

The results from our validation exercise are presented in Table 18. At the cut-off point of 0.25, 79.2 percent of all households are correctly specified according to their income poverty status (at $1.25 per capita per day), including 52.8 percent of the poor and 86.9 percent of the non-poor. The BPAC is 50.4. We also test the model with the moderate international poverty line of $2 per capita per day in Table 19. The results show that the model performs well. At the cut-off point of 0.40, 70.9 percent of all households are correctly classified, including 75.5 percent of the poor and 65.8 percent of the non-poor. The BPAC is high at 69.3.

Table 18: Accuracy of Poverty Probability Method using VHLSS 2004 and $1.25/day line

Cut-off   Poverty    Non-poor   Total
points    accuracy   accuracy   accuracy    BPAC
0.05       91.32      43.31      54.17     -93.87
0.10       81.48      61.41      65.95     -31.97
0.15       71.86      72.88      72.65       7.27
0.20       61.71      81.55      77.06      36.92
0.25       52.79      86.89      79.18      50.40
0.30       43.86      90.91      80.27      18.80
0.35       37.25      93.90      81.08      -4.66
0.40       30.38      95.55      80.81     -24.02
0.45       23.86      97.01      80.46     -42.07
0.50       18.24      98.08      80.01     -56.94
0.55       14.41      98.78      79.69     -67.00
0.60       10.70      99.38      79.31     -76.47
0.65        7.32      99.75      78.84     -84.51
0.70        5.05      99.86      78.41     -89.43
0.75        2.72      99.91      77.92     -94.24
0.80        1.26      99.92      77.60     -97.21
0.85        0.60     100.00      77.51     -98.79
0.90        0.42     100.00      77.47     -99.16
0.95         .          .          .          .
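The step of “exporting” probit coefficients to another set of households amounts to computing the index X'β for each household and passing it through the standard normal distribution function. The sketch below illustrates this with the rounded Model 2 coefficients from Table 4, chosen only because that model is short; the validation in the paper exports Model 1. The variable names, the two example households, and the coding of the regressors (household size in persons, share of children as a fraction, all other variables as 0/1 dummies) are assumptions made for illustration.

```python
import math

# Rounded coefficients from Table 4 (Model 2); dictionary keys are hypothetical names.
MODEL2_COEFS = {
    "share_children": 1.05, "minority": 0.44, "household_size": 0.10,
    "head_high_school": -0.32, "private_bath_kitchen": -0.49,
    "shared_bath_kitchen": -0.36, "flush_toilet": -0.40, "mobile_phone": -0.83,
    "electric_fan": -0.25, "television": -0.50, "motorbike": -0.50,
    "north_east": -0.20, "central_highlands": -0.24, "south_east": -0.52,
    "mekong_delta": -0.62,
}
CONSTANT = -0.51


def standard_normal_cdf(x):
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def poverty_probability(household):
    """Predicted probability of being poor; missing indicators count as zero."""
    index = CONSTANT + sum(coef * household.get(name, 0.0)
                           for name, coef in MODEL2_COEFS.items())
    return standard_normal_cdf(index)


# Two hypothetical households (regressor coding assumed, see lead-in).
household_a = {"share_children": 0.5, "minority": 1, "household_size": 6}
household_b = {"share_children": 0.25, "household_size": 4, "mobile_phone": 1,
               "electric_fan": 1, "television": 1, "motorbike": 1}

CUTOFF = 0.35  # BPAC-optimal cut-off reported for the 2006 data
predicted_poor_a = poverty_probability(household_a) >= CUTOFF
predicted_poor_b = poverty_probability(household_b) >= CUTOFF
```

Because the published coefficients are rounded to two decimals, probabilities computed this way will differ slightly from those produced by the estimated model.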
Table 19: Accuracy of Poverty Probability Method using VHLSS 2004 and $2/day line

Cut-off   Poverty    Non-poor   Total
points    accuracy   accuracy   accuracy    BPAC
0.05       99.62       7.38      56.00      16.89
0.10       98.40      16.83      59.82      25.37
0.15       96.36      25.99      63.08      33.60
0.20       93.67      34.66      65.76      41.37
0.25       90.31      43.10      67.98      48.94
0.30       86.32      51.80      69.99      56.75
0.35       81.10      59.41      70.85      63.58
0.40       75.50      65.75      70.89      69.27
0.45       69.94      73.13      71.45      64.00
0.50       62.92      78.45      70.26      45.17
0.55       55.20      83.33      68.50      25.35
0.60       47.27      88.02      66.54       5.28
0.65       40.01      91.65      64.43     -12.50
0.70       32.55      94.46      61.83     -29.93
0.75       24.70      96.24      58.53     -47.23
0.80       18.00      97.61      55.65     -61.86
0.85       11.71      98.88      52.94     -75.59
0.90        6.61      99.73      50.65     -86.53
0.95        2.45     100.00      48.58     -95.10

VI. Conclusions

Recognising the difficulties involved in collecting comprehensive household expenditure and income data for sub-populations of interest, this paper has explored four ‘short-cut’ methods for predicting a household’s monetary poverty status using data from rural Vietnam. These are the poverty probability method (probit model), OLS and quantile regressions, and asset indices constructed using principal components analysis. As shown in Table 13 and Figure 3 above, the poverty probability method is found to be the most accurate method for predicting poverty using a nationally representative survey for 2006. The poverty probability method allows around four-fifths of poor and non-poor households to be accurately identified using these data.

We then verified our preferred method using different poverty lines and data from a previous national survey (in 2004).
The poverty probability model performs robustly across alternative poverty lines and data sets, accurately identifying between 74% and 87% of the poor and non-poor.

Furthermore, our empirical results show that the variables with the strongest correlation to poverty are household size and household composition, minority status, the head’s education, housing type and ownership of a radio, mobile telephone, refrigerator, television and motorbike. A checklist for collecting these variables from households is provided in Appendix A2, while a set of Excel spreadsheets for implementing the poverty probability method’s calculations is available by writing to the corresponding author. While further testing of this method is definitely required, initial field testing in Hoa Binh and Ha Giang provinces indicates that it is possible to collect the checklist information in a 10 to 15 minute interview with each household. Further research is, however, needed to establish the recommended minimum sample size and sampling protocols to use when applying the method. Initial simulations produced by bootstrapping the VHLSS06 indicate that sample sizes of around 200 households are needed to measure the poverty headcount with a 10 percent margin of error (see Appendix A.4).

Several caveats regarding the use of the poverty probability method should be noted. First, the method’s focus on identifying monetary poverty in rural areas deserves reiterating. While it would be challenging to extend this method to non-monetary poverty measures, it would be relatively simple to extend it to urban areas or, indeed, other countries, though some additional variables (e.g., ownership of air conditioners or motor cars in urban Vietnam) would be needed and different coefficients would need to be estimated. Second, while the method has high total accuracy, it is only able to correctly identify 78 to 81 percent of the poor and non-poor.
If it is used to identify whether individual households are poor or non-poor, errors of targeting (both under-coverage of the poor and inclusion of the non-poor) are bound to occur. When used on larger samples, the full model tends to slightly overestimate the true poverty rate, while the more parsimonious model tends to underestimate it. Third, the poverty probability method is unlikely to be a good way of detecting changes in poverty over periods of a few years. Careful attention should be given to the standard errors of the poverty rates produced, which as mentioned above are quite wide. It would also be useful to investigate how the estimated coefficients of the underlying model change over time, which is possible in Vietnam due to the biennial frequency of its national household surveys. Finally, further field testing of the poverty proxy checklist and the Excel worksheets which accompany it is needed before the method can be firmly recommended for ex ante and ex post poverty impact work.

References

Alkire, S. and M.E. Santos (2010) Acute multidimensional poverty: a new index for developing countries, Human Development Research Paper 2010/11, New York: United Nations Development Program.

Baulch, B. (2002) Poverty monitoring and targeting using ROC curves: Examples from Vietnam, IDS Working Paper No. 161, http://www.ids.ac.uk/ids/bookshop/wp/wp161.pdf.

Chen, S. and M. Ravallion (2008) The developing world is poorer than we thought, but no less successful in the fight against poverty, Policy Research Working Paper Series 4703, World Bank, Washington, DC.

Chen, S. and M. Schreiner (2009) A simple poverty scorecard for Vietnam, Progress Out of Poverty, Grameen Foundation, http://www.microfinance.com/#Vietnam.

Chowdhuri, R. and B. Baulch (2010) Should PI use an asset based approach for its poverty analysis?, Mimeo, Prosperity Initiative, Hanoi.

Filmer, D. and L. Pritchett (2001) Estimating wealth effects without income or expenditure data -- or tears: an application to educational enrollments in states of India, Demography, 38(1), pp. 115-132.

Gwatkin, D., S. Rutstein, K. Johnson, E. Suliman, A. Wagstaff and A. Amouzou (2007) Socioeconomic differences in health, nutrition, and population: Vietnam, Country Reports on HNP and Poverty, Washington, DC: World Bank, http://siteresources.worldbank.org/INTPAH/Resources/400378-178119743396/vietnam.pdf.

IRIS Center (2007) Client assessment survey: Vietnam, online at http://www.povertytools.org/USAID_documents/Tools/Current_Tools/USAID_PAT_VIET_72007.xls.

IRIS Center (2008) Accuracy results for 20 poverty assessment tool countries, online at http://www.povertytools.org/other_documents/PAT_20_country_accuracy_results_Dec2008.pdf.

Kolenikov, S. and G. Angeles (2008) Socioeconomic status measurement with discrete proxy variables: is principal components analysis a reliable answer?, Review of Income and Wealth, 55(1), pp. 128-165.

Nguyen, Linh (2007) Identifying poverty predictors using household living standards surveys in Viet Nam, in G. Sugiyarto (ed.) Poverty Impact Analysis: Selected Tools and Applications, Asian Development Bank, Manila, Philippines.

Ravallion, M., S. Chen and P. Sangraula (2008) Dollar a day revisited, Policy Research Working Paper Series 4620, World Bank, Washington, DC.

Rutstein, S. and K. Johnson (2004) The DHS Wealth Index, DHS Comparative Reports 6, Calverton: ORC Macro.

Sahn, D. and D. Stifel (2003) Exploring alternative measures of welfare in the absence of expenditure data, Review of Income and Wealth, 49(4), pp. 463-489.

Wodon, Q. (1997) Targeting the poor using ROC curves, World Development, 25(12), pp. 2083-2092.

Appendices

A1. Comparison of poverty/asset indicators used by different studies in Vietnam

Studies compared (table columns): IRIS; Sahn & Stifel; Baulch; Gwatkin et al.; Chen & Schreiner; This paper; Linh N.

Indicators (table rows):

Household characteristics
  Composition: Household size; Number of children; Number of women; % of dependents; % of working age members; % working in agriculture
  Head: Head's age; Head's marital status; Head's ethnicity
  Education: Head's education; Spouse's education; Number of adults with no education
  Occupation: Agriculture activities; Wage activities; Non-farm activities; Crop activities; Agricultural services
Accommodation and land: Type of house; Type of roof; Type of toilet; Type of floor; Source of lighting; Main cooking fuel; Source of drinking water; Living area; Number of rooms occupied; Number of people per bedroom; Land area; Land rented out
Assets and durable goods: Television; Refrigerator; Motorcycle and/or car; Radio; Cookers (or stoves); Bicycle; Motor scooter; Boat; Washing machine; Video cassette; Fixed telephone; Mobile telephone; Ploughing machines; Sewing machine; Wardrobe; Mill; Garden; Electric fan; Pump; Number of chickens owned
Geographic: Region

[The check marks showing which study uses each indicator are not recoverable in this copy.]

Appendix A2: A Poverty Proxy Checklist for Rural Vietnam (Expanded Module)

Household ID:
Date of interview: _ _ / _ _ / _ _ _ _
Length of interview: ____ minutes
Household head's name:
Interviewer's name:
Village:
Commune:
District:
Province:

Please put numbers to answers
1. How many people are there living in your household?
2. How many household members...
   are 14 years old or younger?
   are from 15 to 59 years old?
3. How many household members are female?
4. In the past 12 months, how many household members...
   work for wages/salaries?
   are self-employed?

Please write 1 if the answer is YES, 0 if the answer is NO
5. Does the household's head belong to an ethnic minority (not Kinh or Hoa)?
6. What is the highest education level completed by the household's head?
   A. Less than primary
   B. Primary
   C. Secondary
   D. High school or above
7. What type is the household's main residence?
   A. Villa or private house
   B. House with a shared kitchen or bathroom/toilet
   C. Semi-permanent house
   D. Makeshift or other
8. Is electricity used as the main lighting in the household?
9. What type of toilet arrangement does the household have?
   A. Flush toilet or sulabh toilet *
   B. Double vault compost latrine or toilet directly over the water
   C. No toilet or others
10. Does the household have a radio or radio cassette player?
11. Does the household have a motorbike?
12. Does the household have a fixed telephone?
13. Does the household have a mobile telephone?
14. Does the household have a television?
15. Does the household have a refrigerator/freezer?
16. Does the household have a video cassette?
17. Does the household have an electric fan?
18. Does the household have a pump?

*Note: Sulabh toilets (hố xí thấm dội nước) are latrines with open bottoms, which dispose of waste by pouring water and absorption.

Appendix A3: A Poverty Proxy Checklist for Rural Vietnam (Concise Module)

Household ID:
Date of interview: _ _ / _ _ / _ _ _ _
Length of interview: ____ minutes
Household head's name:
Interviewer's name:
Village:
Commune:
District:
Province:

Please put numbers to answers
1. How many people are there living in your household?
2. How many household members are 14 years old or younger?

Please write 1 if the answer is YES, 0 if the answer is NO
3. Does the household's head belong to an ethnic minority (not Kinh or Hoa)?
4. Does the household's head have a high school degree or above?
5. What type is the household's main residence?
   A. Villa or private house
   B. House with a shared kitchen or bathroom/toilet
   C. Semi-permanent house
   D. Makeshift or other
6. Does the household have a flush toilet or sulabh toilet? *
7. Does the household have a motorbike?
8. Does the household have a mobile telephone?
9. Does the household have a television?
10. Does the household have an electric fan?

*Note: Sulabh toilets (hố xí thấm dội nước) are latrines with open bottoms, which dispose of waste by pouring water and absorption.

A4. Sample Size Simulations

A question arising in the poverty proxy checklist method is what sample size is suitable for estimating poverty. To check this, we implemented a bootstrapping simulation based on a subset of the VHLSS 2006, which covers two provinces in North-Western Vietnam that are of particular interest to PI: Thanh Hoa and Hoa Binh. This subset of the VHLSS06 includes 1620 households. In the simulation, we drew n households from the data and estimated the poverty rate based on the subsamples, with 500 replications for each approach. We use the standard error ratio, that is, the standard error of the poverty rate estimated by each of the four approaches expressed as a percentage of the "true" poverty rate, to determine the extent of error.

The results in Table A4.1 show that if we draw about 12% of the sample (200 households), the standard error ratio as a percentage of the true poverty rate is about 10.2%. If we want to achieve a standard error ratio of less than 5%, the sample size must be above 50% of the whole sample.
As shown in Table A4.1 below, the standard error ratio for each of the four poverty proxy approaches falls dramatically until sample sizes of around 60 households are reached. Thereafter, although the standard error ratio continues to decline, it does so at a declining rate.

Table A4.1: Comparing sensitivity of poverty estimates to sample sizes by approach

Standard Error Ratio (%)

Sample Size     Probit 1    OLS      PCA      Quantile
(households)                                  regression
      5          52.19     47.97    54.26      47.05
     10          43.12     43.62    50.59      41.90
     20          32.34     34.69    42.52      30.81
     40          23.28     25.77    30.30      21.68
     60          19.56     21.48    23.27      18.14
     80          16.51     19.95    21.06      15.55
    100          15.08     16.69    19.04      14.12
    150          12.07     13.06    16.07      11.21
    200          10.19     11.19    13.70       9.42
    250           9.28     10.09    12.46       8.48
    300           8.54      9.17    10.99       7.76
    400           7.43      7.76     9.78       6.65
    500           6.62      6.92     8.50       5.95
    750           5.39      5.58     7.34       4.76
   1000           4.57      4.87     6.36       4.05
   1500           3.60      3.91     5.23       3.27

[Figure A4.1: Comparing sensitivity to sample sizes by approach]
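The simulation described in A4 can be illustrated with a minimal sketch. It is not the authors' code: it assumes each household has already been classified as poor (1) or non-poor (0) by one of the proxy methods, draws households with replacement, and uses a hypothetical population of 1620 households with an assumed 30% poverty rate in place of the VHLSS06 subset. The function name `standard_error_ratio` and the variable `households` are illustrative only.

```python
import random
import statistics


def standard_error_ratio(poor_flags, n, replications=500, seed=0):
    """Standard error of the bootstrapped poverty rate, as a % of the full-sample rate."""
    rng = random.Random(seed)
    true_rate = sum(poor_flags) / len(poor_flags)
    rates = []
    for _ in range(replications):
        # Draw n households with replacement and record the subsample poverty rate.
        subsample = rng.choices(poor_flags, k=n)
        rates.append(sum(subsample) / n)
    return 100 * statistics.stdev(rates) / true_rate


# Hypothetical data (an assumption, not the VHLSS06 subset):
# 1620 households, of which 486 (30%) are flagged as poor.
households = [1] * 486 + [0] * 1134

for n in (20, 60, 200, 500, 1000):
    print(n, round(standard_error_ratio(households, n), 2))
```

For a simple binary flag with poverty rate p, the ratio should lie close to 100 * sqrt((1 - p) / (p * n)), so it shrinks with the square root of the sample size. This is consistent with the pattern of rapid initial decline followed by slower gains reported in Table A4.1, although the figures from this sketch depend on the assumed poverty rate and will not match the table.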