Oxford Development Studies
URL: http:/mc.manuscriptcentral.com/cods Email:
[email protected]
ASSESSING ALTERNATIVE POVERTY PROXY
METHODS IN RURAL VIETNAM
20 December 2010
Abstract
This paper compares and contrasts the use of four ‘short-cut’ methods for identifying poor
households: (i) the poverty probability method; (ii) OLS regressions; (iii) principal components
analysis; and, (iv) quantile regressions. After evaluating these four methods using two alternative
criteria (total and balanced poverty accuracy) and representative household survey data from
rural Vietnam, we conclude that the poverty probability method, which can correctly identify
around four-fifths of poor and non-poor households, is the most accurate ‘short-cut’ method for
measuring poverty for specific sub-populations, or in years when household surveys are not
available. We then test the performance of the poverty probability method with different poverty
lines and using an alternative household survey, and find it to be robust.
I. Introduction
In most developing countries, it is only feasible to conduct detailed household surveys every few
years, using relatively small samples of households. The results of these surveys can usually only be
disaggregated to the regional or provincial level, and cannot be disaggregated for many
population groups that are of interest to policy makers (for example, specific occupations or
ethnic groupings). However, government and donor agencies often require that poverty be
monitored on an annual basis for specific administrative or project areas, or require that projects
demonstrate their impact on specific groups or occupations. Poverty measurement using
household surveys is also difficult, expensive and time consuming, as it requires detailed information
to be collected on all the different components of household expenditures and/or incomes.
Short-cut methods for measuring monetary poverty in specific areas or sub-populations have
therefore been devised for around 30 developing countries, most notably by the Grameen
Foundation and the USAID Poverty Assessment Tools project.1 Typically these methods use 10 to
20 easily verifiable indicators to obtain an index or score that is highly correlated with household
poverty status. Using these short-cut methods, non-specialists can collect data for each household
in the field in ten to fifteen minutes which, when combined with the coefficients from models
estimated with nationally representative household survey data, can provide a reasonably accurate
prediction of a household’s poverty status. However, there have been few attempts to
systematically compare such methods (especially using out-of-sample predictions).
This paper compares and contrasts the use of four ‘short-cut’ methods for measuring monetary
poverty in rural Vietnam. These four methods, which we shall hereafter describe collectively as
poverty proxy methods, are: (i) the poverty probability method; (ii) OLS regressions; (iii)
principal components analysis; and (iv) quantile regressions. Each of these poverty proxy methods
has been used in the past in Vietnam with different datasets and poverty lines (see Section II),
but to date there has been no study which compares the accuracy of these different methods using
the same dataset, and few which have compared their out-of-sample predictive power using
different datasets. Accordingly, this study uses the 2006 Vietnamese Household Living
Standards Survey (VHLSS 2006) to test these four methods for rural households using a common
international poverty line ($1.25/day in 2005 PPP terms). After evaluating these four methods
using two alternative criteria (total and balanced poverty accuracy, which are explained below),
we also test the preferred method’s performance with different poverty lines and its out-of-sample
performance using an alternative household survey (the VHLSS of 2004). We conclude that the
poverty probability method is the most accurate ‘short-cut’ method for measuring poverty for
specific sub-populations of interest, or in years when representative household surveys are not
available.
II. Literature Review
This section provides a brief overview of six previous applications of poverty proxy methods in
Vietnam in approximate chronological order.2 While two of these studies were developed
independently by Vietnam-based researchers, the remaining four are part of larger cross-country
efforts to develop ‘short-cut’ poverty assessment tools for various development organisations.
1 See www.microfinance.org/#Poverty_Scoring and www.povertytools.org
2 This section draws on Chen and Schreiner (2009).
2.1. Baulch (2002)
In the earliest known application of poverty proxy methods in Vietnam, Baulch (2002)
constructed two composite poverty indices using the national poverty line of 4,904
Dong/person/day and the Vietnam Living Standards Survey (VLSS) 1997-98. Baulch used
stepwise probits to build his poverty indices and the Receiver Operating Characteristic (ROC)
curve technique to assess them. The indices contain six indicators for urban areas and twelve
indicators for rural areas. He assessed the poverty accuracy of this method but did not validate his
results using a different dataset.
2.2. Sahn and Stifel (2003)
As part of a larger cross-country study involving LSMS type data from ten developing countries,
Sahn and Stifel (2003) used factor analysis and the 1992/3 and 1997/8 VLSS to construct an
“asset index” for Vietnam. The indicators used include ownership of consumer durables,
residence quality and education of the household head. Sahn and Stifel (2003) did not test their
asset index on other datasets. Moreover, their study did not indicate its poverty accuracy, i.e. its
accuracy in correctly identifying and targeting the poor.
2.3. Gwatkin et al. (2007)
Gwatkin et al. (2007) used principal components analysis to create a “wealth index” for the 7,048
households in the 2002 Vietnam Demographic and Health Survey. This was part of a wider
World Bank sponsored project to produce wealth indices for 56 developing and transition
economies. In all these studies, poverty is defined in relative, rather than absolute, terms. Gwatkin
et al. construct their “wealth index” for Vietnam using 18 indicators. Principal components analysis
(PCA) is used to generate a weight for each household item with available information. The
wealth index score is then calculated for each household by weighting its response for
each item by the coefficient of the first principal component and
summing the results. Their wealth index is standardized in relation to a standard normal
distribution, with a mean of zero and a standard deviation of one.
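The wealth index calculation described above can be illustrated with a minimal sketch. It is not the code used by Gwatkin et al. (2007): the three asset indicators and six households are hypothetical, each indicator is standardized before the analysis (an assumption), and the first principal component is found by simple power iteration on the correlation matrix.

```python
import math

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1.
    Assumes the column is not constant (otherwise the deviation is zero)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

def first_component(columns, iterations=500):
    """Weights of the first principal component, found by power iteration
    on the correlation matrix of the standardized columns."""
    k, n = len(columns), len(columns[0])
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

def wealth_index(households):
    """Return (weights, scores): each score is the weighted sum of a
    household's standardized responses, rescaled to mean 0 and sd 1."""
    columns = [standardize(list(col)) for col in zip(*households)]
    weights = first_component(columns)
    raw = [sum(w * col[h] for w, col in zip(weights, columns))
           for h in range(len(households))]
    return weights, standardize(raw)

# Hypothetical data: one row per household, columns are 0/1 answers for
# television, motorbike and flush toilet.
households = [
    (0, 0, 0), (1, 0, 0), (1, 0, 0),
    (1, 1, 0), (1, 1, 1), (1, 1, 1),
]
weights, scores = wealth_index(households)
print([round(w, 3) for w in weights])
print([round(s, 3) for s in scores])
```

Because the index is only defined relative to the other households in the sample, a score says where a household ranks, not whether it falls below an absolute poverty line.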
While powerful and relatively easy to calculate, it is difficult to use the wealth index to estimate
poverty rates at the household or individual level. Furthermore, its accuracy was not tested in
Gwatkin et al. (2007) and they also did not validate their wealth index using a different dataset.
2.4. IRIS Center (2007)
USAID commissioned the IRIS Center at the University of Maryland (IRIS 2007) to build a
poverty scorecard for Vietnam, along with 28 other developing countries, as part of its Poverty
Assessment Tools project (www.povertytools.org). IRIS (2007) considers only USAID’s
“extreme” poverty line (equivalent to VND 3,818/person/day in January 1999 prices) and uses
VLSS 1997/8 data for its analysis. IRIS use 17 indicators, including household size, the household
head’s age and ownership of a motorcycle. From these variables, IRIS calculate poverty scores
using four different methods (OLS, quantile regression, linear probability and probit) and use the
“Balanced Poverty Accuracy Criterion” (BPAC), which USAID has since adopted and which is
explained below, to evaluate these methods. After comparing these four models, IRIS
recommend the use of quantile regressions for determining the poverty status of households in
Vietnam. Using the USAID “extreme” line and the 1997/8 VLSS, the IRIS method produces a
BPAC of 61.7. The IRIS Center also did not validate their results using a different dataset.
2.5. Linh Nguyen (2007)
In a paper for the Asian Development Bank, Linh Nguyen (2007) uses multiple regression
techniques to assess poverty using the VHLSS 2002 data. This technique detects variables or
predictors that are correlated with a household’s consumption expenditure and consequently, its
poverty status. She used bivariate and multivariate analysis to narrow down the number of
variables from an initial list of 60 variables to 22 indicators in rural and 15 indicators in urban
areas. Linh Nguyen (2007) validated her results using the VLSS 1998 data and a subset of the
VHLSS 2002 (for Thanh Hoa and Nghe An provinces).
2.6. Chen and Schreiner (2009)
Schreiner and colleagues have developed poverty scorecards for the Grameen Foundation in 28
developing countries (www.microfinance.com/#Poverty_Scoring). Chen and Schreiner (2009)
develop a simple poverty “scorecard” for Vietnam with 10 indicators selected from an initial list
of 150 indicators drawn from the VHLSS 2006. Each indicator is first screened with an
entropy-based “uncertainty coefficient” that measures how well it predicts poverty on its
own. Their final indicator selection uses both judgement and statistics (a forward stepwise logit).
The final scorecard is built using a PPP $1.75/day poverty line and a logit regression.3 One
advantage of the Chen and Schreiner (2009) method is its validation of the scorecard using the
VHLSS 2004. However, its performance is not compared to that of other methods.
Appendix A1 summarises and compares the different indicators that were used to predict poverty
in each of these studies, and compares them with those proposed in this paper.
It should be noted that four of these six poverty proxy methods have an explicit focus on
monetary poverty (identified according to whether a household’s per capita expenditure is above
or below a pre-determined absolute poverty line) while the other two methods concern asset
poverty. None of the methods consider the wider non-monetary dimensions of poverty that are
considered in, for example, the UNDP’s Multidimensional Poverty Index (Alkire and Santos,
2010). While focusing on monetary poverty is obviously restrictive, it does reflect the principal
way in which poverty is measured in Vietnam (and many other countries).
III. Data and Methods
We used data from the VHLSS 2006, the most recent available national income and expenditure
survey in Vietnam. The data cover over 45,000 households in rural and urban areas and include
information on household income, assets, expenditure4 and other socio-economic dimensions.
Using the VHLSS 2006 data, we compare the results of four poverty proxy approaches. In addition,
we used the VHLSS 2004 and the Thanh Hoa Resurvey data to validate our estimates of
poverty rates.
There are two “official” poverty lines in Vietnam. The General Statistics Office (GSO) defines a
food poverty line based on the expenditure required to obtain 2,100 calories per person per day.
The national poverty line is then defined as the food poverty
line plus the non-food expenditure of a reference group with food expenditure close to the food
poverty line. The GSO’s poverty line is equivalent to VND 7,011/person/day at January 2006
prices. The GSO’s poverty line is, however, based on a food basket which was first estimated in
1993, and has only been updated by inflating its food and non-food components by the relevant
price indices.
3 Chen and Schreiner justify the use of a PPP $1.75/day poverty line by saying that it is close to the national poverty
line.
4 The expenditure data are collected from a subsample of just over 9,000 households.
An alternative set of poverty lines has been set by the Ministry of Labour, Invalids, and Social Affairs
(MOLISA) for 2006–2010: VND 6,575/person/day for rural areas and VND 8,548/person/day
for urban areas (Chen and Schreiner 2009). The MOLISA poverty lines are administratively
determined and updated periodically to reflect changes in both the cost of living and living
standards. In contrast to those of the General Statistics Office, MOLISA’s poverty lines are based on per
capita incomes. At the present time, there is an ongoing debate about the updating of the
MOLISA poverty lines for the 2011 to 2015 period.
Because of the dated nature of both the GSO and MOLISA poverty lines, the poverty lines
used in our analysis are the international poverty lines of PPP $1.25 and $2.00 per person per
day. These lines were calculated by the World Bank using household survey data from 116
countries together with the results of the 2005 International Comparisons Project (Ravallion et
al., 2008). In Vietnamese Dong, the $1.25/day line is equivalent to VND 242,250/person/month
while the $2/day line is VND 387,600/person/month, in January 2006 prices. These are the
poverty lines which most international and bilateral donors use for monitoring the MDGs. Those
with incomes (or expenditures) of less than PPP $1.25/day are usually regarded as extremely
poor, while those living on between PPP $1.25 and $2/day are regarded as moderately poor.
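As an illustration of how these two lines partition households, the following minimal sketch classifies per capita monthly expenditure in VND. The two thresholds are taken from the paragraph above; the function name and the three example expenditure values are hypothetical.

```python
# Poverty lines in VND per person per month at January 2006 prices,
# as quoted in the text.
LINE_1_25 = 242_250  # PPP $1.25/day
LINE_2_00 = 387_600  # PPP $2.00/day

def poverty_status(expenditure_vnd_per_month):
    """Classify per capita monthly expenditure (VND) against the two lines."""
    if expenditure_vnd_per_month < LINE_1_25:
        return "extremely poor"
    if expenditure_vnd_per_month < LINE_2_00:
        return "moderately poor"
    return "non-poor"

# Hypothetical households.
for spending in (200_000, 300_000, 450_000):
    print(spending, poverty_status(spending))

# Consistency check: the two lines stand in the ratio 2.00 / 1.25 = 1.6.
print(round(LINE_2_00 / LINE_1_25, 6))
```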
We use two criteria to assess accuracy in predicting poverty. The first criterion is total
accuracy, i.e. the weighted average of poverty accuracy and non-poverty accuracy. It is calculated by
the following formula:
Total accuracy = Headcount index × Poverty accuracy + (1 - Headcount index) × Non-poverty accuracy    (1)
Thus total accuracy, which will always vary between 0 and 100, shows the percentage of people
correctly identified as poor or non-poor.
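A minimal sketch of equation (1) follows. The headcount index is entered as a share between 0 and 1 and the two accuracies as percentages; the input values are hypothetical and are not taken from the paper's tables.

```python
def total_accuracy(headcount, poverty_accuracy, non_poverty_accuracy):
    """Equation (1): weighted average of the two accuracies (in percent),
    with the headcount index (a share between 0 and 1) as the weight."""
    return headcount * poverty_accuracy + (1 - headcount) * non_poverty_accuracy

# Hypothetical example: 25% of people are poor, 60% of the poor and 90% of
# the non-poor are classified correctly.
example = total_accuracy(0.25, 60.0, 90.0)
print(example)  # 0.25 * 60 + 0.75 * 90 = 82.5
```

The example shows why total accuracy can look high even when many poor people are missed: when the headcount is low, the non-poor dominate the weighted average.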
The second criterion is the BPAC index, adopted by USAID in its poverty assessment work. The BPAC
index is calculated by the following formula:
BPAC = (Inclusion - |Under-coverage - Leakage|) × [100 ÷ (Inclusion + Under-coverage)]    (2)
in which Inclusion = the “true” poor correctly predicted as poor; Under-coverage = the “true”
poor incorrectly predicted as non-poor; and Leakage = the “true” non-poor incorrectly predicted
as poor, each expressed as a percentage of the total “true” poor. In other words, BPAC is the poverty
accuracy minus the absolute difference between under-coverage and leakage, expressed as percentages of
total “true” poor. Note that, unlike Total Accuracy, BPAC can take negative values when the
absolute difference between under-coverage and leakage exceeds poverty accuracy.
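Equation (2) can be sketched directly from household counts. The function below is an illustration only; its argument names and the counts in the two examples are hypothetical.

```python
def bpac(poor_predicted_poor, poor_predicted_nonpoor, nonpoor_predicted_poor):
    """Equation (2), computed from counts of households or people.
    All three components are percentages of the total 'true' poor."""
    total_true_poor = poor_predicted_poor + poor_predicted_nonpoor
    inclusion = 100 * poor_predicted_poor / total_true_poor
    under_coverage = 100 * poor_predicted_nonpoor / total_true_poor
    leakage = 100 * nonpoor_predicted_poor / total_true_poor
    return (inclusion - abs(under_coverage - leakage)) * 100 / (inclusion + under_coverage)

# Hypothetical case 1: 60 poor found, 40 poor missed, 30 non-poor wrongly included.
balanced = bpac(60, 40, 30)   # 60 - |40 - 30| = 50
# Hypothetical case 2: most of the poor are missed, so BPAC turns negative.
negative = bpac(10, 90, 5)    # 10 - |90 - 5| = -75
print(balanced, negative)
```

BPAC penalises only the imbalance between under-coverage and leakage, so a method whose two errors roughly offset each other scores close to its poverty accuracy.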
In line with Prosperity Initiative’s goal of reducing poverty at scale (that is, having systemic
impacts on poverty reduction that extend beyond the communities in which the organisation is
working), our preferred criterion is the BPAC. As Total Accuracy combines accurate
identification of both poor and non-poor, this measure is only useful if one is interested in an
aggregate assessment of poverty status without wanting to target the poor specifically. Indeed, in
some cases, a proxy method with high Total Accuracy can give a highly inaccurate identification
of poor people. For example, in Table 5, at the cut-off point of 0.5, Total Accuracy is the highest
(82.74) but only 38.1 percent of the poor are correctly identified. For this reason, we focus on
the BPAC in assessing different poverty proxy models.
We also employ Receiver Operating Characteristic (ROC) curves to show the accuracy of
different poverty proxy methods. ROC curves are diagrams which portray the ability of different
non-parametric diagnostic tests to distinguish between the two values of a binary outcome, and were
originally developed for use in electrical engineering and signal processing (Baulch, 2002; Wodon, 1997).
In poverty analysis,
ROC curves plot the probability of a test correctly identifying a poor person as poor (which is
called the test’s “sensitivity”) on the vertical axis against one minus the probability of the same
test correctly classifying a non-poor person as non-poor (which is called
the test’s “specificity”) on the horizontal axis. Typically, ROC curves are concave and embody a trade-off between
coverage of the poor and inclusion of the non-poor (see Figures 1 to 3 below). As long as an
indicator or index increases in value as the likelihood of poverty increases, the area under its
ROC curve, which will always vary between zero and one, can be used to rank the
relative efficacy of different poverty proxies. An ROC curve with an area of less than 0.5 will lie mostly below the
leading diagonal.
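The construction of an ROC curve and its area can be sketched as follows. This is an illustration with eight hypothetical households, not the routine used in the paper: each distinct score is used in turn as a cut-off, and the area is obtained with the trapezoidal rule.

```python
def roc_points(scores, is_poor):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cut-off.
    A household is flagged as poor when its score is at or above the cut-off."""
    n_poor = sum(is_poor)
    n_nonpoor = len(is_poor) - n_poor
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        flagged = [s >= cutoff for s in scores]
        true_pos = sum(f and p for f, p in zip(flagged, is_poor))
        false_pos = sum(f and not p for f, p in zip(flagged, is_poor))
        points.append((false_pos / n_nonpoor, true_pos / n_poor))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores (higher = more likely poor) and true poverty status.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_poor = [True, True, False, True, False, False, False, False]

auc = area_under_curve(roc_points(scores, is_poor))
# Reversing the ranking gives an indicator of non-poverty: area below 0.5.
auc_reversed = area_under_curve(roc_points([-s for s in scores], is_poor))
print(round(auc, 4), round(auc_reversed, 4))
```

The area equals the share of (poor, non-poor) pairs in which the poor household has the higher score, which is why an uninformative proxy scores about 0.5.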
IV. Constructing poverty proxies for rural Vietnam
1. Poverty indicators
In order to assess poverty, we use three alternative poverty proxy methods: the poverty
probability (probit) method, OLS regression, and principal components analysis (PCA). As shown in
Section II, these are the three most commonly used methods in poverty proxy studies in Vietnam
(as well as other developing countries). After comparing the accuracy of these methods in
identifying the poor and non-poor in rural Vietnam, we then select our preferred model.
As a first step, we collected 48 potential poverty indicators at the household level5 in the following
categories:
- Household characteristics (such as household size, share of female members, share of
  children).
- Education indicators (such as household head’s education level, spouse’s education
  level).
- Housing indicators (such as type of the main residence, type of toilet).
- Asset indicators (ownership of durable goods such as motorcycle, bicycle, radio).
- Agriculture and land variables (such as whether the household grows crops, annual crop
  areas, total area, irrigated area).
The list of candidate indicators is presented in Table 1, categorized by poverty status (based on
the absolute international poverty line of PPP $1.25).
5 We do not use commune or village level information as our aim is to construct a quick-and-easy method for
predicting a household’s poverty status.
Table 1: Mean values of Candidate Poverty Indicators

Indicator                                                   Type        Poor    Non-poor
Housing
Living area                                                 Continuous  50.19   62.41
Own house                                                   Binary      0.97    0.98
Villa or house with private bathroom/kitchen                Binary      0       0.04
House with shared bathroom or kitchen                       Binary      0.06    0.14
Garden                                                      Binary      0.2     0.26
Semi-permanent house                                        Binary      0.62    0.64
Drinking water from private tap                             Binary      0.03    0.08
Flush toilet                                                Binary      0.06    0.27
Double-vault toilet                                         Binary      0.3     0.39
Electricity                                                 Binary      0.87    0.95
Daily water from private tap                                Binary      0.04    0.08
Daily water from well                                       Binary      0.63    0.72
Have land for agricultural purpose                          Binary      0.92    0.85
Irrigated area                                              Continuous  0.27    0.46
Annual crop area                                            Continuous  0.51    0.47
Household size                                              Continuous  4.77    4.22
Total land area                                             Continuous  0.84    0.89
Head's age                                                  Continuous  48.43   49.32
Share of under 15-year old members                          Continuous  0.30    0.21
Share of female members                                     Continuous  0.54    0.51
Share of members aged 15-59 years                           Continuous  0.53    0.66
Head is illiterate                                          Binary      0.02    0.02
Head finishing primary school                               Binary      0.26    0.27
Head finishing secondary school                             Binary      0.19    0.3
Head finishing high school and above                        Binary      0.04    0.12
Spouse finishing primary school                             Binary      0.20    0.24
Spouse finishing secondary school                           Binary      0.15    0.23
Spouse finishing high school and above                      Binary      0.02    0.08
Minority                                                    Binary      0.39    0.13
Crop cultivation                                            Binary      0.89    0.8
Number of wage earners                                      Integer     0.78    0.99
Number of household members with farm jobs                  Integer     2.39    1.9
Number of household members with non-farm self-employment   Integer     0.25    0.55
Ownership of assets and durable goods
Computer                                                    Binary      0       0.03
Radio                                                       Binary      0.09    0.12
Television                                                  Binary      0.6     0.86
Video cassette                                              Binary      0.19    0.44
Stereo                                                      Binary      0.04    0.14
Refrigerator/freezer                                        Binary      0.01    0.13
Laundry machine                                             Binary      0       0.03
Electric fan                                                Binary      0.61    0.82
Gas cooker                                                  Binary      0.04    0.3
Rice cooker                                                 Binary      0.24    0.59
Wardrobe                                                    Binary      0.51    0.82
Bicycle                                                     Binary      0.56    0.67
Motorbike                                                   Binary      0.25    0.52
Fixed telephone                                             Binary      0.02    0.21
Mobile telephone                                            Binary      0.01    0.1
Pump                                                        Binary      0.12    0.29
Cattle                                                      Binary      0.54    0.29
Breeding facilities                                         Binary      0.43    0.51
2. Method 1: Poverty probability method
This method uses a probit model to estimate the probability of a household being poor. First, a
stepwise probit is run to remove six of the 48 candidate variables that do not predict
poverty well. The remaining 42 variables are then ranked according to their accuracy in
identifying the poor on their own, using the area under the Receiver Operating Characteristic (ROC) curve.
The greater the area under an ROC curve, the better the indicator is at identifying poverty.
Using this list of 42 variables ranked by ROC area, we estimate two models: one is more
expansive and the other more parsimonious. See Appendices A2 and A3 for the poverty proxy
checklists that would be used to apply the two models.
Model 1
From the list of 42 variables, we selected 34 variables based on both our judgment6 and the ROC
area. We then re-ran the probit model, taking account of the clustering and stratification in the
VHLSS survey design to calculate coefficient standard errors. This allowed six variables that
have low coefficients in the probit model to be removed. Our final list includes 25 indicators
(excluding regional dummies). These include 11 household (HH) characteristics indicators, five
housing characteristics indicators and nine types of assets.
6 For practical purposes, we drop those indicators (such as irrigated land area and crop land area) that would be
difficult to collect information on in a short interview, or which are susceptible to measurement errors.
Table 2 presents the accuracy of these indicators in identifying the poor in rural Vietnam in terms
of the area under the ROC curve for each variable. Recall that the higher the area under an
ROC curve, the better the variable underlying it is at distinguishing between the poor and
non-poor. Recall also that the maximum value which the area under an ROC curve can take is 1, and that
curves with areas less than 0.5 will generally lie below the leading diagonal. Indicators with areas under the ROC
curve that are significantly greater than 0.5 can be viewed as useful poverty proxies, while areas
substantially less than 0.5 may be regarded as indicators of non-poverty.
Table 2: Accuracy of different indicators in identifying the poor in Vietnam

Indicators                               Type                 Area under ROC curve
Household size                           HH characteristics   0.605
Share of children                        HH characteristics   0.642
Share of working                         HH characteristics   0.363
Share of female                          HH characteristics   0.536
Head finishing primary school            HH characteristics   0.499
Head finishing secondary school          HH characteristics   0.457
Head finishing high school and above     HH characteristics   0.459
Minority                                 HH characteristics   0.635
Wage job                                 HH characteristics   0.453
Non-farm self-employment                 HH characteristics   0.401
Semi-permanent house                     Housing              0.496
House with private bathroom/kitchen      Housing              0.480
Electricity                              Housing              0.463
Flush toilet                             Housing              0.391
Double-vault toilet                      Housing              0.461
House with shared bathroom or kitchen    Housing              0.458
Radio                                    Assets               0.484
Mobile telephone                         Assets               0.447
Refrigerator/freezer                     Assets               0.434
Pump                                     Assets               0.416
Fixed telephone                          Assets               0.401
Electric fan                             Assets               0.398
Television                               Assets               0.380
Video cassette                           Assets               0.372
Motorbike                                Assets               0.366
Note on Indicators:
Share of children: proportion of household members less than 15 years of age.
Minority: 1 = all ethnic groups except Kinh and Hoa; 0 = Kinh or Hoa.
Housing indicators: binary variables indicating whether the household has these durables/facilities.
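For a binary (0/1) indicator, the ROC curve has a single interior point, so its area follows directly from the share of poor and non-poor households that have the indicator. The sketch below uses this fact to cross-check a few Table 2 entries against the group means reported in Table 1. It is an illustrative check rather than the authors' procedure, and small differences are to be expected because the Table 1 means are rounded and may be weighted differently.

```python
def binary_indicator_auc(share_poor, share_nonpoor):
    """Area under the ROC curve of a 0/1 indicator, given the share of poor
    and of non-poor households for which the indicator equals 1."""
    return 0.5 + (share_poor - share_nonpoor) / 2

# (poor mean, non-poor mean) as read from Table 1, with the Table 2 area.
checks = {
    "Minority":     ((0.39, 0.13), 0.635),
    "Flush toilet": ((0.06, 0.27), 0.391),
    "Television":   ((0.60, 0.86), 0.380),
    "Motorbike":    ((0.25, 0.52), 0.366),
}
for name, ((poor, nonpoor), reported) in checks.items():
    implied = binary_indicator_auc(poor, nonpoor)
    print(f"{name}: implied {implied:.3f}, reported {reported:.3f}")
```

Indicators that are more common among the poor (such as minority status) come out above 0.5, while assets that are more common among the non-poor come out below 0.5, in line with the interpretation given before Table 2.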
The results of the probit model are presented in Table 3. Larger household size, a higher share of
women or children, and a lower share of working members are all associated with a higher
probability of poverty. In contrast, households with non-farm wages or non-farm
self-employment have a lower probability of being poor. As expected, households whose heads
belong to one of the ethnic minorities have a higher probability of being poor, while the head’s
education level has the opposite effect. Finally, better house type, better toilet type and the
ownership of consumer durables and fixed assets are associated with lower probabilities of being
poor.
Table 3: Probit model for the composite poverty indicator (model 1)

Variables                                Coef.    Std. Err.  t-statistic
Household size                           0.17     0.01       21.30
Share of children                        0.74     0.06       12.85
Share of women                           0.23     0.05       4.19
Share of working people                  -0.24    0.05       -4.92
Non-farm self-employment                 -0.25    0.02       -12.64
Wage jobs                                -0.18    0.01       -14.43
Minority                                 0.31     0.04       7.68
Head finishing primary school            -0.18    0.03       -6.55
Head finishing secondary school          -0.27    0.03       -8.96
Head finishing high school and above     -0.43    0.05       -9.46
House with private bathroom/kitchen      -0.57    0.05       -12.11
House with shared bathroom or kitchen    -0.68    0.11       -6.14
Semi-permanent house                     -0.33    0.03       -10.59
Electricity                              0.29     0.06       4.85
Radio                                    -0.14    0.04       -3.94
Flush toilet                             -0.26    0.04       -6.60
Double-vault toilet                      -0.10    0.03       -3.61
Mobile telephone                         -0.56    0.08       -6.68
Refrigerator/freezer                     -0.37    0.06       -5.92
Pump                                     -0.15    0.03       -4.87
Fixed phone                              -0.35    0.05       -7.45
Electric fan                             -0.20    0.03       -6.65
Television                               -0.35    0.03       -13.51
Video cassette                           -0.23    0.03       -8.73
Motorbike                                -0.40    0.03       -15.99
North East                               -0.24    0.04       -5.43
Central Highlands                        -0.32    0.07       -4.81
South East                               -0.58    0.06       -9.08
Mekong River Delta                       -0.75    0.04       -16.93
Constant                                 -0.27    0.08       -3.34
Number of obs                            33745
F(29, 2201)                              121.74
Prob > F                                 0

Note: Some regions are removed from the model by the stepwise probit process.
Figure 1 shows the ROC curve for the composite poverty indicator. As the cut-off used to
distinguish the poor from the non-poor is lowered, the proportion of the poor correctly
identified as poor increases, along with the proportion of the non-poor who are incorrectly
identified as poor. Thus the concavity of the ROC curve displays the usual trade-off between
coverage of the poor and inclusion of the non-poor in rural areas. The area under the ROC curve is 0.8403.
In general, the more accurate a method is in identifying the poor, the less
accurate it will be in identifying the non-poor (and vice versa).
Figure 1: ROC curve for model 1. Vertical axis: Coverage of Poor (Sensitivity); horizontal axis: Inclusion of Non-Poor (1 - Specificity). Area under ROC curve = 0.8403.
Model 2
In Model 2, we chose a more parsimonious list of 11 household-level indicators based on
several criteria, including their ease of collection, their ROC area, and their coefficients and
statistical significance in explaining absolute income poverty. The final list includes four household
characteristics (share of children, minority, household size, head finishing high school), three
accommodation characteristics (house with private bathroom/kitchen, house with shared
bathroom or kitchen, flush toilet) and four durable ownership variables (mobile phone, electric fan,
television and motorbike).
Table 4: Probit model for the composite poverty indicator (model 2)

Variables                                Coef.    Std. Err.  t-statistic
Share of children                        1.05     0.05       21.30
Minority                                 0.44     0.04       11.06
Household size                           0.10     0.01       14.77
Head finishing high school and above     -0.32    0.04       -7.94
House with private bathroom/kitchen      -0.49    0.10       -4.85
House with shared bathroom or kitchen    -0.36    0.04       -9.82
Flush toilet                             -0.40    0.04       -11.19
Mobile phone                             -0.83    0.08       -10.32
Electric fan                             -0.25    0.03       -8.85
Television                               -0.50    0.03       -19.15
Motorbike                                -0.50    0.02       -20.54
North East                               -0.20    0.04       -4.48
Central Highlands                        -0.24    0.06       -3.74
South East                               -0.52    0.06       -8.83
Mekong River Delta                       -0.62    0.04       -16.35
Constant                                 -0.51    0.04       -12.04
Number of obs                            33745
F(15, 2215)                              190.26
Prob > F                                 0
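To show how a probit model of this kind turns a household's answers into a poverty probability, the following sketch applies the rounded Model 2 coefficients from Table 4 to two hypothetical households. The variable names, the example households and the assumption that regions without a coefficient form the reference group are illustrative; the actual field checklists are those in the appendices.

```python
import math

# Rounded coefficients copied from Table 4 (Model 2). Regions without a
# coefficient are assumed to form the reference group.
MODEL_2 = {
    "share_children": 1.05, "minority": 0.44, "household_size": 0.10,
    "head_high_school": -0.32, "house_private_bath_kitchen": -0.49,
    "house_shared_bath_kitchen": -0.36, "flush_toilet": -0.40,
    "mobile_phone": -0.83, "electric_fan": -0.25, "television": -0.50,
    "motorbike": -0.50, "north_east": -0.20, "central_highlands": -0.24,
    "south_east": -0.52, "mekong_river_delta": -0.62,
}
CONSTANT = -0.51

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def poverty_probability(household):
    """Predicted probability of being poor; indicators not listed count as 0."""
    index = CONSTANT + sum(MODEL_2[name] * value for name, value in household.items())
    return normal_cdf(index)

# Hypothetical household A: ethnic minority, five members, 40% children,
# owns an electric fan and a television, lives in a reference-group region.
household_a = {"share_children": 0.4, "minority": 1, "household_size": 5,
               "electric_fan": 1, "television": 1}
# Hypothetical household B: four members, 25% children, flush toilet and
# several durables, living in the Mekong River Delta.
household_b = {"share_children": 0.25, "household_size": 4, "flush_toilet": 1,
               "mobile_phone": 1, "electric_fan": 1, "television": 1,
               "motorbike": 1, "mekong_river_delta": 1}

p_a = poverty_probability(household_a)
p_b = poverty_probability(household_b)
print(round(p_a, 3), round(p_b, 3))
```

A household is then classified as poor when its predicted probability exceeds a chosen cut-off. Because the coefficients here are rounded to two decimals, the probabilities are approximate.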
Figure 2 shows the ROC curve for model 2. The ROC area is 0.8116, less than the ROC area in
Model 1 (0.8403). Thus, Model 1 performs better than Model 2 in terms of ROC areas.
Figure 2: ROC area for model 2. Vertical axis: Coverage of Poor (Sensitivity); horizontal axis: Inclusion of Non-Poor (1 - Specificity). Area under ROC curve = 0.8116.
Table 5 shows the trade-off between correct coverage of the poor and exclusion of the non-poor
in rural areas at different cut-off points. The cut-off points are thresholds applied to the
predicted probability scores from the probit models in Table 3 and Table 4. If a very low value
for the cut-off (such as 0.05) is chosen, nearly all poor households (97.3%) would be correctly
identified as poor in Model 1. However, at this cut-off, only 34.6% of the non-poor would be
correctly identified as non-poor in Model 1. In contrast, if a very high value for the cut-off
such as 0.95 is chosen, all non-poor households would be correctly identified as non-poor but
only 1.11 percent of the poor households would be correctly identified. Thus, the choice of
cut-off point depends on the relative importance that policy-makers attach to the two
objectives: (a) coverage of the poor and (b) exclusion of the non-poor.
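
The trade-off between the two objectives can be made concrete with a short sketch. The Python below is illustrative only: the function name `accuracy_at_cutoff`, the rule that a household is predicted poor when its score is at or above the cut-off, and the six-household toy data are all assumptions introduced here, not taken from the study.

```python
def accuracy_at_cutoff(probs, is_poor, cutoff):
    """Return (poverty accuracy, non-poverty accuracy, total accuracy) in percent.

    Assumption: a household is predicted poor when its score is >= cutoff.
    """
    pairs = list(zip(probs, is_poor))
    poor = [p for p, actual in pairs if actual]
    non_poor = [p for p, actual in pairs if not actual]
    hit_poor = sum(p >= cutoff for p in poor)          # poor correctly flagged
    hit_non_poor = sum(p < cutoff for p in non_poor)   # non-poor correctly excluded
    poverty_acc = 100.0 * hit_poor / len(poor)
    non_poverty_acc = 100.0 * hit_non_poor / len(non_poor)
    total_acc = 100.0 * (hit_poor + hit_non_poor) / len(pairs)
    return poverty_acc, non_poverty_acc, total_acc


# Hypothetical predicted probabilities and actual poverty status.
probs = [0.90, 0.40, 0.20, 0.60, 0.10, 0.03]
is_poor = [True, True, True, False, False, False]

low = accuracy_at_cutoff(probs, is_poor, 0.05)   # covers all the poor, excludes few non-poor
high = accuracy_at_cutoff(probs, is_poor, 0.95)  # excludes all non-poor, covers no poor
```

In this toy sample the low cut-off reaches every poor household but wrongly includes most of the non-poor, while the high cut-off does the reverse, which is the same pattern as the first and last rows of Table 5.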
In Table 5, the optimal cut-off points based on total accuracy (that is, the proportion of all
households who are correctly identified as poor or non-poor) are 0.40 for Model 1 and 0.45 for
Model 2. At the cut-off point of 0.40, 52 percent of the poor and 90 percent of the non-poor are
correctly identified in Model 1 and 45 percent of the poor and 91 percent of the non-poor are
correctly identified in Model 2.

On the other hand, the optimal cut-off point based on BPAC (which gives more weight to accurate
identification of the poor) is 0.35 for both models. At this cut-off point, which is shown in bold in
Table 5, 79.2 percent and 77.7 percent of households are correctly identified in Models 1 and 2,
respectively. In addition, 59.2 percent of the poor and 86.8 percent of the non-poor are correctly
identified in Model 1. For Model 2, 53.1 percent of the poor and 87.1 percent of the non-poor are
correctly identified.

Comparing the two models, it is clear that Model 1 performs better than Model 2 in terms of both
poverty accuracy and total accuracy. Model 1 also performs better than Model 2 at almost all
cut-off points in terms of BPAC. However, Model 2 has a slightly higher BPAC than Model 1 at the
optimal cut-off point. Model 2 is also more sensitive to the choice of cut-off point. For example,
moving from the cut-off point of 0.40 to 0.45 reduces BPAC by 60.2 percent in Model 1 and by
77.7 percent in Model 2.
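
For readers who wish to check a BPAC figure, the sketch below recomputes one from the three accuracy rates in a single row of Table 5. It assumes the IRIS Center definition, BPAC = poverty accuracy minus the absolute difference between undercoverage and leakage, with all terms expressed as a percentage of the actual poor; that definition is not restated in this section and should be checked against the paper's own. The function name and the step that backs out the poverty headcount from total accuracy are illustrative choices made here.

```python
def bpac_from_rates(poverty_acc, non_poverty_acc, total_acc):
    """Recompute BPAC (in percent) from one row of an accuracy table.

    Assumed definition: BPAC = poverty accuracy - |undercoverage - leakage|,
    where undercoverage and leakage are both percentages of the actual poor.
    """
    # Total accuracy is a headcount-weighted average of the two accuracies,
    # so the share of poor households can be backed out from it.
    poor_share = (non_poverty_acc - total_acc) / (non_poverty_acc - poverty_acc)
    undercoverage = 100.0 - poverty_acc
    # Non-poor wrongly classified as poor, relative to the number of poor.
    leakage = (1.0 - poor_share) / poor_share * (100.0 - non_poverty_acc)
    return poverty_acc - abs(undercoverage - leakage)


# Table 5, Model 1, cut-off 0.35: rates of 59.15, 86.81 and 80.82 percent.
bpac_035 = bpac_from_rates(59.15, 86.81, 80.82)
```

Up to rounding in the published rates, the result is close to the 52.29 reported for this row, which suggests the assumed definition matches the one used in the table.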
Table 5: Accuracy of the Poverty Probability Method

         -------------- Model 1 --------------   -------------- Model 2 --------------
Cut-off   Poverty      Non-     Total      BPAC   Poverty      Non-     Total      BPAC
point    accuracy   poverty  accuracy            accuracy   poverty  accuracy
                   accuracy                                accuracy
0.05        97.32     34.63     48.20   -136.53     97.54     26.68     42.02   -165.31
0.10        92.88     49.72     59.06    -81.93     92.99     43.52     54.23   -104.35
0.15        87.56     61.07     66.80    -40.87     85.96     57.30     63.50    -54.51
0.20        81.30     70.12     72.54     -8.10     77.28     68.36     70.29    -14.47
0.25        73.90     77.07     76.38     17.02     69.29     76.62     75.04     15.41
0.30        66.75     82.46     79.06     36.55     59.75     83.20     78.12     39.21
0.35        59.15     86.81     80.82     52.29     53.11     87.07     79.71     53.02
0.40        52.01     90.28     81.99     39.21     44.71     91.21     81.14     21.23
0.45        44.86     92.85     82.46     15.61     40.13     93.23     81.74      4.74
0.50        38.06     95.09     82.74     -6.09     32.13     95.70     81.93    -20.18
0.55        32.17     96.56     82.61    -23.20     27.55     96.73     81.75    -33.06
0.60        27.02     97.69     82.39    -37.61     21.59     97.98     81.44    -49.51
0.65        22.06     98.43     81.89    -50.19     16.69     98.60     80.87    -61.56
0.70        17.82     98.99     81.42    -60.71     13.43     99.16     80.60    -70.12
0.75        13.61     99.39     80.82    -70.58      8.57     99.57     79.87    -81.30
0.80         9.70     99.75     80.25    -79.69      6.49     99.76     79.57    -86.17
0.85         5.94     99.91     79.56    -87.78      3.23     99.90     78.97    -93.19
0.90         3.07     99.98     78.99    -93.80      1.15     99.96     78.56    -97.54
0.95         1.11    100.00     78.59    -97.78      0.25    100.00     78.40    -99.51

2. Method 2: OLS regression
In this method, a stepwise OLS regression is run based on the list of candidate variables in Table
1. The dependent variable is the natural logarithm of per capita real household income in 2006 in
rural Vietnam. After dropping 10 variables (including living area, total land area, and source of
drinking water) whose coefficients were not statistically different from zero at the 10% level,
the results from OLS are presented in Table 6.
Table 6: OLS regression of real per capita income 2006

Variables                                  Coef.   Std. Err.   t-statistic
Household size                             -0.39        0.01        -29.03
Minority                                   -0.09        0.02         -5.42
Share of working members                    0.17        0.02          7.92
Share of children                          -0.20        0.03         -6.91
Share of women                             -0.12        0.02         -6.09
Non-farm self employment                    0.07        0.01         13.50
Wage job                                    0.04        0.00          9.80
Head finishing primary school               0.06        0.01          5.96
Head finishing secondary school             0.08        0.01          7.30
Head finishing high school and above        0.14        0.01          9.79
Head’s age (logarithm)                      0.06        0.02          3.46
House with private bathroom/kitchen         0.14        0.03          5.04
House with shared bathroom or kitchen       0.07        0.01          6.52
Flush toilet                                0.10        0.01          7.56
Double-vault toilet                         0.04        0.01          3.42
Gas cooker                                  0.16        0.01         13.46
Wardrobe                                    0.11        0.01         10.74
Fixed phone                                 0.11        0.01          8.51
Television                                  0.10        0.01          8.52
Motorbike                                   0.14        0.01         16.22
Video cassette                              0.08        0.01          9.33
Rice cooker                                 0.07        0.01          8.15
Electric fan                                0.04        0.01          3.68
Mobile phone                                0.21        0.01         13.98
Laundry                                     0.17        0.03          4.83
Refrigerator/freezer                        0.17        0.02         11.08
Pump                                        0.03        0.01          3.64
Cattle                                     -0.05        0.01         -5.33
North East                                  0.11        0.02          6.64
Central Highlands                           0.17        0.03          6.80
South East                                  0.13        0.02          6.55
Mekong River Delta                          0.28        0.02         17.52
Constant                                    8.15        0.08        101.67
Number of obs                              24815
F(32, 2186)                                295.9
Prob > F                                       0
R-squared                                   0.46

From the OLS regression, it is possible to predict household per capita income. Then, by
comparing predicted per capita income with the poverty line, each household’s poverty status can
be predicted. Table 7 shows the cross-tabulation of predicted and actual poverty status using
OLS regression and an absolute poverty line of $1.25/day: 36.7 percent of the poor and 95.7
percent of the non-poor are correctly identified.
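
The two steps, predicting log income and then comparing it with the poverty line, can be sketched as follows. This is a minimal illustration using NumPy's least-squares solver, not the authors' estimation code: the six households, the two regressors, the coefficients and the log poverty line are all made-up values, and the data are noise-free so that the fit is exact.

```python
import numpy as np

# Hypothetical data: columns are intercept, household size, motorbike ownership.
X = np.array([
    [1.0, 2.0, 0.0],
    [1.0, 4.0, 1.0],
    [1.0, 6.0, 0.0],
    [1.0, 8.0, 1.0],
    [1.0, 3.0, 1.0],
    [1.0, 7.0, 0.0],
])
true_beta = np.array([8.5, -0.1, 0.3])   # made-up coefficients
log_income = X @ true_beta               # noise-free, for illustration only

# Fit by ordinary least squares and predict log per capita income.
beta_hat, *_ = np.linalg.lstsq(X, log_income, rcond=None)
predicted_log_income = X @ beta_hat

# A household is predicted poor if predicted income falls below the line.
log_poverty_line = 8.05                  # hypothetical ln(z), same units as income
predicted_poor = predicted_log_income < log_poverty_line
actual_poor = log_income < log_poverty_line
```

With real survey data the residuals are not zero, so `predicted_poor` and `actual_poor` disagree for some households; cross-tabulating the two arrays gives a table of the same form as Table 7.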
Table 7: Predicted and actual poverty using absolute poverty line (OLS regression)

                     Predicted non-poor   Predicted poor
Actual non-poor                   95.71             4.29
Actual poor                       63.32            36.68

Poverty accuracy     36.68
Total accuracy       83.49
BPAC                 48.82

The BPAC for Method 2 is equal to 48.82, lower than the corresponding figure for Method 1.
For further comparison between Method 1 and Method 2, we estimate the probability of
households being poor from the OLS regression. The probability of a household being poor is
given as
$$P_i^* = \Phi\left(\frac{\ln z - X_i'\beta}{\sigma}\right)$$
where z is the poverty line ($1.25), Φ is the cumulative standard normal
distribution and σ is the standard error of the residuals (Hentschel et al., 2000). Table 8 presents
the accuracy in identifying poverty based on the poverty line of $1.25 and the estimated poverty
probability. BPAC is maximized at the cut-off point of 0.35 (again shown in bold). At that point,
58 percent of the poor and 87.6 percent of the non-poor are correctly identified.
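
As a rough illustration of the probability formula, the sketch below evaluates Φ((ln z − X'β)/σ) with the standard library's error function. The function name and the numerical values of the log poverty line, the predicted log income and σ are hypothetical and chosen only to show the mechanics.

```python
import math

def poverty_probability(xb, log_z, sigma):
    """P* = Phi((ln z - x'b) / sigma), with Phi the standard normal CDF."""
    t = (log_z - xb) / sigma
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

log_z = 8.05    # hypothetical log poverty line
sigma = 0.5     # hypothetical residual standard error

p_at_line = poverty_probability(8.05, log_z, sigma)   # predicted income on the line
p_richer = poverty_probability(8.55, log_z, sigma)    # one sigma above the line
```

A household whose predicted income sits exactly on the poverty line receives a probability of one half, and the probability falls as predicted income rises, so the cut-offs in Table 8 can be applied to these values in the same way as to the probit scores.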
Generally, the OLS method is quite good at identifying poverty. Another advantage of the OLS
method over the probit models is that it can predict the incomes of particular households, thus
enabling the calculation of income-based poverty statistics such as the poverty gap and poverty
severity. However, the standard errors associated with such poverty measures at the household
level are typically very large.
Table 8: Poverty identification accuracy - OLS method

Cut-off     Poverty Non-poverty       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          97.43       30.82       44.61     -165.07
0.10          93.83       47.07       56.75     -102.81
0.15          88.01       58.95       64.97      -57.28
0.20          81.04       69.41       71.82      -17.20
0.25          74.46       77.27       76.69       12.91
0.30          65.97       82.98       79.46       34.78
0.35          57.95       87.64       81.49       52.63
0.40          50.00       91.19       82.66       33.75
0.45          43.38       93.76       83.34       10.66
0.50          36.68       95.71       83.50      -10.21
0.55          30.16       97.28       83.39      -29.25
0.60          24.09       98.33       82.96      -45.42
0.65          18.11       99.02       82.28      -60.04
0.70          13.21       99.47       81.62      -71.54
0.75           8.52       99.82       80.92      -82.26
0.80           5.38       99.89       80.33      -88.83
0.85           2.64       99.99       79.84      -94.66
0.90           0.79      100.00       79.47      -98.41
0.95           0.10      100.00       79.32      -99.81

3. Method 3: Principal Component Analysis
The third method we use is principal component analysis (PCA). Principal component analysis is
a technique for reducing the information contained in a large set of variables to a smaller number.
The first principal component is the linear index of the underlying variables that captures the
most variation among them (Filmer and Pritchett, 2001). The method has been applied
extensively in the education and health literature in other countries (Filmer and Pritchett, 2001;
Sahn and Stiffel, 2003; Rutstein and Rubin, 2004) and in several unpublished papers which
estimate an “asset index” for Vietnamese households (Gwatkin et al., 2007; Chowdhuri and
Baulch, 2010).

For the sake of simplicity, we use the same set of variables as in Method 1 (Model 1) for our
PCA. Table 9 shows the factor scores associated with these variables. Generally, a variable with
a positive factor score is associated with higher socio-economic status, while a variable with a
negative factor score is associated with lower socio-economic status. Using the factor scores
from the first principal component as the weights, we then construct an asset index for each
household which has a mean equal to zero and a standard deviation equal to one. Table 10 shows
the accuracy from this method, using percentiles of the asset index as cut-off points.

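
The construction of an asset index from the first principal component can be sketched as below. This is one common way of doing it with NumPy (eigen-decomposition of the correlation matrix of the standardised variables) and is an assumption about the mechanics, not a record of the software or options the authors used; the three ownership dummies and six households are invented for the example.

```python
import numpy as np

def asset_index(assets):
    """First-principal-component index, rescaled to mean 0 and s.d. 1."""
    assets = np.asarray(assets, dtype=float)
    z = (assets - assets.mean(axis=0)) / assets.std(axis=0)   # standardise each variable
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)                   # ascending eigenvalues
    weights = eigvecs[:, -1]                                  # first principal component
    if weights.sum() < 0:                                     # sign is arbitrary; orient upwards
        weights = -weights
    raw = z @ weights
    index = (raw - raw.mean()) / raw.std()
    share = eigvals[-1] / eigvals.sum()                       # variance explained
    return index, weights, share

# Hypothetical ownership dummies: television, motorbike, mobile phone.
assets = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
index, weights, share = asset_index(assets)
```

Households are then ranked by `index`, and those below a chosen percentile are classified as poor. Note that the index is built without reference to income, which is one reason it may track income poverty less closely than the regression-based methods.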
Table 9: Factor scores in principal component analysis (component 1)

Variable                                       Score
Minority                                      -0.194
Household size                                 0.032
Share of women                                -0.054
Share of working members                       0.155
Share of children                             -0.074
Head finishing primary school                 -0.052
Head finishing secondary school                0.093
Head finishing high school                     0.171
Wage job                                       0.019
Non-farm self-employment                       0.188
Semi-permanent houses                         -0.025
House with shared bathroom or kitchen          0.126
House with private bathroom/kitchen            0.202
Double-vault toilet                           -0.070
Flush toilet                                   0.333
Radio                                          0.017
Electricity                                    0.175
Mobile phone                                   0.267
Refrigerator/freezer                           0.317
Pump                                           0.239
Fixed phone                                    0.346
Electric fan                                   0.251
Television                                     0.283
Video cassette                                 0.290
Motorbike                                      0.272
Eigenvalue of the 1st component                 3.48
% of variation explained by the 1st component   13.9

Table 10 shows that the PCA method performs less well than both the probit and the OLS
methods. The optimal cut-off point is 0.25, at which BPAC is 39.1 and total accuracy is 77 percent.
One reason for the poor performance of PCA is that asset indices calculated by conventional
PCA incorrectly treat categorical variables as if they were continuous variables (Kolenikov and
Angeles, 2008). Conventional PCA also does not take account of the number of each asset
which a household possesses or the ordered nature of some (e.g., housing) variables. An
alternative, more satisfactory method of estimating asset indices is polychoric PCA (Kolenikov
and Angeles, 2008), although this method is not yet widely used in practice.
Table 10: Accuracy of the PCA method

Cut-off       Asset     Poverty Non-poverty       Total        BPAC
points        index    accuracy    accuracy    accuracy
0.05          -2.55       14.39       97.59       79.58      -62.52
0.10          -2.02       26.58       94.58       79.86      -27.23
0.15          -1.66       37.11       91.11       79.41        6.39
0.20          -1.36       46.28       87.26       78.39       38.65
0.25          -1.11       54.56       83.16       76.96       39.06
0.30          -0.89       61.89       78.81       75.15       23.34
0.35          -0.69       68.10       74.14       72.83        6.45
0.40          -0.49       73.89       69.36       70.34      -10.85
0.45          -0.29       78.51       64.26       67.34      -29.32
0.50          -0.10       82.63       59.02       64.13      -48.29
0.55           0.11       86.40       53.68       60.76      -67.61
0.60           0.33       89.36       48.11       57.04      -87.75
0.65           0.58       92.17       42.51       53.26     -108.03
0.70           0.84       94.63       36.81       49.33     -128.65
0.75           1.17       96.31       30.88       45.05     -150.08
0.80           1.59       97.70       24.89       40.66     -171.76
0.85           2.12       98.77       18.80       36.12     -193.79
0.90           2.83       99.42       12.60       31.40     -216.24
0.95           3.83       99.88        6.35       26.60     -238.87

4. Method 4: Quantile regression
The fourth method we consider is quantile regression. This method is recommended by the IRIS
Center (2008) as the most suitable method in Vietnam, using a poverty cut-off corresponding to
the 50th percentile of the expenditure distribution. For comparability, we use the same set of
variables in the quantile regressions as in Model 1 of the poverty probability model and the
PCA. However, unlike the IRIS Center, we ran the regression with the quantile approximating
the $1.25/day poverty line (0.22).⁷ Table 11 reports results from the quantile regression at the
22nd percentile while Table 12 shows the accuracy of the method.
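
What distinguishes quantile regression from OLS is the loss function being minimised. The sketch below shows the asymmetric "check" loss for the quantile 0.22 and, for the simplest possible case of an intercept-only model, finds the fitted value that minimises it. The ten income values are invented and the helper names are hypothetical; a full quantile regression with covariates would minimise the same loss over a vector of coefficients.

```python
def check_loss(residual, tau):
    """Asymmetric absolute loss minimised by quantile regression."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def total_loss(values, fitted, tau):
    return sum(check_loss(v - fitted, tau) for v in values)

incomes = list(range(1, 11))   # hypothetical incomes 1, 2, ..., 10
tau = 0.22                     # the quantile used in the text

# With an intercept only, the best fit is the value minimising the loss.
best = min(incomes, key=lambda c: total_loss(incomes, c, tau))
```

Because over-predictions are penalised more heavily than under-predictions when the quantile is below 0.5, the fitted value is pulled towards the lower part of the income distribution, which is where the $1.25/day line lies.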
⁷ We thank an anonymous reviewer for this suggestion. Note that we have also run this regression with the quantile
corresponding to the median, and the results are similar to those with the 22nd percentile.

Table 11: Quantile regression

Variables                                  Coef.   Std. Err.   t-statistic
Household size                             -0.08        0.00        -30.87
Share of children                          -0.27        0.02        -12.88
Share of women                             -0.06        0.02         -2.77
Share of working people                     0.14        0.02          7.93
Non-farm self-employment                    0.09        0.00         18.25
Wage jobs                                   0.08        0.00         21.22
Minority                                   -0.12        0.01         -9.53
Head finishing primary school               0.06        0.01          6.71
Head finishing secondary school             0.09        0.01          8.86
Head finishing high school and above        0.18        0.01         13.06
House with private bathroom/kitchen         0.27        0.02         11.52
House with shared bathroom or kitchen       0.20        0.01         14.01
Semi-permanent house                        0.14        0.01         13.52
Electricity                                -0.09        0.02         -5.33
Radio                                       0.06        0.01          5.17
Flush toilet                                0.13        0.01         11.06
Double-vault toilet                         0.05        0.01          5.12
Mobile telephone                            0.23        0.01         15.45
Refrigerator/freezer                        0.17        0.01         12.66
Pump                                        0.06        0.01          6.76
Fixed phone                                 0.14        0.01         12.58
Electric fan                                0.08        0.01          7.94
Television                                  0.14        0.01         13.36
Video cassette                              0.09        0.01         10.80
Motorbike                                   0.16        0.01         19.39
North East                                  0.11        0.01          9.58
Central Highlands                           0.15        0.02          8.61
South East                                  0.19        0.01         13.79
Mekong River Delta                          0.29        0.01         26.99
Constant                                    7.74        0.03        286.26

Table 12 evaluates the accuracy of the quantile regression method. With a cut-off point of 0.25,
the quantile regression method identifies 62 percent of the poor and 85 percent of the non-poor
correctly, resulting in a total accuracy of 80 percent. The BPAC for the quantile regression
method is 46.5, which is substantially lower than those for the poverty probability and OLS
models.
Table 12: Accuracy of the quantile regression method

Cut-off     Poverty Non-poverty       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          18.83       98.82       81.50      -58.08
0.10          32.85       96.31       82.57      -20.96
0.15          44.01       93.01       82.40       13.30
0.20          53.74       89.32       81.62       46.10
0.25          61.94       85.21       80.17       46.47
0.30          69.09       80.80       78.26       30.53
0.35          75.13       76.09       75.88       13.48
0.40          80.20       71.10       73.07       -4.57
0.45          84.09       65.80       69.76      -23.74
0.50          87.89       60.47       66.41      -43.04
0.55          90.73       54.87       62.64      -63.28
0.60          93.17       49.17       58.69      -83.93
0.65          95.34       43.38       54.64     -104.85
0.70          96.64       37.36       50.20     -126.64
0.75          98.02       31.36       45.79     -148.35
0.80          98.92       25.23       41.18     -170.55
0.85          99.47       19.00       36.42     -193.08
0.90          99.72       12.69       31.53     -215.93
0.95          99.96        6.37       26.64     -238.77

To conclude this section, we present a tabular and graphical comparison of the four poverty
proxy approaches. Table 13 compares these four approaches at their optimal cut-off points. The
quantile regression approach has the highest poverty accuracy, while OLS has the highest
non-poverty accuracy. However, judged in terms of total accuracy, the OLS approach has the best
result, followed by probit Model 1. If BPAC, which is our preferred measure, is used,
Probit Model 1, Probit Model 2 and OLS produce similar results, while those for the PCA and
quantile regression approaches are substantially lower. The PCA approach has both the lowest
total accuracy and the lowest BPAC.
Table 13: Comparing the accuracy of the four approaches

Approach                         Cut-off     Poverty Non-poverty       Total        BPAC
                                 points     accuracy    accuracy    accuracy
Probit: Model 1 (enlarged)          0.35       59.15       86.81       80.82       52.29
Probit: Model 2 (parsimonious)      0.35       53.11       87.07       79.71       53.02
OLS                                 0.35       57.95       87.64       81.49       52.63
PCA                                 0.25       54.56       83.16       76.96       39.06
Quantile regression                 0.25       61.94       85.21       80.17       46.47

Figure 3 summarizes the areas under the ROC curves for the four approaches, using 20 cut-off
points for each model described above. Probit Model 1, the OLS regression and the quantile
regression have very similar ROC areas, and their ROC curves are visually (and statistically)
indistinguishable from one another. This confirms these three models’ performance using the
BPAC. In contrast, probit Model 2 and the PCA method have lower ROC curves and areas, with PCA
having the lowest area under the ROC curve. This confirms the PCA method’s poor performance
according to the BPAC.
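
An area under the ROC curve can be approximated directly from a table of cut-off points. The sketch below applies the trapezoidal rule to the Model 1 columns of Table 5; it is an approximation written for illustration (the function name is hypothetical, and the authors' own calculation may use a different set of cut-offs or a different estimator), so it should not be expected to reproduce the reported areas exactly.

```python
def roc_area(sensitivity, specificity):
    """Trapezoidal area under the ROC curve from rates given in percent."""
    # Each cut-off gives one point (1 - specificity, sensitivity).
    points = [(1.0 - sp / 100.0, se / 100.0)
              for se, sp in zip(sensitivity, specificity)]
    points += [(0.0, 0.0), (1.0, 1.0)]      # end points of every ROC curve
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Table 5, Model 1: poverty and non-poverty accuracy at cut-offs 0.05 to 0.95.
poverty_acc = [97.32, 92.88, 87.56, 81.30, 73.90, 66.75, 59.15, 52.01, 44.86, 38.06,
               32.17, 27.02, 22.06, 17.82, 13.61, 9.70, 5.94, 3.07, 1.11]
non_poverty_acc = [34.63, 49.72, 61.07, 70.12, 77.07, 82.46, 86.81, 90.28, 92.85, 95.09,
                   96.56, 97.69, 98.43, 98.99, 99.39, 99.75, 99.91, 99.98, 100.00]

area_model_1 = roc_area(poverty_acc, non_poverty_acc)
```

Here poverty accuracy plays the role of sensitivity and non-poverty accuracy the role of specificity, matching the axis labels used in the ROC figures.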
Finally, we report the poverty headcount ratios, as calculated by the four approaches at their
optimal cut-off points. Poverty rates are defined as the percentage of households in the total
population who are classified as poor at the optimal cut-off points. The standard errors of the
poverty rates are calculated based on bootstrapping with 200 replications. The results are
presented in Table 14. Table 14 shows that Model 1 slightly overestimates the true poverty rate
while the other models underestimate it. The 95% confidence intervals show that the Probit
Model 1 and OLS estimates of the poverty headcount ratio are not statistically different from the
“true” poverty headcount ratio estimated directly from the VHLSS06.
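
The bootstrap calculation can be sketched as follows. This is a simple household-level resampling with replacement, written with the standard library only; it ignores the survey's sampling weights and clustering, which the authors' procedure may well take into account, and the sample of 1,000 households is hypothetical.

```python
import random
import statistics

def bootstrap_se(flags, replications=200, seed=0):
    """Bootstrap standard error of a headcount ratio (in percent)."""
    rng = random.Random(seed)
    n = len(flags)
    rates = []
    for _ in range(replications):
        resample = [flags[rng.randrange(n)] for _ in range(n)]   # draw with replacement
        rates.append(100.0 * sum(resample) / n)
    return statistics.stdev(rates)

# Hypothetical sample: 1,000 households, 220 of them predicted poor.
flags = [1] * 220 + [0] * 780
headcount = 100.0 * sum(flags) / len(flags)
se = bootstrap_se(flags)
```

The standard deviation of the 200 resampled headcount ratios serves as the standard error, from which a confidence interval for the predicted poverty rate can be formed.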
Table 14: Poverty headcount ratios and standard errors of the four approaches

Approach                           Poverty     Bootstrapped     95% confidence
                                   headcount   standard         interval
                                   ratio       errors
Probit: Model 1                    23.14       0.50             22.28    24.00
Probit: Model 2                    21.63       0.41             20.85    22.31
OLS                                21.80       0.50             20.88    22.72
PCA                                20.00       0.27             22.14    23.10
Quantile regression                20.00       0.28             19.45    20.55
"True" poverty headcount ratio     22.36

From this analysis, we choose the probit method with Model 1 as our preferred model, as it
performs well in terms of Total Accuracy, the BPAC, the area under the ROC curve and in
predicting the poverty headcount. In the next section, we will validate this model by testing its
robustness to different poverty lines and an alternative household dataset.

Figure 3: Areas under the ROC curve for the four approaches (horizontal axis: Inclusion of Non-Poor (1-Specificity); vertical axis: Coverage of Poor (Sensitivity); areas: Probit Model 1: 0.8353; OLS: 0.8355; Quantile Regression: 0.8346; Probit Model 2: 0.8047; PCA: 0.7781)

5. Validating the Poverty Probability Method
To validate the use of the poverty probability method, we conduct three exercises: using two
different poverty lines with the same dataset (VHLSS06), and using an alternative household
dataset (the VHLSS04) to test its robustness. As Chen and Schreiner (2009) and others have
pointed out, it is important to know about the out-of-sample predictive power of an approach
since an approach which identifies the poor very accurately with one dataset may perform poorly
when applied to different data.
5.1. Validating using a moderate poverty line
We tested our preferred model (Model 1, probit) with the higher international income poverty
line of $2 per capita per day, which is used to identify the moderately poor (Chen and Ravallion,
2008). The results in Table 15 show that the model is rather good at predicting both extreme
and moderate poverty. At the cut-off point of 0.50, the model correctly identifies 75.6 percent of
the poor and 73.1 percent of the non-poor. Overall, the poverty status of 74.4 percent of all
households is correctly identified, while the BPAC is relatively high at 72.4.
Table 15: Accuracy of Poverty Probability Method with $2/day Poverty Line

Cut-off     Poverty Non-poverty       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          99.56       12.31       55.36        9.95
0.10          98.66       20.38       59.00       18.25
0.15          97.54       27.58       62.10       25.63
0.20          95.98       34.41       64.78       32.65
0.25          94.04       41.65       67.50       40.08
0.30          91.68       48.15       69.62       46.75
0.35          88.69       54.97       71.61       53.76
0.40          85.17       61.07       72.96       60.02
0.45          80.93       67.35       74.05       66.47
0.50          75.60       73.14       74.35       72.42
0.55          69.58       78.48       74.09       61.26
0.60          62.91       83.38       73.28       42.89
0.65          55.51       87.88       71.91       23.46
0.70          47.58       91.53       69.85        3.85
0.75          39.24       94.64       67.30      -16.01
0.80          31.26       96.79       64.46      -34.18
0.85          22.57       98.39       60.98      -53.20
0.90          14.81       99.28       57.61      -69.64
0.95           7.24       99.86       54.17      -85.38

5.2. Validating using a consumption-based poverty line
The next step is to use a different definition of poverty based on consumption expenditure. We
use the ‘official’ poverty line of the General Statistics Office, which is the per capita expenditure
needed to obtain 2,100 Kcal per person per day plus a modest allowance for non-food
expenditures. Table 16 shows the results. At the optimal cut-off point of 0.40, the model can
correctly specify the expenditure-based poverty status of 86.5 percent of all households,
including 65.2 percent of the poor and 91.7 percent of the non-poor. Comparing Table 16
(poverty based on consumption) with Table 5 (poverty based on income), it appears that
household assets and socio-economic status are more closely related to consumption than to
income.
Table 16: Accuracy of Poverty Probability Method using expenditure-based poverty line

Cut-off     Poverty Non-poverty       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          97.60       55.71       63.96      -80.74
0.10          94.55       66.39       71.93      -37.16
0.15          89.88       73.93       77.07       -6.40
0.20          84.78       79.51       80.54       16.38
0.25          79.92       83.65       82.92       33.28
0.30          74.05       86.49       84.04       44.86
0.35          69.39       89.31       85.39       56.36
0.40          65.19       91.72       86.50       64.17
0.45          59.48       93.53       86.82       45.38
0.50          54.46       95.31       87.27       28.06
0.55          49.50       96.49       87.24       13.30
0.60          43.90       97.46       86.92       -1.83
0.65          38.69       98.26       86.53      -15.51
0.70          32.77       98.75       85.76      -29.35
0.75          28.13       99.33       85.32      -41.02
0.80          24.18       99.59       84.75      -49.97
0.85          18.55       99.74       83.76      -61.83
0.90          12.73       99.83       82.69      -73.83
0.95           7.92       99.92       81.81      -83.85

5.3. Validating using the VHLSS 2004
In the final step of validation, we test the poverty probability model using data for rural areas
from the Vietnam Household Living Standards Survey (VHLSS) of 2004, a comparable
nationally representative household survey. The VHLSS 2004 sample includes 46,000
households (of which expenditure data were collected for 9,300 households). We
used the coefficients obtained from estimating the Probit Model 1 using the VHLSS 2006 and
“exported” these to the VHLSS 2004, where the same set of variables was available.
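
Exporting coefficients amounts to holding the estimated coefficients fixed and evaluating Φ(constant + X'β) on households from another survey. Since the Model 1 coefficients are not reproduced in this section, the sketch below uses the Model 2 coefficients from Table 4 purely for illustration; the variable keys, the two example households and the decision to leave out the regional dummies are assumptions made here.

```python
import math

# Model 2 coefficients from Table 4 (regional dummies omitted for brevity,
# which treats the household as living in an omitted base region).
COEFS = {
    "share_children": 1.05, "minority": 0.44, "household_size": 0.10,
    "head_high_school": -0.32, "private_bath_kitchen": -0.49,
    "shared_bath_kitchen": -0.36, "flush_toilet": -0.40, "mobile_phone": -0.83,
    "electric_fan": -0.25, "television": -0.50, "motorbike": -0.50,
}
CONSTANT = -0.51

def poverty_score(household):
    """Predicted probability of being poor: Phi(constant + x'b)."""
    xb = CONSTANT + sum(COEFS[k] * v for k, v in household.items())
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

# Two hypothetical households described with the checklist variables.
large_minority = {"share_children": 0.5, "minority": 1, "household_size": 5}
small_with_assets = {"household_size": 4, "television": 1, "motorbike": 1}

p_high = poverty_score(large_minority)
p_low = poverty_score(small_with_assets)
```

Each score is then compared with the chosen cut-off: at a cut-off of 0.35, for example, the first household would be classified as poor and the second as non-poor.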
The results from our validation exercise are presented in Table 18. At the cut-off point of 0.25,
79.2 percent of all households are correctly specified according to their income poverty status (at
$1.25 per head), including 52.8 percent of the poor and 86.9 percent of the non-poor. The BPAC
is 50.4. We also test the model with the moderate international poverty line of $2 per capita in
Table 19. The results show that the model performs well. At the cut-off point of 0.4, 70.9 percent
of all households are correctly classified, including 75.5 percent of the poor and 65.8 percent of
the non-poor. The BPAC is high at 69.3.

Table 18: Accuracy of Poverty Probability Method using VHLSS 2004 and $1.25/day line

Cut-off     Poverty    Non-poor       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          91.32       43.31       54.17      -93.87
0.10          81.48       61.41       65.95      -31.97
0.15          71.86       72.88       72.65        7.27
0.20          61.71       81.55       77.06       36.92
0.25          52.79       86.89       79.18       50.40
0.30          43.86       90.91       80.27       18.80
0.35          37.25       93.90       81.08       -4.66
0.40          30.38       95.55       80.81      -24.02
0.45          23.86       97.01       80.46      -42.07
0.50          18.24       98.08       80.01      -56.94
0.55          14.41       98.78       79.69      -67.00
0.60          10.70       99.38       79.31      -76.47
0.65           7.32       99.75       78.84      -84.51
0.70           5.05       99.86       78.41      -89.43
0.75           2.72       99.91       77.92      -94.24
0.80           1.26       99.92       77.60      -97.21
0.85           0.60      100.00       77.51      -98.79
0.90           0.42      100.00       77.47      -99.16
0.95              .           .           .           .

Table 19: Accuracy of Poverty Probability Method using VHLSS 2004 and $2/day line

Cut-off     Poverty    Non-poor       Total        BPAC
points     accuracy    accuracy    accuracy
0.05          99.62        7.38       56.00       16.89
0.10          98.40       16.83       59.82       25.37
0.15          96.36       25.99       63.08       33.60
0.20          93.67       34.66       65.76       41.37
0.25          90.31       43.10       67.98       48.94
0.30          86.32       51.80       69.99       56.75
0.35          81.10       59.41       70.85       63.58
0.40          75.50       65.75       70.89       69.27
0.45          69.94       73.13       71.45       64.00
0.50          62.92       78.45       70.26       45.17
0.55          55.20       83.33       68.50       25.35
0.60          47.27       88.02       66.54        5.28
0.65          40.01       91.65       64.43      -12.50
0.70          32.55       94.46       61.83      -29.93
0.75          24.70       96.24       58.53      -47.23
0.80          18.00       97.61       55.65      -61.86
0.85          11.71       98.88       52.94      -75.59
0.90           6.61       99.73       50.65      -86.53
0.95           2.45      100.00       48.58      -95.10

VI. Conclusions
Recognising the difficulties involved in collecting comprehensive household expenditure and
income data for sub-populations of interest, this paper has explored four ‘short-cut’ methods for
predicting a household’s monetary poverty status using data from rural Vietnam. These are the
poverty probability method (probit model), OLS and quantile regressions and asset indices
constructed using principal components analysis. As shown in Table 13 and Figure 3 above, the
poverty probability method is found to be the most accurate method for predicting poverty using
a nationally representative survey for 2006. The poverty probability method allows around
four-fifths of poor and non-poor to be accurately identified using these data.

We then verified our preferred method using different poverty lines and data from a previous
national survey (in 2004). The poverty probability model performs robustly across alternative
poverty lines and data sets, accurately identifying between 74% and 87% of the poor and
non-poor.

Furthermore, our empirical results show that the variables with the strongest correlation to
poverty are household size and household composition, ethnic minority status, education of the
household head, housing type and ownership of radio, mobile telephone, refrigerator, television
and motorbike. A checklist for collecting these variables from households is provided in
Appendix A2, while a set of Excel spreadsheets for implementing the poverty probability
method’s calculations is available by writing to the corresponding author. While further testing
of this method is definitely required, initial field testing in Hoa Binh and Ha Giang provinces
indicates that it is possible to collect the checklist information in a 10 to 15 minute interview
with each household. Further research is, however, needed to establish the recommended minimum
sample size and sampling protocols to use when applying the method. Initial simulations produced
by bootstrapping the VHLSS06 indicate that sample sizes of around 200 households are needed to
measure the poverty headcount with a 10 percent margin of error (see Appendix A.4).

Several caveats regarding the use of the poverty probability method should be noted. First, the
method’s focus on identifying monetary poverty in rural areas deserves reiterating. While it
would be challenging to extend this method to non-monetary poverty measures, it would be
relatively simple to extend it to urban areas or, indeed, other countries—though some additional
variables (e.g., ownership or air conditioners or motor cars in urban Vietnam) would be needed
and different coefficients would need to be estimated. Second, while the method has high total
accuracy, it is only able to correctly identify 78 to 81 percent of the poor and non-poor. If it used
as to identify whether individual households are poor or non-poor, errors of targeting (both
under-coverage of the poor and inclusion of the non-poor) are bound to occur. When used on
larger samples, the full model tends to slightly overestimate the true poverty rate, while the more
parsimonious model tend to underestimate it. . Third, the poverty probability method is unlikely
to be a good way of detecting changes in poverty over periods of a few years. Careful attention
ly
On
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Page 29 of 37
28
URL: http:/mc.manuscriptcentral.com/cods Email:
[email protected]
Page 30 of 37
should be given to the standard errors of the poverty rates produced, which, as mentioned above,
are quite large. It would also be useful to investigate how the estimated coefficients of the
underlying model change over time, which is possible in Vietnam due to the biennial frequency
of its national household surveys. Finally, further field testing of the poverty proxy checklist and
the Excel worksheets which accompany it is needed before the method can be firmly recommended
for ex ante and ex post poverty impact work.
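
To make the targeting terms used in the caveats above concrete, the following minimal sketch computes total accuracy, under-coverage and inclusion rates from actual and predicted poverty status. The function name, the choice of denominators (under-coverage as the share of truly poor households that are missed, inclusion as the share of truly non-poor households classified as poor) and the ten-household toy data are illustrative assumptions, not definitions or results taken from this paper.

```python
from typing import Dict, Sequence


def targeting_errors(actual_poor: Sequence[bool],
                     predicted_poor: Sequence[bool]) -> Dict[str, float]:
    """Summarise household-level classification errors.

    Assumed definitions (not taken from the paper):
      total_accuracy = correctly classified households / all households
      under_coverage = poor households predicted non-poor / all poor households
      inclusion      = non-poor households predicted poor / all non-poor households
    """
    if len(actual_poor) != len(predicted_poor) or len(actual_poor) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(actual_poor, predicted_poor))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    true_neg = sum(1 for a, p in pairs if not a and not p)

    n_poor = true_pos + false_neg
    n_non_poor = false_pos + true_neg
    return {
        "total_accuracy": (true_pos + true_neg) / len(pairs),
        "under_coverage": false_neg / n_poor if n_poor else 0.0,
        "inclusion": false_pos / n_non_poor if n_non_poor else 0.0,
    }


# Toy data (hypothetical): 4 poor and 6 non-poor households.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
print(targeting_errors(actual, predicted))
```

Even with a high total accuracy, both error rates are non-zero in this toy example, which is the point made above about using the method to classify individual households.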
References
Alkire, S. and M.E. Santos (2010) Acute multidimensional poverty: a new index for developing
countries, Human Development Research Paper 2010/11, New York: United Nations
Development Program.
Baulch, B. (2002) Poverty monitoring and targeting using ROC curves: Examples from Vietnam,
IDS Working Paper No. 161, http://www.ids.ac.uk/ids/bookshop/wp/wp161.pdf.
Chen, S. and M. Ravallion (2008) The developing world is poorer than we thought, but no less
successful in the fight against poverty, Policy Research Working Paper Series 4703, World
Bank, Washington, DC.
Chen, S. and M. Schreiner (2009) A simple poverty scorecard for Vietnam, Progress Out of
Poverty, Grameen Foundation. http://www.microfinance.com/#Vietnam.
Chowdhuri, R. and Baulch, B. (2010) Should PI use an asset based approach for its poverty
analysis?, Mimeo, Prosperity Initiative, Hanoi.
Filmer, D. and L. Pritchett (2001) Estimating wealth effects without income or expenditure
data -- or tears: an application to educational enrollments in states of India, Demography
38(1), pp. 115-132.
Gwatkin, D., S. Rutstein, K. Johnson, E. Suliman, A. Wagstaff and A. Amouzou (2007)
Socioeconomic differences in health, nutrition, and population: Vietnam, Country Reports on
HNP and Poverty, Washington, DC: World Bank,
http://siteresources.worldbank.org/INTPAH/Resources/400378-178119743396/vietnam.pdf.
IRIS Center (2007) Client assessment survey—Vietnam, online at
http://www.povertytools.org/USAID_documents/Tools/Current_Tools/USAID_PAT_VIET_72007.xls.
IRIS Center (2008) Accuracy results for 20 poverty assessment tool countries, online at
http://www.povertytools.org/other_documents/PAT_20_country_accuracy_results_Dec2008.pdf.
Kolenikov, S. and G. Angeles (2008) Socioeconomic status measurement with discrete proxy
variables: is principal components analysis a reliable answer?, Review of Income and Wealth,
55(1), pp. 128-165.
Nguyen, Linh (2007) Identifying poverty predictors using household living standards surveys in
Viet Nam, in G. Sugiyarto (ed.) Poverty Impact Analysis Selected Tools and Applications, Asian
Development Bank, Manila, Philippines.
Ravallion, M., S. Chen and P. Sangraula (2008) Dollar a day revisited, Policy Research Working
Paper Series 4620, World Bank, Washington, DC.
Rutstein, S. and Johnson, K. (2004) The DHS Wealth Index, DHS Comparative Reports 6,
Calverton: ORC Macro.
Sahn, D. and D. Stifel. (2003) Exploring alternative measures of welfare in the absence of
expenditure data, Review of Income and Wealth, 49(4), pp. 463–489.
Wodon, Q. (1997) Targeting the poor using ROC curves, World Development, 25(12), pp. 2083-2092.
Appendices
A1. Comparison of poverty/asset indicators used by different studies in Vietnam
Studies compared (table columns): IRIS; Sahn & Stifel; Baulch; Gwatkin et al.; Chen & Schreiner; This paper; Linh N.

[In the original table, a check mark (√) shows that a study uses the indicator; the column positions of the check marks are not recoverable here.]

Indicators (table rows):

Household characteristics
  Composition
    Household size
    Number of children
    Number of women
    % of dependents
    % of working age members
    % of working in agriculture
  Head
    Head’s age
    Head’s marital status
    Head ethnicity
  Education
    Head's education
    Spouse’s education
    Number of adults with no education
  Occupation
    Agriculture activities
    Wage activities
    Non-farm activities
    Crop activities
    Agricultural services
Accommodation and land
    Type of house
    Type of roof
    Type of toilet
    Type of floor
    Source of lighting
    Main cooking fuel
    Source of drinking water
    Living area
    Number of rooms occupied
    Number of people per bedroom
    Land area
    Land rented out
Assets and durable goods
    Television
    Refrigerator
    Motorcycle and/or car
    Radio
    Cookers (or stoves)
    Bicycle
    Motor scooter
    Boat
    Washing machine
    Video cassette
    Fixed telephone
    Mobile telephone
    Ploughing machines
    Sewing machine
    Wardrobe
    Mill
    Garden
    Electric fan
    Pump
    # of chickens owned
Geographic Region
Appendix A2: A Poverty Proxy Checklist for Rural Vietnam
(Expanded Module)
Household ID:
Date of interview: _ _ / _ _ / _ _ _ _
Length of interview: ______ minutes
Household head's name:
Interviewer's name:
Village:
Commune:
District:
Province:

Please put numbers to answers
1. How many people are there living in your household?
2. How many household members…
   are 14 years old or younger?
   are from 15 to 59 years old?
3. How many household members are female?
4. In the past 12 months, how many household members
   work for wages/salaries
   are self-employed

Please write 1 if the answer is YES, 0 if the answer is NO
5. Does the household’s head belong to an ethnic minority (not Kinh or Hoa)?
6. What is the highest education level completed by the household's head?
   A. Less than primary
   B. Primary
   C. Secondary
   D. High school or above
7. What type is the household's main residence?
   A. Villa or private house
   B. House with a shared kitchen or bathroom/toilet
   C. Semi-permanent house
   D. Makeshift or other
8. Is electricity used as the main lighting in the household?
9. What type of toilet arrangement does the household have?
   A. Flush toilet or sulabh toilet *
   B. Double vault compost latrine or toilet directly over the water
   C. No toilet or others
10. Does the household have a radio or radio cassette player?
11. Does the household have a motorbike?
12. Does the household have a fixed telephone?
13. Does the household have a mobile telephone?
14. Does the household have a television?
15. Does the household have a refrigerator/freezer?
16. Does the household have a video cassette?
17. Does the household have an electric fan?
18. Does the household have a pump?

*Note: Sulabh toilets (hố xí thấm dội nước) are latrines with open bottoms, which disintegrate stools by water pouring and
absorbing.
Appendix A3: A Poverty Proxy Checklist for Rural Vietnam
(Concise Module)
Household ID:
Date of interview: _ _ / _ _ / _ _ _ _
Length of interview: ______ minutes
Household head's name:
Interviewer's name:
Village:
Commune:
District:
Province:

Please put numbers to answers
1. How many people are there living in your household?
2. How many household members are 14 years old or younger?

Please write 1 if the answer is YES, 0 if the answer is NO
3. Does the household’s head belong to an ethnic minority (not Kinh or Hoa)?
4. Does the household's head have high school degree or above?
5. What type is the household's main residence?
   A. Villa or private house
   B. House with a shared kitchen or bathroom/toilet
   C. Semi-permanent house
   D. Makeshift or other
6. Does the household have a flush toilet or sulabh toilet? *
7. Does the household have a motorbike?
8. Does the household have a mobile telephone?
9. Does the household have a television?
10. Does the household have an electric fan?

*Note: Sulabh toilets (hố xí thấm dội nước) are latrines with open bottoms, which disintegrate stools by water pouring and
absorbing.
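
How a completed concise checklist might be converted into a poverty probability can be sketched as follows. The sketch assumes a probit-type model, in which the probability is the standard normal CDF of a linear index of the checklist answers. Every coefficient below is a made-up placeholder chosen only so that the example runs; none is an estimate from the VHLSS or from the Excel worksheets that accompany the checklist. The field names are hypothetical, and the four-category residence question is collapsed into a single dummy purely for brevity.

```python
import math
from typing import Dict, Optional

# PLACEHOLDER coefficients: invented for illustration only.
# They are NOT the paper's estimates and must not be used for real scoring.
PLACEHOLDER_COEFFS: Dict[str, float] = {
    "intercept": -0.5,
    "household_size": 0.15,
    "children_14_or_younger": 0.20,
    "ethnic_minority": 0.60,
    "head_high_school": -0.40,
    "semi_permanent_or_makeshift_house": 0.30,
    "flush_or_sulabh_toilet": -0.50,
    "motorbike": -0.45,
    "mobile_phone": -0.35,
    "television": -0.30,
    "electric_fan": -0.25,
}


def standard_normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poverty_probability(answers: Dict[str, float],
                        coeffs: Optional[Dict[str, float]] = None) -> float:
    """Probit-style probability that a household is poor, given checklist answers."""
    coeffs = PLACEHOLDER_COEFFS if coeffs is None else coeffs
    index = coeffs["intercept"]
    for name, value in answers.items():
        if name not in coeffs:
            raise KeyError(f"no coefficient for checklist item: {name}")
        index += coeffs[name] * value
    return standard_normal_cdf(index)


# Two hypothetical households (counts for items 1-2, 1/0 for the rest).
better_off = {
    "household_size": 4, "children_14_or_younger": 1, "ethnic_minority": 0,
    "head_high_school": 1, "semi_permanent_or_makeshift_house": 0,
    "flush_or_sulabh_toilet": 1, "motorbike": 1, "mobile_phone": 1,
    "television": 1, "electric_fan": 1,
}
worse_off = {
    "household_size": 6, "children_14_or_younger": 3, "ethnic_minority": 1,
    "head_high_school": 0, "semi_permanent_or_makeshift_house": 1,
    "flush_or_sulabh_toilet": 0, "motorbike": 0, "mobile_phone": 0,
    "television": 0, "electric_fan": 0,
}
print(poverty_probability(better_off), poverty_probability(worse_off))
```

Averaging such probabilities over the households in a sample would give an estimated poverty headcount for that sample; with real use, the placeholder coefficients would have to be replaced by properly estimated ones.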
A4. Sample Size Simulations
A question arising in the poverty proxy checklist method is what sample size is suitable for estimating
poverty. To check this, we implemented a bootstrapping simulation based on a subset of the VHLSS 2006
covering two provinces in North-Western Vietnam which are of particular interest to PI: Thanh Hoa
and Hoa Binh. This subset of the VHLSS06 includes 1620 households.
In the simulation, we drew n households from the data and estimated the poverty rate based on
the subsamples, with 500 replications for each approach. We use the standard error ratio, that is, the
standard error of the poverty rate estimated by each of the four approaches expressed as a percentage of
the “true” poverty rate, to determine the extent of error.
The results in Table A4.1 show that if we draw about 12% of the sample (200 households), the
standard error ratio is about 10.2%. If we want to achieve a standard error ratio of less than
5%, the sample size must be above 50% of the whole sample.
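
The simulation described above can be sketched roughly as follows. This is an illustrative reconstruction rather than the authors' code: the synthetic 0/1 poverty indicator, the assumed 30 percent "true" poverty rate, the random seed and the choice to draw households with replacement are all assumptions made so that the example is self-contained. The VHLSS06 data themselves are not used, so the numbers it prints are not comparable with Table A4.1.

```python
import random
import statistics
from typing import Sequence


def standard_error_ratio(poor_flags: Sequence[int], n: int,
                         replications: int = 500, seed: int = 0) -> float:
    """Bootstrap standard error of the poverty rate for subsamples of size n,
    expressed as a percentage of the full-sample ("true") poverty rate."""
    true_rate = sum(poor_flags) / len(poor_flags)
    if true_rate == 0:
        raise ValueError("the 'true' poverty rate must be positive")
    rng = random.Random(seed)
    rates = []
    for _ in range(replications):
        # Draw n households with replacement (an assumption of this sketch).
        subsample = rng.choices(poor_flags, k=n)
        rates.append(sum(subsample) / n)
    return 100.0 * statistics.stdev(rates) / true_rate


# Synthetic stand-in for the 1620-household subset: 486 poor (30%, assumed).
population = [1] * 486 + [0] * 1134

for size in (50, 200, 800):
    print(size, round(standard_error_ratio(population, size), 2))
```

In the paper the poverty rate in each subsample is estimated by each of the four proxy approaches; here the observed indicator is used directly, which keeps the sketch short but isolates only the sampling component of the error.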
As shown in Table A4.1 below, the standard error ratio for each of the four poverty proxy approaches
falls dramatically until sample sizes of around 60 households are reached. Thereafter, although the
standard error ratio continues to decline, it does so at a declining rate.

Table A4.1: Comparing sensitivity of poverty estimates to sample sizes by approach

                          Standard Error Ratio (%)
Sample size                                          Quantile
(households)     Probit 1       OLS        PCA       regression
       5          52.19        47.97      54.26       47.05
      10          43.12        43.62      50.59       41.90
      20          32.34        34.69      42.52       30.81
      40          23.28        25.77      30.3        21.68
      60          19.56        21.48      23.27       18.14
      80          16.51        19.95      21.06       15.55
     100          15.08        16.69      19.04       14.12
     150          12.07        13.06      16.07       11.21
     200          10.19        11.19      13.7         9.42
     250           9.28        10.09      12.46        8.48
     300           8.54         9.17      10.99        7.76
     400           7.43         7.76       9.78        6.65
     500           6.62         6.92       8.5         5.95
     750           5.39         5.58       7.34        4.76
    1000           4.57         4.87       6.36        4.05
    1500           3.6          3.91       5.23        3.27
Figure A4.1: Comparing sensitivity to sample sizes by approach