CSH-IFP Working Papers
USR 3330
“Savoirs et Mondes Indiens”
Inequality of Opportunity
in Indian Society
Arnaud Lefranc and Tista Kundu
Institut Français de Pondichéry
Pondicherry
Centre de Sciences Humaines
New Delhi
14
The Institut Français de Pondichéry and the Centre de Sciences Humaines, New Delhi
together form the research unit USR 3330 “Savoirs et Mondes Indiens” of the CNRS.
Institut Français de Pondichéry (French Institute of Pondicherry): Created in 1955 under the
terms agreed to in the Treaty of Cession between the Indian and French governments, the
IFP (UMIFRE 21 CNRS- MAE) is a research centre under the joint authority of the
French Ministry of Foreign Affairs (MAE) and the French National Centre for Scientific
Research (CNRS). It fulfills its mission of research, expertise and training in human and
social sciences and ecology, in South and South-East Asia. Major research works focus on
Indian cultural knowledge and heritage (Sanskrit language and literature, history of
religions, Tamil studies etc.), contemporary social dynamics (in the areas of health,
economics and environment) and the natural ecosystems of South India (sustainable
management of biodiversity).
Institut Français de Pondichéry, 11, Saint Louis Street, P.B. 33, Pondicherry-605 001, India
Tel: (91 413) 2231609, E-mail:
[email protected]
Website: http://www.ifpindia.org/
Centre de Sciences Humaines (Centre for Social Sciences and Humanities): Created in 1990,
the CSH (UMIFRE 20 CNRS- MAE) is a research centre jointly managed by the French
Ministry of Foreign Affairs (MAE) and the French National Centre for Scientific Research
(CNRS). Conveniently located in the heart of New Delhi, the Centre produces research in
all fields of social sciences and humanities on issues of importance for India and South
Asia. The main themes studied by CSH researchers include territorial and urban dynamics,
politics and social changes, economic growth and inequalities, globalization, migration and
health.
Centre de Sciences Humaines, 2, Dr. Abdul Kalam Road, New Delhi-110 011, India
Tel: (91 11) 3041 0070, E-mail:
[email protected]
Website: http://www.csh-delhi.com/
© Institut Français de Pondichéry, 2020
© Centre de Sciences Humaines, 2020
Inequality of Opportunity in Indian Society∗
Arnaud Lefranc† and Tista Kundu‡
March 4, 2020
Submitted for consideration as a CSH-IFP working paper.
Abstract
Recent debates on distributive justice have started to prioritize inequality of opportunity, that is, inequality generated exclusively by circumstance factors beyond individual control. Using data from the National Sample Survey, we estimate inequality of opportunity in India for consumption expenditure and wage earnings, taking caste, sex, region and parental background as circumstances. Adopting the widely used non-parametric and parametric methods, we find that even in 2011-12 more than one-third of total wage inequality can be attributed to differences in the ascribed social positions of individuals. Inequality of opportunity in consumption, on the other hand, is relatively low. Furthermore, we use the regression tree algorithm to find the hierarchical order among the circumstances and construct the opportunity tree for India, which the previous methods are unable to provide. In the fashion of machine learning, the opportunity tree identifies parental background as one of the most important circumstance factors behind unequal opportunity in the country, for both outcomes. But the effect of casteism is prominent as well: in interaction with region, it reveals a forward caste premium in most parts of the country, particularly for regular salaried wage earners.
JEL Classification: D31, J71, O12
Keywords: Caste, Inequality of opportunity, Mean Log Deviation, Multiple imputation, Parental background, Regression tree
∗ The authors thank Daniel Mahler and Geoffrey Teyssier for their technical help as well as their helpful suggestions. We are grateful to Cristina Terra, Marta Menéndez and Nicolas Gravel for their helpful feedback. Comments from seminar participants at ESSEC Business School (Paris), Laboratoire THEMA (University of Cergy-Pontoise), Winter School on Inequality and Social Welfare Theory (IT13, University of Verona), University of Calcutta, Centre for Studies in Social Science Calcutta, Indian Statistical Institute (Kolkata), 15th Growth and Development conference (ISI, Delhi) and Centre de Sciences Humaines (New Delhi) are also gratefully acknowledged.
† Université de Cergy-Pontoise, France
‡ Corresponding author. Centre de Sciences Humaines, New Delhi, India. Contact: [email protected]
1 Introduction
“...The service to India means the service of the millions who suffer. It means the ending of poverty and ignorance and disease and inequality of opportunity.” - Jawaharlal Nehru¹
Seventy years have passed since this speech was made at the stroke of midnight on the very first day of India's independence. Over this span, India has journeyed from an impoverished country to one of the emerging global economies. Especially since the late nineties, with a consistently high GDP growth rate of more than 7%, India has become the sixth largest economy in the world. Much has been accomplished, with significant improvement in the overall well-being of the country, but as much, if not more, remains to be done or even addressed. Numerous studies have shown that the rapid growth of India has been accompanied by increasing inequality as well. However, very few studies have explored how much of the growing inequality is due to inequality of opportunity, that is, how much of this high inequality is generated by factors that are purely a matter of fate and therefore beyond individual control.
India followed interventionist central planning for the first forty years after independence, followed by ‘neo-liberal’ economic reforms at the beginning of the 1990s. Since then, both the overall growth rate and inequality in India have grown almost simultaneously, making inequality a very relevant and active area of research concerning India. A sharp increase in consumption inequality, along with a slower pace of poverty reduction, has almost become a distinct feature of the Indian economy, especially in the twenty-first century.² But for a very stratified society like India, while there is a wealth of literature analyzing the problem of inequality and linking it to social mobility, labor market discrimination, urbanization or poverty, only a handful of studies analyze how much of this inequality is due to unequal opportunities arising from varying social and family backgrounds, for which no one can be held accountable.
The present work aspires to quantify the degree of unequal opportunity in India by estimating how much of the inequality in consumption and wages is due to differences in caste, sex, region, parental education and occupation. Traditionally, inequality has been assessed following a welfarist approach, where inequality in the final outcome is the main focus of analysis. The unequal distribution of any desirable outcome (e.g. income, education, standard of living, health etc.) is of primary concern for assessing social welfare. However, inequality can arise from an array of different factors, some of which are purely a matter of fate for the individual. This heterogeneity in the inequality generating factors triggered a philosophical debate in the late twentieth century, which criticized the classical welfarist analysis of inequality as too consequentialist to take into account the multifaceted nature of the inequality generating process (Rawls 1971, Dworkin 1981b,a). The main point of the debate is that inequality arising from factors over which no individual has any control, like race, sex, ethnicity, religion, birthplace, and parental and family background, should be of primary concern from an ethical standpoint and should therefore be considered unfair. On the other hand, inequality generated from an unregulated lifestyle, lack of perseverance, inadequate skill formation or poor managing ability, in other words, from factors for which one can arguably be held responsible, is not unethical or unfair in an egalitarian society.
¹ Excerpt from ‘Tryst with Destiny’, a speech delivered on the first day of independence, 15th August 1947, by Jawaharlal Nehru, the first Prime Minister of independent India.
² See Deaton & Dreze (2002), Himanshu (2007), Dev & Ravi (2007), for example. For recent updates on Indian inequality, see the India inequality report by Himanshu (2018).
This new approach of analyzing inequality by splitting it into a fair and an unfair part brings the question of individual responsibility into the domain of distributive justice, and prioritizes the analysis of inequality arising solely from factors that are beyond individual responsibility (Arneson 1989, Cohen 1989). Inspired by this philosophical debate on responsibility-sensitive egalitarian justice, Roemer (1993) formulates inequality of opportunity as that part of inequality that is generated by factors beyond individual control. In the jargon of inequality of opportunity (IOP), all factors that lie outside the sphere of individual responsibility but generate inequality are called circumstances. The inequality generating factors that the individual can presumably control, on the other hand, are called efforts. In this dichotomy of effort versus circumstances, inequality of opportunity is the (unfair) part of inequality that is generated only by the circumstance factors (Roemer 1998).
Methodologically, both non-parametric and parametric approaches are used in the literature to estimate IOP in a society. The basic structure of these methods is due to Checchi & Peragine (2010) (for Italy) and Bourguignon, Ferreira & Menéndez (2007) (for Brazil), for the non-parametric and the parametric estimates respectively. Although the parametric estimates of IOP come at the cost of assuming a specific functional form for the relationship between the outcome and the circumstance variables, the parametric approach is often recommended for studies with a broad range of circumstances. The non-parametric approach, in contrast, is more common in multi-country comparison studies, which are limited to a set of circumstances that is comparable across countries. So far there is no consensus in the literature on prioritizing one approach over the other. But in either set-up, to quantify the unfair part of inequality as a measure of IOP, the majority of the literature uses an index from the generalized entropy class of inequality indices, namely the mean log deviation. Using slightly different non-parametric and parametric set-ups, Ferreira & Gignoux (2011) nevertheless showed that the estimates of IOP are quite close regardless of the method adopted. This is the methodological set-up that we use for measuring IOP in India.³
Two major shortcomings of the above approaches are that they are based on a pre-specified number of circumstances and that they often use all possible interactions of the chosen circumstances to estimate IOP, while in reality only some of the interactions may contribute substantially. However, neither the non-parametric nor the parametric set-up can point out the relevant interactions. Besides, including all possible interactions also increases the total number of circumstance groups to compare, which may lead to an overestimated IOP as the number of observations per cell decreases. To address this problem, Brunori, Hufe & Mahler (2018) introduced a novel approach of analyzing IOP using regression trees, which lets the algorithm choose the most relevant circumstances in a statistically significant way from the submitted set of circumstances and generates a visually interpretable opportunity tree showing the hierarchical order of circumstances. Therefore, along with the non-parametric and parametric estimation of IOP, we also adopt this approach in the present work to provide the opportunity structure for contemporary India.
³ See Roemer & Trannoy (2013), Ramos & Van de Gaer (2012) for an extensive analysis of the major methodologies used in the literature. For some international estimates of IOP, see Brunori, Ferreira & Peragine (2013) (selected developed countries including some Nordic countries, selected Latin American, African, Middle-East and Asian countries), Ferreira & Gignoux (2011) (Latin American countries), Marrero & Rodríguez (2011) (United States of America), Checchi, Peragine & Serlenga (2010) (European countries), Cogneau & Mesplé-Somps (2008) (African countries).
India epitomizes a historically hierarchical social structure, where the centuries-old caste system remains functional even in the twenty-first century. Yet for such a stratified country there is almost no work analyzing unequal opportunity, with two notable exceptions. Using National Sample Survey data, Asadullah & Yalonetzky (2012) analyzed inequality of educational opportunity in different states of India due to differences in sex, religion and caste. However, being a state-level study, it naturally focuses more on inter-state differences in unequal educational opportunity than on a national estimate. Besides, due to the structure of the database, their study cannot take into account parental background as a circumstance, although it has repeatedly been shown to be one of the major driving factors behind unequal opportunities in a number of developed and developing countries. With a different survey, Singh (2012) gives a national estimate of IOP in India for consumption and income that includes father’s educational and occupational background as two of the major circumstances. But due to the survey structure, the inclusion of parental background limits this study to Indian men only. Besides, neither of these studies gives a recent picture of India, as the latest time frame in either work is 2004-05. The scant work on IOP in India leaves significant scope for further improvement. The aim of the present paper is to provide the latest estimates of IOP in India using both the non-parametric and the parametric methodology, as well as to provide the opportunity structure for contemporary India by adopting the recently introduced regression tree approach.
In particular, we choose two outcome variables to analyze, namely consumption expenditure and wage earnings, and analyze IOP for a set of five circumstance variables comprising caste, sex, region, parental education and occupation. The present work contributes to the literature in several ways. First, using the latest Employment Unemployment Survey of the National Sample Survey Organization, our study gives a rather recent picture of unequal opportunity in India, for 2011-12. We find that even by 2012, more than one-third of total wage inequality is due to differences in the circumstances considered. This positions India among the countries with high inequality of opportunity in global perspective. Second, due to the structure of the National Sample Survey it is difficult to incorporate parental information into the analysis, as the survey questionnaire has no direct provision for reporting this information. Instead, parental attributes are only available for co-resident households, where parents are enumerated along with their offspring. This immediately raises the question of selection bias due to co-residence.
The present study overcomes this problem by imputing information on parental background for the general sample using the widely used technique of multiple imputation (Rubin 1986). We thereby produce estimates of IOP that take into account the important circumstance of parental background without limiting the study to co-resident households. In fact, we find that ignoring parental background as a circumstance results in considerable underestimation of IOP, as the information lost by omitting parental attributes cannot be captured well by the other social circumstances considered, like caste, sex or region. Besides, in spite of the prevalent evidence of casteism in India, differences between caste groups alone are found to be insufficient to capture the differences in economic opportunity arising from other sources like family background. The opportunity tree reconfirms the paramount importance of parental education and occupation. Both consumption and earning opportunities are better for someone with at least one parent with above primary level schooling and a father engaged in a non-agricultural occupation. This is the third contribution of our paper: to show how our circumstances are intertwined in generating unequal opportunity in society.
The rest of the paper is organized as follows. Section 2 sketches the methodological framework of the non-parametric, parametric and regression tree approaches. Section 3 introduces our data and defines all our variables, along with details on our sample selection criteria. Section 4 describes our results in different subsections. After discussing the main non-parametric and parametric measures of IOP in India, we give a brief account of the relative importance of caste and other social backgrounds in comparison with parental background. The opportunity structure for contemporary India is discussed next, separately for the two outcome variables. Section 5 concludes.
2 Theoretical and methodological background
In the analysis of inequality of opportunity, any social outcome is assumed to be generated by two broad classes of factors: factors that are beyond individual responsibility, or circumstances ($C$), and factors that are within individual control, or efforts ($e$). Therefore, borrowing from Ferreira & Peragine (2015), the simplified outcome generating process can be written as
$$y = f(C, e) \tag{1}$$
such that the outcome to be analyzed, $y$, is determined by a finite set of circumstances, $C$, and efforts, $e$. From the standpoint of responsibility-sensitive egalitarianism, any outcome inequality generated by $C$ is ethically objectionable, whereas inequality arising from $e$ can be considered legitimate.⁴
Any analysis of IOP therefore begins with a clear classification of the circumstance and effort variables. However, there is no fixed list of circumstance or effort variables to be taken into account, as they are subject to data availability and are determined in the social or political space, which varies between societies (Roemer & Trannoy 2013). As is common in any empirical exercise, estimates of IOP depend crucially on the data structure, and partial observability of circumstance or effort factors severely limits the study. Data on effort factors in particular are even more limited in a large number of surveys. However, IOP is the amount of inequality generated by circumstances only, and efforts, the so-called legitimate source of inequality, can themselves be determined by the existing social circumstances. Hence effort variables are often assumed to be a function of circumstances, so that the outcome generating process in equation (1) can be reformulated as a reduced form equation, $y = g(C)$, where the outcome is a function of circumstances only (Ferreira & Gignoux 2011). Of course, the higher the number of circumstances taken into account, the more realistic is the measure of IOP. But with the addition of new circumstances to the analysis, IOP will always increase as long as the added circumstances are not orthogonal to the outcome concerned. Since it is impossible for any survey to provide an exhaustive list of circumstances, Ferreira & Gignoux (2011) advise interpreting any resulting estimate of IOP as a lower bound of the true IOP in the society.
⁴ Lefranc et al. (2009) introduced a third factor, that of luck, in the study of IOP, which we do not consider in the present work.
Unlike the traditional inequality approach, social welfare in the responsibility-sensitive domain is not judged on the basis of total inequality in the outcome variable, $I(\{y\})$. Rather, IOP measures only that part of outcome inequality that is generated exclusively by the circumstance factors, $C$. So the main methodological challenge in quantifying IOP is to isolate this unfair part of outcome inequality. This is usually done in the literature by constructing a suitable counterfactual distribution, $y^{CF}$, such that, by construction, $y^{CF}$ captures the variability in the outcome arising uniquely from differences in the circumstance variables, $C$. Absolute IOP in the society can then be measured by the inequality in the counterfactual distribution, $I(\{y^{CF}\})$. However, since IOP is estimated as that part of total inequality which is unfair and morally objectionable, it is common practice in the literature to report relative IOP, the share of unfair inequality in total outcome inequality, given by $I(\{y^{CF}\})/I(\{y\})$. The construction of the counterfactual distribution, and hence the measurement of IOP, varies with the non-parametric or parametric statistical model used, as discussed below.
2.1 Non-parametric approach
The non-parametric method for the present analysis is adopted from Ferreira & Gignoux (2011). Consider a finite population, $i \in \{1, \ldots, N\}$, characterized by $\{y_i, C_i\}$, standing for outcome and circumstances respectively. Assume that the vector $C_i$ consists of $J$ elements and that the $j$-th element can take $x_j$ values or categories. The groups formed by all possible interactions of the circumstances are called types. In this framework, the population under study can thus be partitioned into a maximum of $\bar{K} = \prod_{j=1}^{J} x_j$ exhaustive and mutually exclusive types.
From the viewpoint of IOP, any inequality between types is ‘unfair’. To isolate this unfair inequality, each of the types is represented by a ‘smoothed distribution’ of its mean outcome. Thus every individual in type $k$, $i \in k$, $k = 1, \ldots, \bar{K}$, is assumed to be characterized by the type-mean outcome, $\mu_k$. The counterfactual distribution that isolates the inequality generated exclusively by differences between types is therefore represented by $y^{CF} = \{\mu_1, \ldots, \mu_{\bar{K}}\}$. The absolute and relative measures of IOP can then be estimated as⁵
$$\theta_a^{NP} = I(\{\mu_k\}) \tag{2}$$
$$\theta_r^{NP} = \frac{I(\{\mu_k\})}{I(\{y_i\})} \tag{3}$$
where $I(\{x\})$ denotes inequality in the distribution of $x$. Following the extant literature, $I(\cdot)$ is measured by the index of Mean Log Deviation (MLD).⁶
⁵ NP stands for non-parametric, $r$ for the relative measure and $a$ for the absolute measure.
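The two non-parametric measures can be illustrated with a minimal sketch. It assumes the MLD takes its usual form, the average of $\ln(\bar{x}/x_i)$, and uses a made-up sample of six wages in two types; the function names `mld` and `nonparametric_iop` and the data are hypothetical and are not taken from the paper's estimation code.

```python
import math
from collections import defaultdict

def mld(values):
    """Mean log deviation: (1/N) * sum of ln(mean / x_i); needs x_i > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum(math.log(mean / x) for x in values) / n

def nonparametric_iop(outcomes, types):
    """Return (absolute, relative) IOP as in equations (2) and (3)."""
    by_type = defaultdict(list)
    for y, k in zip(outcomes, types):
        by_type[k].append(y)
    type_mean = {k: sum(v) / len(v) for k, v in by_type.items()}
    # Smoothed distribution: every individual is assigned the type mean.
    smoothed = [type_mean[k] for k in types]
    absolute = mld(smoothed)
    relative = absolute / mld(outcomes)
    return absolute, relative

# Hypothetical toy data: six wages observed in two types.
wages = [10.0, 20.0, 30.0, 40.0, 60.0, 80.0]
types = ["A", "A", "A", "B", "B", "B"]
theta_a, theta_r = nonparametric_iop(wages, types)
print(f"absolute IOP = {theta_a:.4f}, relative IOP = {theta_r:.4f}")
```

Because the smoothed distribution removes all within-type variation, the relative measure in this sketch lies between 0 (all type means equal) and 1 (no inequality within types).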
2.2 Parametric approach
The parametric approach in the present work is also adopted from Ferreira & Gignoux (2011). It likewise estimates IOP from the mean outcome conditional on types, here obtained from OLS estimates, but differs from the non-parametric set-up in its construction of the counterfactual distribution used to isolate the ethically unfair part of inequality.
The parametric set-up usually assumes a log-linear relationship between the outcome and the circumstance and effort variables. So the income generating process can be written as
$$\ln y_i = \alpha C_i + \beta e_i + u_i \tag{4}$$
However, as mentioned before, the effort factors can reasonably be assumed to be a function of circumstances, as below,
$$e_i = \gamma C_i + v_i \tag{5}$$
with $u_i$ and $v_i$ being the random errors.
Hence, from the structural model (4) and (5), the reduced form income generating process can be summarized as
$$\ln y_i = \alpha C_i + \beta(\gamma C_i + v_i) + u_i = (\alpha + \beta\gamma) C_i + (\beta v_i + u_i) = \Psi C_i + \varepsilon_i \tag{6}$$
From the OLS estimates of equation (6), $\hat{\Psi}$, IOP is then measured in comparison to a hypothetical distribution, $\{\tilde{y}_i\}$, that eliminates any differences in individual circumstances:
$$\tilde{y}_i = \exp[\bar{C}_i \hat{\Psi} + \hat{\varepsilon}_i] \tag{7}$$
where $\bar{C}_i$ is the mean of the circumstance variables across the population. Thus equation (7) eliminates the differences in circumstances by replacing them with their mean values, and the associated inequality, $I(\{\tilde{y}_i\})$, is therefore fair by construction. Absolute IOP can then be estimated from the counterfactual distribution, $y^{CF} = (\{y_i\} - \{\tilde{y}_i\})$, which isolates the outcome variation generated by differences in individual circumstances only. So the relative share of IOP in total inequality is given by⁷
$$\theta_r^{P} = \frac{I(\{y_i\}) - I(\{\tilde{y}_i\})}{I(\{y_i\})} \tag{8}$$
As in the non-parametric approach, we use the MLD index for the parametric estimates of IOP as well.
⁶ $\mathrm{MLD}(x) = \frac{1}{N} \sum_{i=1}^{N} \ln \frac{\bar{x}}{x_i}$
⁷ Where $P$ and $r$ in the superscript and subscript stand for parametric and relative measure respectively.
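The parametric steps can be sketched in a few lines: fit the reduced form by OLS, rebuild each outcome with circumstances set to their sample means while keeping the residual, and compare the two MLD values. The sketch assumes an intercept is added to the regression and solves the normal equations directly; the helper names and the eight-person sample with two dummy circumstances are hypothetical, not the paper's specification.

```python
import math

def mld(values):
    """Mean log deviation: (1/N) * sum of ln(mean / x_i)."""
    n = len(values)
    mean = sum(values) / n
    return sum(math.log(mean / x) for x in values) / n

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def parametric_iop(outcomes, circumstances):
    """Relative IOP: (I(y) - I(y_tilde)) / I(y), as in equation (8)."""
    X = [[1.0] + list(c) for c in circumstances]   # assumed intercept term
    log_y = [math.log(y) for y in outcomes]
    psi = ols(X, log_y)
    fitted = [sum(b * x for b, x in zip(psi, row)) for row in X]
    resid = [ly - f for ly, f in zip(log_y, fitted)]
    n = len(X)
    x_bar = [sum(row[j] for row in X) / n for j in range(len(psi))]
    const = sum(b * x for b, x in zip(psi, x_bar))  # mean circumstances times psi
    y_tilde = [math.exp(const + e) for e in resid]  # equation (7)
    total = mld(outcomes)
    return (total - mld(y_tilde)) / total

# Hypothetical toy data: two dummy circumstances per person.
circumstances = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
wages = [100.0, 80.0, 60.0, 45.0, 120.0, 70.0, 55.0, 50.0]
theta_r_p = parametric_iop(wages, circumstances)
print(f"relative parametric IOP = {theta_r_p:.3f}")
```

In this toy sample most of the wage variation follows the two dummies, so the share comes out high; with real survey data the solver would normally be replaced by a standard regression routine.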
2.3 Regression tree approach
Circumstances are by definition all factors that are beyond individual responsibility, and it is practically impossible for any data set to capture all such factors in a single survey or even in multiple surveys. Research on IOP is therefore always restricted to a subset of the full set of circumstances. But as long as the omitted circumstances have a non-trivial effect in predicting the outcome variable, the addition of each such circumstance will increase the estimate of IOP by virtue of a finer partitioning of the population. Clearly, the more circumstances are taken into account, the more realistic is the estimate of IOP. However, the addition of new circumstances also comes at a cost. Since finer partitioning of the sample leaves fewer observations in each type, there is a chance of overestimating IOP. The regression tree analysis, introduced to this literature by Brunori et al. (2018), attempts to alleviate this issue in the fashion of machine learning.
Once again assume that for individual $i$ the circumstance vector $C_i$ consists of $J$ elements, $C_i = (C_i^1, \ldots, C_i^J)$, each of which can take $x_j$ values, where $j \in \{1, \ldots, J\}$. Unrestricted partitioning will then divide the population into $\bar{K} = \prod_{j=1}^{J} x_j$ types, considering all possible interactions among the circumstances. However, for a large number of circumstances in $C_i$ and/or categories $x_j$, observations in some or all of the $\bar{K}$ cells may become too sparse to allow the researcher to use all available types, especially when the sample size is relatively small. Besides, in the case of unrealistic or vacuous interactions, some cells may have no observations at all. Since neither the non-parametric nor the parametric model can point out the relevant interactions, the conventional resort is either to regroup the circumstances into broader categories (fewer $x_j$), to sacrifice some of the circumstances (fewer elements in $C_i$), or both. In the regression tree approach, instead, the researcher submits the full set of available circumstances, $C_i$, to the program and lets the algorithm choose the relevant partitioning of the sample in a non-arbitrary way, by recursive binary splitting to be precise.
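How quickly unrestricted partitioning produces sparse or empty cells can be seen in a small sketch. All category lists and the four-person sample below are illustrative assumptions rather than the paper's actual coding of circumstances, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

# Hypothetical codings of three circumstances (illustration only).
categories = {
    "caste": ["SC/ST", "OBC", "General"],
    "sex": ["male", "female"],
    "parent_edu": ["below_primary", "above_primary"],
}

def max_types(categories):
    """K_bar: the product over j of x_j, the number of possible types."""
    return prod(len(values) for values in categories.values())

def empty_types(sample, categories):
    """Types (cells) into which no observation of the sample falls."""
    observed = Counter(sample)
    return [t for t in product(*categories.values()) if observed[t] == 0]

# Hypothetical sample of four individuals.
sample = [
    ("SC/ST", "male", "below_primary"),
    ("OBC", "female", "below_primary"),
    ("General", "male", "above_primary"),
    ("General", "male", "above_primary"),
]
k_bar = max_types(categories)             # 3 * 2 * 2 possible types
missing = empty_types(sample, categories)
print(k_bar, len(missing))
```

Each added circumstance multiplies the number of cells, so the count of empty or thinly populated types grows much faster than the sample does.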
The recursive binary splitting relies on a type of permutation test, which rearranges the labels on the observed data set multiple times and computes the test statistic (and p-value) for each rearrangement. It starts by dividing the full sample into two distinct groups based on one circumstance factor and then continues in the same way within each group, potentially based on another circumstance, into further subgroups and so on. The criterion for selecting the splitting circumstance depends on the type of regression tree used. Brunori et al. (2018) use the conditional inference tree algorithm, which determines the splitting criterion as follows.
The algorithm runs in two stages:
• Stage I: Selecting the initial splitting circumstance
– It starts with the simultaneous testing of the $J$ partial hypotheses, $H_0^{C^j}: D(Y \mid C^j) = D(Y)$ for $j \in \{1, \ldots, J\}$. Notice that this is precisely a test of the existence of IOP, to see whether any circumstance has any effect on the outcome.
– Adjusted p-values, $p_{adj}^{C^j}$, are then computed with the standard adjustment for multiple hypothesis testing,⁸ and the circumstance $C^*$ with the highest degree of association is identified, that is, the circumstance with the minimum p-value, $C^* = \{C^j : \operatorname{argmin} p_{adj}^{C^j}\}$.⁹
– The algorithm stops if the p-value associated with $C^*$ is greater than some pre-specified significance level, $\alpha$.¹⁰ Hence, if $p_{adj}^{C^*} > \alpha$, the null of equality of opportunity for the society cannot be rejected at significance level $\alpha$. Otherwise, the circumstance $C^*$ is selected as the initial splitting variable.
• Stage II: Growing the opportunity tree
– Once $C^*$ is selected, it is split by the binary split criterion to grow the tree. For each possible binary partition, $s$, involving $C^*$, the entire sample can be split into two distinct parts, $Y_s = \{Y_i : C_i^* < x_j\}$ and $Y_{-s} = \{Y_i : C_i^* \geq x_j\}$.
– For each binary split, $s$, the goodness of split is assessed by testing the discrepancy between $Y_s$ and $Y_{-s}$.¹¹ The split, $s^*$, with the maximum discrepancy, that is, with the minimum p-value, is then selected as the optimum binary split point, based on which the sample is partitioned into two sub-samples, constructing the first two branches of the opportunity tree.
– The entire algorithm is then repeated for each branch separately, to construct the full opportunity tree.
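The two stages can be sketched as a small recursive routine. This is a simplified stand-in, not the conditional inference tree of Hothorn et al. (2006) or the R package mentioned in the footnotes: association is measured by a between-group sum of squares with Monte Carlo permutation p-values, the multiplicity adjustment $1 - (1 - p)^J$ is an assumption about the exact form used, and the best cut is chosen by the largest statistic rather than the smallest p-value. All function names and the 60-person sample are hypothetical.

```python
import random

def between_ss(y, groups):
    """Between-group sum of squares of y; larger means stronger association."""
    overall = sum(y) / len(y)
    sums, counts = {}, {}
    for yi, g in zip(y, groups):
        sums[g] = sums.get(g, 0.0) + yi
        counts[g] = counts.get(g, 0) + 1
    return sum(counts[g] * (sums[g] / counts[g] - overall) ** 2 for g in sums)

def perm_pvalue(y, groups, rng, n_perm=999):
    """Monte Carlo permutation p-value for H0: D(Y | C) = D(Y)."""
    observed = between_ss(y, groups)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_ss(shuffled, groups) >= observed - 1e-9:
            hits += 1
    return (1 + hits) / (1 + n_perm)

def select_split(y, data, rng, alpha=0.01):
    """Stage I (pick the circumstance) and Stage II (pick the cut) for one node."""
    J = len(data)
    adj = {name: 1 - (1 - perm_pvalue(y, col, rng)) ** J
           for name, col in data.items()}
    best = min(adj, key=adj.get)
    if adj[best] > alpha:
        return None                       # equal opportunity not rejected
    col = data[best]
    cuts = sorted(set(col))[1:]           # candidate thresholds x_j
    # Simplification: largest two-group statistic instead of smallest p-value.
    cut = max(cuts, key=lambda c: between_ss(y, [v < c for v in col]))
    return best, cut

def grow(y, data, rng, alpha=0.01, min_size=10):
    """Recursively grow the tree; leaves store the type mean and size."""
    leaf = {"mean": sum(y) / len(y), "n": len(y)}
    if len(y) < 2 * min_size:
        return leaf
    choice = select_split(y, data, rng, alpha)
    if choice is None:
        return leaf
    name, cut = choice
    branches = {}
    for side, keep in (("left", lambda v: v < cut), ("right", lambda v: v >= cut)):
        idx = [i for i, v in enumerate(data[name]) if keep(v)]
        sub_y = [y[i] for i in idx]
        sub_data = {k: [c[i] for i in idx] for k, c in data.items()}
        branches[side] = grow(sub_y, sub_data, rng, alpha, min_size)
    return {"split": (name, cut), **branches}

# Hypothetical sample: wages depend on parental education (coded 0, 1, 2), not on sex.
n = 60
parent_edu = [i % 3 for i in range(n)]
sex = [(i // 3) % 2 for i in range(n)]
wage = [100.0 + 50.0 * (e >= 1) + (i * 7 % 11) for i, e in enumerate(parent_edu)]
tree = grow(wage, {"parent_edu": parent_edu, "sex": sex}, random.Random(0))
print(tree["split"])
```

In this toy sample the root split falls on parental education, and the terminal nodes of the grown tree play the role of the types whose means feed the IOP measure.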
3 Data, variables and sample selection
3.1 Data
For the present analysis of inequality of opportunity in India, we use data from the National Sample Survey (NSS). This is the largest nationally representative micro-level database for India, collected by the National Sample Survey Organization (NSSO). Among the many national-level surveys conducted by the NSSO, we use the Employment Unemployment Survey in particular. This survey is conducted for one year in every five, covering the whole country except some remote inaccessible areas.¹² To focus on the recent scenario in India, we use the latest employment-unemployment round of the NSS, conducted in 2011-12.¹³
This round surveys 100,000 households, enumerating about 0.4 million individuals. India is predominantly rural even to date, with a rural-urban ratio of around 70:30 on average. Initially we have to drop about 1000 observations to clean for valid age, sex, sector, caste specification, marital status and some other criteria. The NSS provides details on several household and individual characteristics. Some of the major household-level items include household size, religion, caste and consumption expenditure, whereas age, sex, education, occupation and many other demographic characteristics are recorded for each member of the household. However, not everybody reported as ‘employed’ has information on income; rather, wage earnings are reported in the NSS data only for regular and casual wage earners who are not self-employed. Another drawback in the structure of the NSS database is that it does not report information on parental background directly for every individual. Rather, this crucial information is only available for households where the offspring is enumerated along with his/her parents.
⁸ The adjustment is the Bonferroni correction, $p_{adj}^{C^j} = 1 - (1 - p^{C^j})^{J}$ (Brunori et al. 2018, p. 8).
⁹ To test the association between the outcome variable and the covariates, the linear statistic, along with its mean and variance, is provided in Hothorn, Hornik & Zeileis (2006), from which the relevant test statistic and p-value can be formulated.
¹⁰ Like Brunori et al. (2018), we also choose $\alpha = 0.01$.
¹¹ This is tested by the two-sample test statistics provided in Hothorn et al. (2006). The entire algorithm can be executed with an R package developed by the same authors.
¹² Thus the conflict areas of the Ladakh & Kargil districts of Jammu & Kashmir, some remote interior villages of Nagaland, a few unreachable areas of the Andaman & Nicobar Islands and villages recorded as uninhabited by the respective population census are kept out of these surveys.
¹³ This means we use the Schedule 10 survey of the NSS for round 68 (2011-12).
3.2 Definition of variables
3.2.1 Circumstance variables
For the present analysis, we have chosen a set of five circumstance factors: caste, sex, region of residence, parental education and father’s occupation. We label the first three as social backgrounds, while parental education and father’s occupation constitute parental backgrounds. With all possible interactions of these five circumstance variables, we have a total of 324 types.
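As an illustration, the count of 324 types can be reproduced by enumerating the full interaction of the five circumstances, using the categories defined in the rest of this subsection. This is only a sketch: the category labels and variable names below are illustrative, not NSS codes.

```python
from itertools import product

# Illustrative labels for the categories of each circumstance (not NSS codes).
CIRCUMSTANCES = {
    "caste": ["SC/ST", "OBC", "General"],
    "sex": ["male", "female"],
    "region": ["North", "East", "Central", "North-East", "South", "West"],
    "parental_education": ["no schooling", "primary or below", "above primary"],
    "father_occupation": ["white collar", "blue collar", "agriculture"],
}

# A type is one cell of the full interaction of all five circumstances.
types = list(product(*CIRCUMSTANCES.values()))

print(len(types))  # 3 * 2 * 6 * 3 * 3 = 324
```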
The caste system in India is a centuries-old hierarchical social structure based on occupation. Over time, however, this occupational classification became hereditary: children always inherit the caste of their father, which cannot be changed during their lifetime. There are thousands of castes in the country, which the Constitution of India regroups into fewer caste categories for the purpose of caste-based affirmative action policies, or reservations. We consider three caste categories in our analysis. The lower caste category consists of the Scheduled Castes and the Scheduled Tribes together (SC/ST). They are historically the most disadvantaged caste groups in India and have held reservation status since 1950. Around the mid-eighties, the socially and economically backward castes among the non-SC/STs were further categorized as the Other Backward Classes (OBC), who have been entitled to certain reservation quotas in higher education and Government jobs since the beginning of the nineties. Indian nationals who do not belong to any of the above-mentioned caste categories are formally called General category individuals and are excluded by rule from any caste-based affirmative policies. OBCs can be thought of as the middle-level caste category: they are usually somewhat more advantaged than the historically disadvantaged SC/ST categories, but have less economic advantage than the forward General caste category.
Considering the large literature on gender discrimination in India, we take two categories of sex, male and female, as our next social circumstance. For region, we have to use region of residence, although the ideal circumstance factor would be the region of birth. Since information on birthplace is unavailable, we take the region of current residence as a proxy for birth region. This is not a far-fetched assumption, given the low rate of inter-state and inter-district migration in India reported in the recent migration survey report of the NSSO (2008). To further minimize migration-related contamination, we use six broad regional categories: North, East, Central, North-East, South and West [14].
Our next batch of circumstances consists of parental background, which includes two kinds of parental attributes: parental education and occupation. Combining father’s and mother’s education, we define three categories of parental education: (i) both parents have no formal schooling, (ii) at least one parent has primary or below-primary schooling (meaning that the other parent has the same level of schooling or less) and (iii) at least one parent has above-primary schooling. It is worth mentioning that ‘no formal schooling’ is not equivalent to illiteracy, as such parents may have been exposed to informal adult literacy programs without ever having experienced formal schooling. Because information on mother’s occupation is very sparse, we take three categories of father’s occupation as a proxy for parental occupation: father’s employment in (i) a white collar job, (ii) a blue collar job or (iii) an agricultural occupation. The white collar category includes all sorts of professional, executive and managerial jobs, whereas sales and service workers fall in the domain of blue collar workers. Agricultural jobs include horticulture, fishing, and hunting and gathering as well.
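The three-way classification of parental education described above amounts to classifying each parent pair by the schooling of the better-educated parent. A minimal sketch follows, assuming a hypothetical ordinal coding of each parent's schooling (the codes and names are ours, not those of the NSS schedule).

```python
# Hypothetical ordinal codes for one parent's schooling (not NSS codes).
NO_SCHOOLING, PRIMARY_OR_BELOW, ABOVE_PRIMARY = 0, 1, 2

LABELS = {
    NO_SCHOOLING: "no formal schooling",
    PRIMARY_OR_BELOW: "primary or below",
    ABOVE_PRIMARY: "above primary",
}


def parental_education(father: int, mother: int) -> str:
    """Classify a parent pair by the schooling of the better-educated parent."""
    return LABELS[max(father, mother)]


print(parental_education(NO_SCHOOLING, NO_SCHOOLING))      # no formal schooling
print(parental_education(PRIMARY_OR_BELOW, NO_SCHOOLING))  # primary or below
print(parental_education(NO_SCHOOLING, ABOVE_PRIMARY))     # above primary
```

Taking the maximum reproduces category (ii) exactly: at least one parent has primary or below-primary schooling and the other has the same level or less.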
3.2.2 Outcome variables
The analysis of IOP in India is carried out for two different outcome variables: consumption and wage. Both are continuous variables expressed in Indian Rupees (INR).
Consumption is measured as the monthly per capita consumption expenditure (MPCE). This is the total monthly expenditure on certain durable and non-durable goods incurred by the household over the thirty days prior to the date of the survey. These data are therefore reported at the household level, and we divide them by the respective household size to get individual-level values. The goods on which expenditure is to be reported are those that the respective survey considers the most important ones. Following Hnatkovska, Lahiri & Paul (2012), we use real MPCE as our outcome variable, obtained by dividing MPCE by the state-level absolute poverty lines [15].
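The construction of real MPCE can be sketched as follows. The function name and the numbers in the example are hypothetical; we assume the state poverty line is expressed in the same units as MPCE (rupees per person per month).

```python
def real_mpce(household_expenditure: float, household_size: int,
              state_poverty_line: float) -> float:
    """Per capita monthly expenditure, deflated by the state poverty line."""
    if household_size <= 0 or state_poverty_line <= 0:
        raise ValueError("household size and poverty line must be positive")
    mpce = household_expenditure / household_size  # household total -> per capita
    return mpce / state_poverty_line               # nominal -> real


# Hypothetical household: INR 6000 per month, 5 members, poverty line INR 800.
print(real_mpce(6000.0, 5, 800.0))  # 1.5
```

A value above one then means that per capita consumption exceeds the poverty line of the state of residence.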
Our second outcome variable is wage earnings, which are reported only for regular and casual wage earners. Wage data are therefore not available for the large group of self-employed individuals, who constitute nearly 40% of the working adults in India. Unlike MPCE, wage is reported as the weekly wage received or receivable for multiple activities by each regular/casual earning member of the household over the week prior to the survey. The main reason for reporting wages on a weekly basis is that many of the Government or non-Government public work programs in India, which employ a huge number of rural casual laborers, are transitory in nature. However, we consider the wage corresponding to the major activity, that is, the one pursued for the maximum number of days over the reference week. In case of an equal number of days
[14] State-wise composition: Jammu & Kashmir, Himachal Pradesh, Punjab, Haryana and Uttarakhand - North; Bihar, Jharkhand, Orissa, West Bengal - East; Uttar Pradesh, Rajasthan, Madhya Pradesh, Chhattisgarh - Central; Sikkim, Arunachal Pradesh, Assam, Nagaland, Meghalaya, Manipur, Mizoram, Tripura - North-East; Karnataka, Andhra Pradesh, Tamil Nadu, Pondicherry, Kerala, Lakshadweep - South; and Gujarat, Daman & Diu, Dadra & Nagar Haveli, Maharashtra, Goa - West.
[15] We use poverty lines because they account for differences in the standard of living across the states of India. Moreover, the absolute poverty lines are provided by the Planning Commission of India using data collected by the same survey that we use for the present analysis, the National Sample Survey.
spent on more than one activity, we prioritize the activity with a valid wage entry and occupation information. In particular, we take the daily real wage as our outcome variable, dividing the total weekly wage by the number of days engaged in that major activity. As with MPCE, the corresponding real wages are obtained by dividing by the state-level absolute poverty lines.
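The selection of the major activity and the resulting daily real wage can be sketched as below. The record layout (`days`, `wage`, `occupation`) and the example values are hypothetical and do not reproduce the NSS schedule; the tie-breaking rule follows the description above.

```python
def major_activity(activities):
    """Activity with the most days; ties favour valid wage and occupation."""
    def rank(a):
        has_valid_info = a["wage"] is not None and a["occupation"] is not None
        return (a["days"], has_valid_info)
    return max(activities, key=rank)


def daily_real_wage(activities, state_poverty_line):
    """Weekly wage of the major activity per day worked, deflated by the poverty line."""
    a = major_activity(activities)
    if a["wage"] is None or a["days"] <= 0:
        return None  # no usable wage information for the major activity
    return (a["wage"] / a["days"]) / state_poverty_line


# Hypothetical individual with two activities of equal duration in the reference week.
week = [
    {"days": 3.5, "wage": None, "occupation": None},
    {"days": 3.5, "wage": 700.0, "occupation": "construction"},
]
print(daily_real_wage(week, 800.0))  # 0.25
```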
3.3 Sample selection
As mentioned before, the NSS does not provide information on parental attributes for every individual, which limits these data to co-resident households in which both offspring and parents are respondents. Given the instrumental role of parental background in the analysis of unequal opportunities for a number of countries, a study on India would remain incomplete if we did not consider it. Given the data structure, the biggest challenge in the sample selection procedure is therefore how best to incorporate the valuable information on parental background in our analysis of IOP in India.
Studies for which parental information is important, such as analyses of inter-generational mobility or inequality of opportunity, usually deal with this issue when using the NSS database either by restricting the analysis to co-resident households (e.g. Hnatkovska, Lahiri & Paul (2013) for inter-generational mobility analysis) or by sacrificing the parental background data (e.g. Asadullah & Yalonetzky (2012) for educational opportunity analysis). As mentioned before, we rule out the second option, considering the importance of parental attributes in IOP. However, to analyze IOP we want our sample to be restricted to working adults who have reportedly finished their education. Given that, the other option for including parental attributes is to limit our analysis to households with adult inter-generational co-residence. Although adult parent-child co-residence is not an uncommon social pattern in India, it may raise the issue of selectivity bias. So, to provide estimates of IOP in India with a nationally representative sample, we impute the parental attributes for our sample using the technique of multiple imputation. Our sample therefore consists of working adults who are aged between 18 and 45 years, are not currently enrolled in any educational institution, are from male-headed households (in which he is the only head of the household) and have valid information on education and occupation, both for themselves and for their parents [16]. For estimating IOP in wage, however, we further restrict our sample to those who additionally provide valid wage data.
The theory of multiple imputation was introduced by Rubin (1976, 1986) to deal with the problem of missing data due to non-response in large survey data sets. Although most popular in statistical and medical research, the use of multiple imputation to handle missing values is expanding in economics as well, especially in econometric analysis based on survey data [17]. In particular, Teyssier (2017) showed the efficacy of multiple imputation for imputing parental information using a data set on Brazil in which this information is also available without the co-residence issue. We want to impute two parental attributes in particular, namely parental
[16] We exclude multi-headed and female-headed households in India, as they are rare and subject to special constraints. Over 90% of heads are male and 99% of households are single-headed.
[17] For applications of the multiple imputation technique in poverty and inequality analysis, see, for example, Alon (2009) and Jong-Sung & Khagram (2005). Salehi-Isfahani, Hassine & Assaad (2014) and Teyssier (2017) provide estimates of IOP using multiply imputed circumstances.
education and father’s occupation, both of which are treated as categorical variables in our estimation of IOP.
We first form our sample according to the sample selection criteria mentioned above, except for the criteria related to parental attributes. We can now think of this sample as the union of two exhaustive and mutually exclusive parts: the ‘response’ part and the ‘non-response’ part. While the ‘response’ part has valid information on parental background, this crucial information is missing for the other part. The exercise of multiple imputation is to use information from the ‘response’ part to impute values for the ‘non-response’ part, using all available auxiliary information in the data set that is non-missing for both parts. In our case the ‘response’ part consists of the co-resident data points for which parental background is observed [18]. Table 8 in Appendix A reports the summary statistics of the ‘response’ and the ‘non-response’ sub-samples. It shows that co-residence does not seem to make a marked difference in terms of caste, occupation and rural-urban composition. Notice, however, that individuals in the ‘response’ part are, as expected, relatively younger. This is why we restrict our analysis to relatively young adults (18-45 years), so as to keep parity between the ‘response’ and the ‘non-response’ parts.
The two parental variables concerned, parental education and father’s occupation, are then estimated for the ‘response’ part by a suitable imputation model (an ordered logistic regression, in our case), using a broad range of predictors that includes household, individual and some survey-related characteristics that are strictly non-missing for both the ‘response’ and the ‘non-response’ parts [19]. Parental attributes for the ‘non-response’ part are then imputed from simulated draws from the posterior distribution of these estimates. As the name suggests, the imputation of the missing values is repeated multiple times, generating multiple ‘completed’ data sets in which none of the attributes is missing any longer. We adopt the sequential regression multiple imputation algorithm of Raghunathan, Lepkowski, Van Hoewyk & Solenberger (2001) and use 20 imputations. Both the non-parametric and parametric measures of IOP are then computed separately on each ‘completed’ data set and combined by Rubin’s rule (Rubin 1986) to give the final measures of IOP.
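The combination step can be illustrated with the standard form of Rubin's rule: the pooled estimate is the average across completed data sets, and its total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/M). The sketch below uses three made-up estimates rather than the 20 imputations of the paper, and the function name is ours.

```python
from math import sqrt
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool M completed-data estimates and their sampling variances."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (M - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Made-up relative IOP estimates from three completed data sets.
iop, se = rubin_pool([0.39, 0.40, 0.38], [0.0001, 0.0001, 0.0001])
print(round(iop, 3), round(se, 4))
```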
However, multiple imputation is not meant to ‘create’ the missing values in a deterministic fashion, but rather to capture the additional features of the ‘response’ part for use in the final analysis. Two important criteria for a successful imputation are therefore that the imputation model should provide good estimates of the missing parental attributes from
[18] In particular, we take our ‘response’ part to consist of individuals who are living with their parents, with the father as the household head. However, a co-resident household may include other members with information on parents as well. Two cases in particular are excluded. First, we do not include grandchildren of the household head, for simplicity. Second, households where the adult working child shares the headship and lives with one of his/her parents should also be taken into account, but could not be, because in this case the NSSO reports father/mother/father-in-law/mother-in-law by a single code, making it impossible to extract information on biological parents. However, these two cases together exclude no more than 10% of the sample, as far as adults are concerned.
[19] This includes household characteristics such as household size, caste, sector (rural/urban), religion and consumption expenditure, and offspring’s characteristics such as age, relation to head, marital status, region of residence, sex, occupation and education, along with some other survey-specific attributes. Further details of our imputation model are provided in Appendix A.
a set of non-missing variables, and that the relation between them should remain the same for the ‘non-response’ part as well. While the former can be tested by imputation model diagnostics, the latter can, given the data set, at best be reasonably assumed (Marchenko & Eddings 2011). In particular, the second criterion requires that the probability of missingness does not depend on any unobservable factor, so that the missing data can be imputed successfully from the imputation model (Allison 2000). Our imputation exercise, and eventually the measures of IOP, also rest on this assumption, the so-called assumption of ‘missing at random’ (MAR) [20]. Summary statistics of our sample, as well as of our sub-sample for the wage analysis (wage sample), are given in Table 1.
                 age      hhsize   %rural   %married   %noschool   %agri    %wage   N
Working sample   32.74    5.0      0.72     0.82       0.24        0.45     0.48    90574
2011-12 [68]     (0.05)   (0.01)   (0.00)   (0.00)     (0.00)      (0.00)
Wage sample      32.24    4.7      0.65     0.79       0.24        0.33     1.0     41619
2011-12 [68]     (0.07)   (0.01)   (0.00)   (0.00)     (0.00)      (0.01)
Table 1: Work sample summary statistics (a)
(a) Standard errors are in parentheses and the survey round number is in square brackets. ‘age’ and ‘hhsize’ report the mean age and household size of our sample. %rural, %married, %noschool, %agri and %wage report the share of the rural sample, married individuals, individuals without any formal schooling, individuals engaged in agricultural jobs and individuals who also have wage information, respectively. The last column (N) reports the respective sample size.
Table 1 reports the mean age, household size (hhsize), share of the rural sector, share of married individuals, share of individuals without any formal schooling (noschool) and share of individuals engaged in agriculture (agri) in our working sample, along with the respective sample size. First of all, similar to the general picture for the whole country, our sample is predominantly rural, with a substantial share in agricultural occupations. Moreover, even in 2012, nearly one-fourth of our sample have never experienced formal schooling. The last but one column reports the share of our working sample with information on wage. It shows that more than half of our working sample have no wage information, which explains the massive reduction in sample size for our wage sample. The lower panel of Table 1 shows that regular and casual wage earners are usually less rural and less agricultural.
Table 2 gives the circumstance-specific composition for each of our five circumstance variables (caste, sex, region, parental education, father’s occupation). Owing to low female labor force participation, notice that both our working and wage samples are rather male dominated
[20] Since we can never actually test whether the missingness depends on some unobservable factor not provided by the data set, we have to assume MAR. However, since adult inter-generational co-residence is a rather prevalent social pattern in most parts of India, it is quite reasonable to assume that the missingness of parental attributes does not depend on hidden unobservable factors beyond those covered by the survey. Another assumption mentioned in the literature is ‘missing completely at random’ (MCAR), which assumes that the probability of missingness is completely random. This is rarely the case for any survey data, and it is not for the NSS, because co-residence is clearly more probable for younger males and less so for females (owing to female migration due to marriage). However, a number of studies suggest that the MAR assumption is good enough for a reasonable imputation (Rubin 1976, Little 1988, Allison 2000, Raghunathan et al. 2001). Appendix A provides further details of our imputation algorithm and diagnostics.
circumstances →            Caste    Sex      Region   Parental education   Father’s occupation
share of →                 SC/ST    male     north    no schooling         agriculture
Working sample (2011-12)   29.7%    82.1%    6.9%     46.9%                54.3%
Wage sample (2011-12)      34.1%    83.9%    7.8%     46.4%                48.1%
Table 2: Circumstance specific summary statistics (a)
(a) Each column shows the percentage share of our samples who are SC/ST, male, residents of the Northern region, have both parents without any formal schooling and have agricultural fathers, respectively.
with an even higher proportion of males in the wage sample [21]. In line with the national population distribution, Northern India accounts for a relatively small share of our samples. Although our wage sample has relatively more lower-caste individuals, the caste composition of both samples is close to the national proportion: about 30% of our working sample and 34% of our wage sample are from the disadvantaged SC/ST caste groups. About 46-47% of both our working and wage samples have parents neither of whom has any formal schooling. Besides, about half of each sample (54% of the working sample and 48% of the wage sample) come from agricultural households, where fathers are engaged in agro-based occupations.
4 Results and discussion
4.1 Measures of IOP in India
To quantify the degree of unequal opportunity in Indian society for consumption and wage, we adopt both the non-parametric and the parametric approaches, for at least two good reasons. First, this serves as a robustness check on our measures of IOP: with the same set of circumstances, the amount of unfair inequality should not vary much between the non-parametric and the parametric set-ups. Second, most international measures of IOP use either or both of these methods, so estimating IOP for India under both approaches is helpful for international comparisons. Following the extant literature, both inequality and IOP are always measured by the mean log deviation (MLD). Moreover, both the non-parametric and the parametric measures of IOP are based on all possible interactions of the full set of circumstances, viz. caste, sex, region, parental education and father’s occupation, leaving us with a total of 324 types to compare [22].
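To make the non-parametric measure concrete, the sketch below computes the MLD and the relative IOP as the MLD of the ‘smoothed’ distribution (each outcome replaced by its type mean) divided by the MLD of the outcome itself. It is a simplified illustration that ignores sampling weights and multiple imputation; the function names and toy data are ours.

```python
from collections import defaultdict
from math import log


def mld(values):
    """Mean log deviation of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("MLD requires strictly positive values")
    mu = sum(values) / len(values)
    return sum(log(mu / v) for v in values) / len(values)


def relative_iop(outcomes, types):
    """Share of total MLD explained by between-type differences."""
    groups = defaultdict(list)
    for y, t in zip(outcomes, types):
        groups[t].append(y)
    type_mean = {t: sum(v) / len(v) for t, v in groups.items()}
    smoothed = [type_mean[t] for t in types]  # everyone gets the mean of their type
    return mld(smoothed) / mld(outcomes)


# Toy example with two types, "a" and "b".
share = relative_iop([1.0, 3.0, 2.0, 6.0], ["a", "a", "b", "b"])
print(round(share, 3))
```

With a single type the smoothed distribution is constant and the measure is zero; with one individual per type it equals one.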
Table 3 reports the relative IOP as well as the measure of total inequality, for MPCE (consumption) and wage. The first row reports the amount of total inequality measured by the MLD, for each of the outcome variables separately. In line with the recent trend in the Indian economy, which shows a very sharp increase in consumption inequality, we find a slightly higher value of MLD for MPCE than for casual/regular wage. But wage exceeds MPCE by a very large margin when the variable of interest is IOP rather than total inequality.
[21] In the chosen age group (18-45 years), about 30% of females are currently employed, while more than 65% are reported as not in the labor force because they attend to domestic duties during 2011-12.
[22] The 324 types correspond to the interaction of caste (3) × sex (2) × region (6) × parental education (3) × father’s occupation (3), where the number of categories for each circumstance is in parentheses.
                             MPCE       Wage
Survey year                  2011-12    2011-12
Inequality                   0.28527    0.25101
Measures of relative IOP
  Non-parametric             0.11172    0.39310
  Parametric                 0.10661    0.37747
Table 3: Measures of Inequality of opportunity in India (a)
(a) All IOP measures are relative measures of IOP and therefore give the share of IOP in total inequality; multiplied by 100, they give the percentage share. So the non-parametric estimate of IOP in wage for 2011-12 indicates that 39.3% of wage inequality is due to unequal circumstances in that survey year.
The last two rows of Table 3 report the non-parametric and the parametric measures of relative IOP respectively, using all possible interactions of the chosen circumstances. Thus the non-parametric IOP for wage says that 39.3% of total wage inequality in the survey year 2011-12 is due to differences in the chosen set of circumstances and is therefore strictly unfair from an ethical perspective. Like Ferreira & Gignoux (2011), we find the non-parametric measure for each outcome to be slightly higher than the corresponding parametric measure of IOP. However, for each outcome variable the measures of IOP are close under the two statistical set-ups (non-parametric and parametric), indicating that our results are robust to the method adopted.
Of the two outcome variables considered, Table 3 shows that the share of ethically unfair inequality is relatively low for MPCE. About 11% of consumption inequality is due to unequal opportunities arising from differences in the chosen circumstances. The degree of consumption IOP in India is still somewhat higher than in most developed countries and in fact positions India closer to the Sub-Saharan African countries (Cogneau & Mesplé-Somps 2008). The same cannot be said for wage, though. During 2011-12, about 37-39% of wage inequality in India is conditioned by unequal social and parental backgrounds. At least in terms of wage IOP with a comparable set of circumstances, India seems worse than Brazil, which has been found to be one of the most opportunity-unequal countries in Latin America (Ferreira & Gignoux 2011).
Although consumption and wage are often analyzed side by side in development studies as two comparable measures of the standard of living, this is not the case for the present analysis. The NSS data do not report these two variables in a comparable format, and we can point out at least three major differences in the way consumption and wage are reported in our database. First, MPCE is household-level data, reported as the total expenditure of the household, and is therefore unable to capture any intra-household differences. Wage, on the other hand, is likely to vary more, as it is reported not only for every regular/casual earning member of the household but also for multiple activities. Second, MPCE is recorded for a longer recall period of a month, whereas, owing to the transitory nature of many casual wage-earning jobs, wage is reported for the reference week prior to the date of the survey. Together, a shorter recall period and a finer reporting unit make the wage data more variable and more responsive to changes in individual circumstance factors. Finally, wage and consumption are estimated for different samples, and the same set of circumstances may have different effects in different samples. In particular, a large body of self-employed individuals is excluded from the wage analysis only.
4.2 Effect of caste in comparison with parental background
India is one of the very few countries where a centuries-old caste system remains well embedded even to date. The origin of the caste system is found in ancient Hindu texts, in which society was divided into a hierarchical occupational structure. Upper castes were supposed to be engaged in occupations regarded as more pure in nature, such as worshiping deities or serving the country as soldiers, whereas the major occupation of the lower caste categories was to serve the upper caste ‘masters’. Over time caste became hereditary; it is identified at birth and cannot be changed during one’s lifetime. Although that makes caste a classic circumstance factor in the context of IOP, it is certainly not the only source of hierarchy in Indian society and may have its effect through many channels. The purpose of the present section is not to explore these different channels, but rather to show the relative importance of caste as a circumstance factor compared to parental background and other social backgrounds, in the context of estimating IOP for India.
Table 4 reports the non-parametric relative measures of IOP with different sets of circumstances. The first row gives the non-parametric IOP with the full set of circumstances and is therefore the same as the non-parametric measures in Table 3. From the second row onward, we provide the associated estimates of IOP after omitting one or more of the circumstances from our analysis. The second row reports the index of non-parametric relative IOP after caste is omitted from our set of circumstances. Similarly, the third row estimates IOP without taking any parental attributes (parental education and father’s occupation) as circumstances, and the last row reports the same when all circumstances other than caste are omitted from the analysis. Unless the omitted circumstances are completely orthogonal to the outcome concerned, IOP will always increase with the addition of new circumstances. This is why Ferreira & Gignoux (2011) suggest interpreting the resulting IOP estimates as a lower bound of the true IOP in society: no study can ever take into account the complete, exhaustive set of circumstances. As expected, therefore, IOP decreases as we move down Table 4 from more to fewer circumstances.
                                          Relative IOP
Circumstances                             MPCE     Wage
caste+sex+region+parental backgrounds     0.112    0.393
sex+region+parental backgrounds           0.099    0.363
caste+sex+region                          0.047    0.161
caste                                     0.014    0.079
Table 4: Effect of omitted circumstances in the measure of IOP (a)
(a) ‘Parental backgrounds’ is an abbreviation for the circumstances related to parents and therefore includes parental education and father’s occupation. The measures of IOP are the non-parametric relative estimates.
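The exercise behind Table 4 can be mimicked on toy data by recomputing the non-parametric relative IOP for different subsets of circumstances. The records, field names and values below are invented for illustration and ignore sampling weights; the point of the sketch is only that dropping a circumstance coarsens the partition into types, so the between-type MLD cannot increase.

```python
from collections import defaultdict
from math import log


def mld(values):
    """Mean log deviation of strictly positive values."""
    mu = sum(values) / len(values)
    return sum(log(mu / v) for v in values) / len(values)


def relative_iop(records, outcome, circumstances):
    """Between-type MLD over total MLD, with types built from `circumstances`."""
    def type_of(r):
        return tuple(r[c] for c in circumstances)
    groups = defaultdict(list)
    for r in records:
        groups[type_of(r)].append(r[outcome])
    means = {t: sum(v) / len(v) for t, v in groups.items()}
    smoothed = [means[type_of(r)] for r in records]
    return mld(smoothed) / mld([r[outcome] for r in records])


# Invented toy records (not NSS data).
records = [
    {"caste": "SC/ST", "par_edu": "No", "wage": 1.0},
    {"caste": "SC/ST", "par_edu": "High", "wage": 2.0},
    {"caste": "SC/ST", "par_edu": "No", "wage": 1.2},
    {"caste": "General", "par_edu": "No", "wage": 1.5},
    {"caste": "General", "par_edu": "High", "wage": 4.0},
    {"caste": "General", "par_edu": "High", "wage": 3.0},
]

full = relative_iop(records, "wage", ["caste", "par_edu"])
caste_only = relative_iop(records, "wage", ["caste"])
print(round(full, 3), round(caste_only, 3))
```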
Notice that, compared to the first row with the full set of circumstances, IOP decreases both in the second and in the third row of Table 4, but the fall in IOP is larger in the latter. Even after omitting caste, wage IOP in India is over 36%, and consumption IOP also decreases only marginally. On the other hand, after omitting parental backgrounds from the analysis, only about 16% of total wage inequality is deemed unfair. For both outcomes, IOP more than doubles when parental background is added to the social backgrounds (caste, sex, region) as additional circumstances, whereas it decreases only marginally when caste alone is omitted from the analysis. This implies that the omitted effect of caste can be captured to a large extent by the other social and parental attributes considered. But even after controlling for caste, sex and region, differences in parental background have a non-trivial additional effect in generating unequal opportunities for both outcome variables. Hence the necessity of multiple imputation of information on parental backgrounds: the social attributes alone are not sufficient to take into account the discriminatory effect of parental backgrounds.
In fact, with caste as the only circumstance variable, IOP in India is even lower than in some developed countries. However, such a comparison is not really appropriate, as most international studies quantifying IOP involve at least one circumstance based on parental information. Nevertheless, the low estimates of IOP in the last row of Table 4 do not indicate that caste has no role to play in generating unequal opportunities in Indian society; rather, they indicate that caste alone cannot capture well the differences in other circumstances, especially parental backgrounds.
4.3 Opportunity tree for contemporary India
Both the non-parametric and the parametric approaches use a fixed model specification for analyzing IOP, in which all circumstances are given equal importance when estimating the resulting measures of IOP in India. However, it is possible that caste matters more in some parts of the country for certain family backgrounds, or that earning opportunity is always lower with less educated parents but lower still when the father is an agricultural worker. Neither the non-parametric nor the parametric measures can answer such questions in the context of IOP. So, to investigate how our circumstances intertwine, we adopt the regression tree approach recently introduced in the literature by Brunori et al. (2018).
Because of our data structure, we have to impute the information on parental backgrounds throughout our analysis. Although we computed the non-parametric and parametric estimates on the multiply imputed data sets for more precision, it is difficult to do the same for the regression tree analysis as far as drawing the opportunity tree is concerned. Since each imputed data set may generate a slightly different opportunity tree depending on the imputed values of parental education and occupation, interpreting multiple opportunity trees for a single outcome variable becomes rather complicated. We therefore pick one imputed data set at random and draw the opportunity tree for that single imputed data set, separately for each of our outcome variables.
All the opportunity trees are drawn on the basis of the same set of circumstances as considered for the non-parametric and parametric analysis. The opportunity trees for all outcome variables are therefore drawn on the basis of (i) three categories of caste - General [Gen], Other Backward Classes [OBC] and Scheduled Castes/Scheduled Tribes [SCST]; (ii) two categories of sex - male [M] and female [F]; (iii) six categories of region - North [N], East [E], Central [C], North-East [NE], South [S], West [W]; (iv) three categories of parental education - neither parent has any formal schooling [No], at least one has primary or below primary schooling (considered as medium education) [Med] and at least one has above primary schooling (considered as high education) [High]; (v) three categories of father’s occupation - white collar [WC], blue collar [BC] and agriculture [Agr]. The abbreviations in square brackets are used to label the corresponding categories in the opportunity trees (Figures 1, 2).
We submit this full set of circumstances to the program and let the algorithm choose the most relevant ones to draw the opportunity tree, where the initial node represents the most important circumstance for the respective outcome. Unlike in the non-parametric and parametric approaches, types in the regression tree are not all possible combinations of the circumstances; rather, each terminal node of the tree corresponds to a different type and is represented by the mean outcome of that type. IOP is then measured as the inequality between these type-mean outcomes. The major difference from the non-parametric and parametric analysis is that the regression tree traces out the most important interactions among the circumstances in a statistically significant way and estimates IOP only on the basis of the limited number of interactions that the program selects as the most relevant. The opportunity tree is therefore able to produce an estimate of IOP that avoids the risk of over-fitting arising from an unregulated number of interactions. Indeed, Table 5 shows that in 2011-12 IOP in consumption is less than 7% and IOP in wage is about 32% when estimated using the regression tree algorithm [23].
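The last step of this procedure can be sketched in a few lines of Python. The sketch assumes that every individual has already been assigned to a terminal node (type) of the opportunity tree, and that relative IOP is the mean log deviation (MLD) of the type-mean outcomes divided by the MLD of the actual outcomes. The function names and the toy data are purely illustrative, and the tree itself is not estimated here.

```python
import math
from collections import defaultdict


def mean_log_deviation(values):
    """MLD = ln(mean) - mean(ln y); it is zero when all values are equal."""
    mu = sum(values) / len(values)
    return math.log(mu) - sum(math.log(v) for v in values) / len(values)


def relative_iop(outcomes, types):
    """Share of total MLD that is due to differences between type means."""
    groups = defaultdict(list)
    for y, t in zip(outcomes, types):
        groups[t].append(y)
    type_mean = {t: sum(ys) / len(ys) for t, ys in groups.items()}
    # Smoothed distribution: every individual receives the mean of their type.
    smoothed = [type_mean[t] for t in types]
    return mean_log_deviation(smoothed) / mean_log_deviation(outcomes)


# Toy example (made-up numbers): three terminal nodes of a hypothetical tree.
toy_outcomes = [80, 120, 150, 250, 300, 500]
toy_types = ["node1", "node1", "node2", "node2", "node3", "node3"]
print(round(relative_iop(toy_outcomes, toy_types), 3))
```

Because the MLD is additively decomposable into between-type and within-type components, the ratio lies between 0 (all types have the same mean) and 1 (no inequality within types).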
                 Measures of relative IOP
         Regression tree   Parametric   Non-parametric
MPCE     0.068             0.107        0.112
Wage     0.318             0.377        0.393

Table 5: Different estimations of IOP (2011-12) [a]
[a] All IOP estimates are measured by the index of mean log deviation on multiply imputed data sets.
The opportunity trees for MPCE and wage are presented in Figures 1 and 2, respectively. First of all, some common patterns across both outcome variables are immediately noticeable. For both MPCE and wage, parental background turns out to be the most important circumstance, followed by the region of residence. However, while parental education is the most decisive circumstance for unequal earning opportunity, it is father’s occupation that is crucial for MPCE. Also, with some exceptions, the other social backgrounds of caste and sex become relevant only at a later stage for either outcome. Whenever they do matter, however, females and the relatively backward caste categories are mostly at a disadvantage. The only exception is North-East India, where the deprived castes of SC/ST have better earning opportunity than their upper caste peers, as reflected by Figure 2. This brings out the special feature of the tribal hub of the North-East region, which has the highest concentration of SC/ST in the country.
[23] Notice that although we draw the respective opportunity trees on the basis of a single, randomly chosen imputed data set, the same is not done for quantifying IOP under the regression tree approach. As in the non-parametric and parametric analysis, IOP is measured in the regression tree analysis using all 20 imputed data sets and by the index of mean log deviation.
Although the geographical region of residence (zone) is as important for MPCE as it is for wage earning, the advantaged group with respect to this circumstance differs across the outcomes. There is relatively less consumption opportunity for working Indian adults who reside in the East and Central regions, even more so if they are from the lower caste categories. As far as earning opportunity is concerned, however, non-self-employed wage earners living in the North-East are actually better off than those in the rest of the country, and even more so if they are from the disadvantaged caste groups of SC/ST and belong to a non-agricultural family. Although having the largest concentration of SC/ST (particularly ST) may impart different caste dynamics in the North-East, this may not be representative of the overall national scenario, as the wage analysis is limited to non-self-employed workers, comprising both regular and casual workers. The common feature of these workers is that both are paid by an external agent, but while regular workers are paid a regular monthly salary, casual workers are paid on transient, public-work-based projects. The wage analysis therefore excludes the large group of self-employed workers, and hence a significant portion of SC/ST who earn their livelihood by farming or gathering on their own land. Further, since 1950, SC/STs have benefited from a caste-based reservation quota for most regular jobs, whereas the forward general castes have not. Given their higher concentration, this may contribute to the better earning opportunity of SC/ST in this region as compared to the upper castes there.
Figure 1: MPCE (2011-12). ‘n’ and ‘y’ denote the sample size and the mean MPCE in INR (Indian Rupee), respectively, for the corresponding terminal node. Parent edu, Father occu and zone represent the circumstances of parental education, father’s occupation and region of residence, respectively.

Figure 2: Wage (2011-12). ‘n’ and ‘y’ denote the sample size and the mean (daily) wage in INR (Indian Rupee), respectively, for the corresponding terminal node. Parent edu, Father occu and zone represent the circumstances of parental education, father’s occupation and region of residence, respectively.
5 Concluding remarks
In this paper we estimate the amount of IOP for India in consumption expenditure and wage earning, using the latest employment-unemployment survey of the NSS for the year 2011-12. We consider a set of five circumstance factors comprising caste, sex, region, parental education and father’s occupation. Using the most widely used methodologies for estimating IOP, we find that 39% of wage inequality is due to unequal opportunities that come from belonging to different caste, sex, region or parental backgrounds, over which nobody has any control. This is higher than in some of the most opportunity-unequal countries in Latin America. However, due to the selective reporting of wage data in the NSS, our wage analysis is limited to the non-self-employed regular or casual workers of the country and excludes a substantial portion of self-employed working adults. On the other hand, both the non-parametric and parametric methods estimate that the share of unfair inequality in consumption is around 11%. But consumption, being reported as total monthly household consumption expenditure, may not respond well to changes in individual circumstances, and IOP in consumption may therefore be underestimated.
Due to the structure of the NSS, information on parental attributes is provided only for ‘co-resident’ households, where the adult working child is enumerated along with his/her parents living in the same house. To incorporate parental background information we therefore adopt the statistical technique of multiple imputation, and are thus able to provide estimates of IOP in India neither by restricting our sample to the selected households with adult intergenerational co-residence, nor by sacrificing the most important circumstance variables of parental background from the entire analysis. The other social circumstances like caste, sex and region, on the other hand, are non-missing for the entire sample. We further find that the degree of IOP is substantially underestimated if parental backgrounds are omitted from the set of circumstances, whereas this is not the case when caste is omitted. In fact, IOP in India is estimated to be even lower than in some developed countries when the social circumstances alone (caste, sex, region) are taken into account. In addition, we find that in spite of the abundant evidence on caste discrimination in Indian society, taking caste as the only circumstance factor is not enough as far as quantifying IOP is concerned. The hierarchical division of caste is therefore not able to capture well the differences in other omitted circumstances, especially that of parental background.
As in the extant literature, both our non-parametric and parametric measures of IOP are based on all possible interactions of the circumstances, while in reality some of them may be more relevant than others. To explore the intertwining of our circumstances we further provide the opportunity structure for India using the recently introduced regression tree approach. We find parental education to be the most important circumstance for wage, whereas the occupational category of the father seems to be the most important source of unequal opportunity in consumption. Irrespective of the outcome, however, individuals from agricultural family backgrounds are always worse off. Although in most cases the social backgrounds of caste and sex come at a later stage in the circumstance hierarchy, the premium for being male or a member of the forward castes is prominent even in 2012. The opportunity tree also brings out the special case of the tribal part of India, the North-Eastern region, where the historically most disadvantaged caste categories of SC/ST have better earning opportunity than the upper castes there, which is never the case in the rest of the country.
Appendices
A Multiple imputation
A.1 The algorithm of multiple imputation by chained equations
To impute parental education and father’s occupation, we adopt a multivariate imputation approach, in particular the sequential regression multiple imputation algorithm of Raghunathan et al. (2001). This algorithm draws the imputed values through a series of univariate regressions or, equivalently, through a series of chained equations, and is hence also called multiple imputation by chained equations (MICE). The underlying imputation model takes all variables as predictors except the one to be imputed. First, the variables to be imputed are ordered from the fewest to the most missing values, and imputation starts with the variable with the fewest missing values, using only predictors without any missing values. The next variable in the ordering (with the second fewest missing values) is then imputed using the non-missing predictors as well as the imputed values of the first variable. The process continues until the variable with the most missing values is imputed. Further, each imputation consists of multiple cycles or iterations to obtain a more stable set of imputed values, from which the final vector of imputed values is drawn for the entire working sample. The algorithm is detailed in Raghunathan et al. (2001) [24]. For two imputed variables, the regression sequence is described below.
Let $X_1$ and $X_2$ be the variables to be imputed, let $Z$ denote the vector of fully observed variables, and let $X_1$ be the variable with the fewest missing values (which in our case is parental education for all rounds). In the first cycle, $X_1$ is regressed on $Z$ (i.e. $X_1 \to Z$) and the missing values in $X_1$ are imputed by simulated draws from the posterior distribution of $X_1$. Then $X_2$ is regressed on $Z$ along with the imputed values of $X_1$ (i.e. $X_2 \to X_1^m, Z$) and imputed values of $X_2$ are drawn similarly. In the cycles thereafter, each of $X_1$ and $X_2$ is regressed on the fully observed variables along with the previously imputed variable. Thus, in the second cycle, the prediction sequence is ($X_1 \to X_2^m, Z$), ($X_2 \to X_1^m, Z$), and so on. The cycles are continued (often up to 10 to 20 iterations) to converge to a set of stable imputed values $\{X_1^1, X_2^1\}$, which constitutes the first imputed data set. The entire process, with the same number of iterations, is then repeated $M$ times to produce $M$ copies of the imputed data set, with imputed variables $\{(X_1^1, X_2^1), \ldots, (X_1^M, X_2^M)\}$. The non-parametric and parametric measures of IOP are then estimated for each of these $M$ imputed data sets, and the final estimate of IOP is obtained as the average of the estimates across all imputed data sets [Rubin’s rule (Rubin 1986)].
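The regression sequence can be illustrated with a deliberately simplified Python sketch. It assumes two continuous variables and ordinary least squares with normal residual noise in place of the ordered logit models used in our application, and it keeps the regression coefficients fixed at their point estimates instead of drawing them from their posterior, so it is not a full implementation of Raghunathan et al. (2001). All function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def draw_imputations(target, predictors, missing, rng):
    """Regress `target` on `predictors` using the rows where `target` was
    originally observed, then fill the missing rows with prediction + noise."""
    design = np.column_stack([np.ones(len(target)), predictors])
    observed = ~missing
    beta, *_ = np.linalg.lstsq(design[observed], target[observed], rcond=None)
    residuals = target[observed] - design[observed] @ beta
    sigma = residuals.std(ddof=design.shape[1])
    completed = target.copy()
    completed[missing] = design[missing] @ beta + rng.normal(
        0.0, sigma, int(missing.sum())
    )
    return completed


def chained_imputation(x1, x2, z, n_cycles=10, seed=0):
    """One imputed data set for (x1, x2); z is fully observed, NaN = missing."""
    rng = np.random.default_rng(seed)
    miss1, miss2 = np.isnan(x1), np.isnan(x2)
    # Cycle 1: X1 on Z, then X2 on (imputed X1, Z).
    x1_imp = draw_imputations(x1, z, miss1, rng)
    x2_imp = draw_imputations(x2, np.column_stack([x1_imp, z]), miss2, rng)
    # Later cycles: each variable on Z and the latest imputation of the other.
    for _ in range(n_cycles - 1):
        x1_imp = draw_imputations(x1, np.column_stack([x2_imp, z]), miss1, rng)
        x2_imp = draw_imputations(x2, np.column_stack([x1_imp, z]), miss2, rng)
    return x1_imp, x2_imp


def pooled_estimate(estimates):
    """Point estimate under Rubin's rule: the mean over the M data sets."""
    return float(np.mean(estimates))
```

Calling `chained_imputation` with M different seeds gives M completed data sets; applying the IOP estimator to each and passing the M results to `pooled_estimate` reproduces the averaging step described above.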
Notice that, after the first cycle, all the missing values are imputed. If the missing pattern is monotone, that is, if $X_2$ is missing only if $X_1$ is missing, there is no need for further iteration. Only cycle one is repeated $M$ times to produce multiple copies of the imputed data set. In that case the prediction sequence is ($X_1 \to Z$); ($X_2 \to X_1^m, Z$). Since $X_2$ is only missing when $X_1$ is missing, this sequence is enough to draw sensible imputed values for both variables (Raghunathan et al. 2001). When the missing pattern is arbitrary, iteration is needed to obtain a stable set of imputed values, each repeatedly predicted from old and newly imputed values.
[24] Also see Royston et al. (2011), Azur et al. (2011).
A.2 Imputation model and diagnostics
The variables to be imputed in our case are parental education and father’s occupation, where the former is generated by combining father’s and mother’s education [25]. To reduce the imputational burden, we impute the combined parental education instead of imputing father’s and mother’s education separately (much in the spirit of ‘transform then impute’ (Von Hippel 2009)). We estimate an ordered logistic regression as our imputation model, predicting parental background with a broad range of covariates that are not missing for the entire working sample. Following the literature (Rubin 1986, Little 1988, Schafer 1999), we include three broad sets of covariates: (i) the analysis model variables (caste, sex and zone, along with all their possible interactions), (ii) the auxiliary variables (household size, consumption expenditure, sector and religion, along with the child’s age, age squared, education, occupation, sex, marital status and relation to head) and (iii) the survey-specific variables (sub-round, second stage stratum, first stage units [26]). Following Teyssier (2017), who used MI for the same purpose of imputing parental background for Brazil, we also include the sample weight as a predictor (along with the normal use of sample weights in the logit model). In addition, the child’s wage and its interaction with age are included for the wage sample imputation. The imputation model makes no claim of causality, but it should fit the data well. With highly significant model chi-square statistics for all rounds, Table 6 does not indicate that our chosen imputation model is a poor fit for any of the imputed variables.
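For reference, the two fit statistics reported in Table 6 can be computed from the log-likelihoods of the fitted model and of an intercept-only (null) model, as in the short Python sketch below. The function names and the log-likelihood values are hypothetical and serve only to show the arithmetic; they are not our estimation output.

```python
def likelihood_ratio_chi2(loglik_model, loglik_null):
    """LR statistic: twice the gain in log-likelihood over the null model."""
    return 2.0 * (loglik_model - loglik_null)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R2: one minus the ratio of the two log-likelihoods."""
    return 1.0 - loglik_model / loglik_null


# Made-up log-likelihoods, for illustration only.
ll_null = -1000.0
ll_model = -800.0
print(likelihood_ratio_chi2(ll_model, ll_null))  # LR chi-square
print(mcfadden_r2(ll_model, ll_null))            # pseudo R2, about 0.2
```

The LR statistic is compared with a chi-square distribution whose degrees of freedom equal the number of covariates added to the null model.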
                Likelihood ratio chi-square [p-value]     Pseudo R2
2011-12         Parental          Father’s                Parental     Father’s
                education         occupation              education    occupation
Work sample     2978.2 [0.000]    4118.6 [0.000]          0.181        0.418
Wage sample     1632.9 [0.000]    1779.4 [0.000]          0.215        0.388

Table 6: Imputation model check [a]
[a] We report McFadden R2 in particular.
Around 70% of our working sample has missing information on parental background that we needed to impute. Multiple imputation is a simulation-based algorithm and hence the power and precision of the multiply imputed values are likely to increase with the number of imputations, especially when the proportion of missing data is large. So far, the literature offers no unequivocal rule for choosing an optimal number of imputations. However, even with a high fraction of missing information, the literature often recommends a modest number of imputations as sufficient to generate statistically sound imputed values (Rubin 1986, Schafer 1999) [27]. As shown by Rubin (1986), the relative efficiency of an infinite number of
[25] In the case of single-parent households, which constitute about 8% of the co-resident sample, parental education is the education of the single parent.
[26] NSSO adopts a complex stratified sampling procedure, with households as the first stage units and individuals as the ultimate stage units. It further divides the survey year into four sub-rounds of three months each. The second stage stratum is an intermediate stratification made by NSSO on the basis of household affluence, to make sure that the final selected households are not restricted to any specific economic class.
[27] Besides, in the case of a complex imputation model with a large number of variables and a large sample, even a single imputation takes hours to complete, and more so if it is iterative. The computational effort associated with the
imputations subject to a finite one, is $(1 + \gamma/m)^{-1/2}$, where $\gamma$ and $m$ are the fraction of missing information and the number of imputations, respectively [28]. With 70% missing information ($\gamma = 0.7$), the relative large-sample efficiency is already about 0.97 with 10 imputations and increases to 0.98 with 20 imputations. Since with large degrees of freedom each additional imputation adds little to the efficiency of the estimated parameter (Schafer & Olsen 1998), we choose to do 20 imputations, and each imputation is generated from a simulated draw after 20 iterations.
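The efficiency figures can be checked with a short Python sketch of the formula $(1 + \gamma/m)^{-1/2}$, measured in units of standard errors. The function name is illustrative; the inputs $\gamma = 0.7$ and $m = 10, 20$ are those discussed in the text.

```python
def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many,
    on the standard-error scale: (1 + gamma / m) ** (-1/2)."""
    return (1.0 + gamma / m) ** -0.5


for m in (5, 10, 20, 40):
    print(m, round(relative_efficiency(0.7, m), 3))
```

With $\gamma = 0.7$ the function returns roughly 0.967 for 10 imputations and 0.983 for 20, so doubling the number of imputations from 20 to 40 would gain less than one percentage point.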
However, “a naive imputation is worse than doing nothing” (Little 1988, p 288). We have a total of 20 imputed data sets. For a randomly chosen imputation, Table 7 reports the distribution of the imputed variables in the observed data set (‘response’), the imputed data set (‘non-response’) and the completed data set (‘response’ + ‘non-response’), for both our final working sample and the wage sub-sample. At a glance, father’s occupation seems to have been imputed better, for it has a similar distribution across all the data sets, whereas more parents are classified as having no formal education in the imputed data set. But that does not indicate a faulty imputation of parental education; in fact, the difference in its distribution is indicative of a rather sensible imputation. The non-co-resident sample, who are on average about 10 years older than the co-resident ones, can be expected to have older parents. Given the substantial educational improvement over time for all generations, as reflected in Tables 8 and 9, older parents are more likely to have been deprived of formal education, exactly as they are imputed. On the other hand, Table 8 also shows that the occupational composition of the samples does not seem to differ markedly by co-residence status. Given the low occupational mobility in India, this is likely to be true for parents as well [29]. Besides, as a robustness check, we found that the pattern of the distributions of the imputed values is similar for many other imputed data sets as well.
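A simple consistency check on these diagnostics is that each share in the completed data set should lie between the observed and the imputed share, close to their average weighted by sample size. The Python sketch below applies this to the ‘no schooling’ category of the work sample, using the shares from Table 7 and the sample sizes from Table 8. It assumes unweighted shares, so the match with the reported figure is only approximate if survey weights were used; the function name is illustrative.

```python
def completed_share(share_observed, share_imputed, n_observed, n_imputed):
    """Size-weighted average of the observed and the imputed share."""
    w = n_observed / (n_observed + n_imputed)
    return w * share_observed + (1.0 - w) * share_imputed


# Work sample, parental education = no schooling:
# observed share 0.305 (co-resident, N = 30982),
# imputed share 0.379 (non-co-resident, N = 59592).
no_schooling = completed_share(0.305, 0.379, 30982, 59592)
print(round(no_schooling, 3))  # compare with the 'comp.' column of Table 7
```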
higher number of imputations in these cases is often too high for it to make sense to increase the number of imputations for a marginal gain in efficiency (Allison 2003, Von Hippel 2005, Azur et al. 2011).
[28] Missing information, strictly speaking, is not the same as the share of missing data points. With high correlation between the missing variables and the observed covariates, $\gamma$ is actually lower than the percentage of missing values (Graham et al. 2007). However, they are the same in the simplest setting.
[29] Also note from Table 9 that in 2011-12, 56% of the co-resident sample have fathers working in the agricultural sector, while 45% of them are in agricultural jobs themselves (Table 8).
Survey year: 2011-12
                              obs.     imp.     comp.
Work sample imputation diagnostics
  Parental education
    No schooling              0.305    0.379    0.354
    Below primary             0.280    0.263    0.269
    Above primary             0.415    0.358    0.378
  Father’s occupation
    White collar              0.198    0.194    0.195
    Blue collar               0.353    0.408    0.393
    Agricultural              0.449    0.398    0.412
Wage sample imputation diagnostics
  Parental education
    No schooling              0.326    0.379    0.363
    Below primary             0.270    0.247    0.254
    Above primary             0.404    0.374    0.383
  Father’s occupation
    White collar              0.181    0.124    0.138
    Blue collar               0.473    0.489    0.485
    Agricultural              0.346    0.387    0.377

Table 7: Imputation diagnostics [a]
[a] ‘obs.’, ‘imp.’ and ‘comp.’ stand for the observed, imputed and completed data set, respectively. For reporting the imputed and the completed data set, we choose one imputation at random (among the 20 imputations).
B Additional tables and figures
2011-12                                age    hhsize   %male   %rural   %SC/ST   %married   %noschool   %agri   %wage   N
Working sample (total)                 32.8   5.0      0.82    0.72     0.30     0.82       0.24        0.45    0.48    90574
Non-response part (non-co-resident)    35.5   4.4      0.77    0.71     0.31     0.96       0.31        0.46    0.49    59592
Response part (co-resident)            26.5   6.5      0.93    0.72     0.26     0.50       0.10        0.42    0.45    30982

Table 8: Summary statistics: working sample, response part and non-response part [a]
[a] The response part corresponds to the co-resident sample, for which parental information is provided in the data set, whereas the non-response part is the non-co-resident sample, for which parental backgrounds need to be imputed. The working sample is the union of the response and the non-response parts. ‘age’ and ‘hhsize’ report the mean age and household size of the respective sample. %male, %rural, %SC/ST, %married, %noschool, %agri and %wage report the shares of males, rural inhabitants, SC/STs, married individuals, individuals without any formal schooling, individuals engaged in agricultural jobs and individuals for whom wage data are also available, respectively. The last column (N) reports the respective sample size.
Co-resident parents, 2011-12 [68]

Statistic                Value    (s.e.)
age, father              54.5     (0.09)
age, mother              49.5     (0.09)
%noschool, father        0.42     (0.01)
%noschool, mother        0.68     (0.01)
%noschool, both          0.40     (0.01)
edu year, child          7.7      (0.05)
edu year, father         4.2      (0.05)
edu year, mother         2.5      (0.03)
%dom duty, mother        0.72     (0.01)
%agri, father            0.56     (0.01)

Table 9: Co-resident sample summary of parents [a]
[a] Standard errors are in parentheses and the survey round is in square brackets. In particular, ‘%noschool father/mother’ indicates fathers/mothers who are deprived of any formal schooling, whereas ‘%noschool both’ means that neither parent has any formal schooling. ‘edu year’ abbreviates years of education. ‘%dom duty mother’ denotes the share of mothers who have reported not being in the labor market because they attend to domestic duties, and ‘%agri father’ is the share of fathers engaged in agriculture-related jobs.
                          MPCE              Wage
Ref: General
  OBC                     -0.058 (0.00)     -0.136 (0.00)
  SC/ST                   -0.063 (0.00)     -0.146 (0.00)
Ref: Primary plus
  Primary or below        -0.067 (0.00)     -0.274 (0.00)
  No schooling            -0.102 (0.00)     -0.409 (0.00)
Ref: White collar
  Blue collar             -0.075 (0.00)     -0.094 (0.00)
  Agricultural            -0.226 (0.00)     -0.303 (0.00)
Ref: North
  East                    -0.359 (0.00)     -0.229 (0.00)
  Central                 -0.403 (0.00)     -0.267 (0.00)
  North-East              -0.179 (0.00)     -0.022 (0.39)
  South                   -0.196 (0.00)     -0.055 (0.02)
  West                    -0.281 (0.00)     -0.299 (0.00)
Ref: Male
  Female                   0.010 (0.38)     -0.316 (0.00)
Intercept                  5.38  (0.00)      3.90  (0.00)

Table 10: Reduced form OLS: for MPCE and Wage [a]
[a] Standard errors are in parentheses. (***, **, *) correspond to the 1%, 5% and 10% levels of significance, respectively.
References
Allison, P. D. (2000), ‘Multiple imputation for missing data: A cautionary tale’, Sociological
methods & research 28(3), 301–309.
Allison, P. D. (2003), ‘Missing data techniques for structural equation modeling.’, Journal of
abnormal psychology 112(4), 545.
Alon, S. (2009), ‘The evolution of class inequality in higher education: Competition, exclusion,
and adaptation’, American Sociological Review 74(5), 731–755.
Arneson, R. (1989), ‘Equality of opportunity and welfare’, Philosophical Studies 56, 77–93.
Asadullah, M. N. & Yalonetzky, G. (2012), ‘Inequality of educational opportunity in India:
Changes over time and across states’, World Development 40(6), 1151–1163.
Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. (2011), ‘Multiple imputation by chained
equations: what is it and how does it work?’, International journal of methods in psychiatric
research 20(1), 40–49.
Bourguignon, F., Ferreira, F. H. & Menéndez, M. (2007), ‘Inequality of opportunity in Brazil’,
Review of income and wealth 53(4), 585–618.
Brunori, P., Ferreira, F. & Peragine, V. (2013), Inequality of Opportunity, Income Inequality,
and Economic Mobility: Some International Comparisons, Paus E. (eds) Getting Development
Right; Palgrave Macmillan, New York.
Brunori, P., Hufe, P. & Mahler, D. G. (2018), ‘The roots of inequality: Estimating inequality of
opportunity from regression trees.’.
Checchi, D. & Peragine, V. (2010), ‘Inequality of opportunity in Italy’, Journal of economic
inequality 8, 429–450.
Checchi, D., Peragine, V. & Serlenga, L. (2010), ‘Fair and unfair income inequalities in Europe’,
IZA discussion paper No. 5025 .
Cogneau, D. & Mesplé-Somps, S. (2008), ‘Inequality of opportunity for income in five countries
of Africa’, in John Bishop, Buhong Zheng (eds), Inequality and Opportunity: Papers from the
Second ECINEQ Society Meeting, Emerald Group Publishing Limited 16, 99–128.
Cohen, G. A. (1989), ‘On the currency of egalitarian justice’, Ethics 99, 906–944.
Deaton, A. & Dreze, J. (2002), ‘Poverty and Inequality in India: A Re-Examination’, Economic
and Political Weekly 37(36), 3729–3748.
Dev, S. M. & Ravi, C. (2007), ‘Poverty and Inequality: All-India and States, 1983-2005’, Economic and Political Weekly 42(6), 509–521.
Dworkin, R. (1981a), ‘What is equality? Part 2: Equality of resources’, Philosophy & public
affairs 10, 283–345.
Dworkin, R. (1981b), ‘What is equality? Part 1: Equality of welfare’, Philosophy & public
affairs 10, 185–246.
Ferreira, F. H. & Gignoux, J. (2011), ‘The measurement of inequality of opportunity: theory
and an application to Latin America’, The review of income and wealth 57(4).
Ferreira, F. H. & Peragine, V. (2015), ‘Equality of Opportunity: Theory and evidence’, Policy
research working paper (WPS 7217, Washington, D.C: World Bank Group).
Graham, J. W., Olchowski, A. E. & Gilreath, T. D. (2007), ‘How many imputations are really needed? some practical clarifications of multiple imputation theory’, Prevention science
8(3), 206–213.
Himanshu (2007), ‘Recent Trends in Poverty and Inequality: Some Preliminary Results’, Economic and Political Weekly 42(6), 497–508.
Himanshu (2018), ‘Widening Gaps: India Inequality Report 2018’, Oxfam India .
Hnatkovska, V., Lahiri, A. & Paul, S. B. (2012), ‘Caste and labor mobility’, Applied economics
4(2).
Hnatkovska, V., Lahiri, A. & Paul, S. B. (2013), ‘Breaking the Caste Barrier: Intergenerational
Mobility in India’, Journal of human resources 48(2), 435–473.
Hothorn, T., Hornik, K. & Zeileis, A. (2006), ‘Unbiased recursive partitioning: A conditional
inference framework’, Journal of Computational and Graphical statistics 15(3), 651–674.
Jong-Sung, Y. & Khagram, S. (2005), ‘A comparative study of inequality and corruption’,
American sociological review 70(1), 136–157.
Lefranc, A., Pistolesi, N. & Trannoy, A. (2009), ‘Equality of opportunity and luck: definitions
and testable conditions, with an application to income in France (1979-2000)’, Journal of
public economics 93, 1189–1207.
Little, R. J. (1988), ‘Missing-data adjustments in large surveys’, Journal of Business & Economic
Statistics 6(3), 287–296.
Marchenko, Y. V. & Eddings, W. (2011), ‘A note on how to perform multiple-imputation diagnostics in Stata’, College Station, TX: StataCorp.
Marrero, G. A. & Rodríguez, J. G. (2011), ‘Inequality of opportunity in the United States:
trends and decomposition’, Research on Economic Inequality 19, 217–261.
NSSO (2008), ‘NSS Report No. 533: Migration in India: July, 2007-June, 2008’, National Sample
Survey Organization, Ministry Of Statistics and Program Implementation (MOSPI), Govt. of
India .
Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J. & Solenberger, P. (2001), ‘A multivariate
technique for multiply imputing missing values using a sequence of regression models’, Survey
methodology 27(1), 85–96.
Ramos, X. & Van de Gaer, D. (2012), ‘Empirical approaches to inequality of opportunity:
Principles, measures and evidence’, IZA discussion paper no. 6672.
Rawls, J. (1971), A theory of justice, Cambridge: Harvard University Press.
Roemer, J. (1993), ‘A Pragmatic Theory of Responsibility for the Egalitarian Planner’, Philosophy & Public Affairs 22, 146–166.
Roemer, J. (1998), Equality of Opportunity, Harvard University Press, Cambridge, MA.
Roemer, J. E. & Trannoy, A. (2013), ‘Equality of Opportunity’, Cowles foundation discussion
paper no. 1921 .
Royston, P., White, I. R. et al. (2011), ‘Multiple imputation by chained equations (MICE):
implementation in Stata’, Journal of Statistical Software 45(4), 1–20.
Rubin, D. B. (1976), ‘Inference and missing data’, Biometrika 63(3), 581–592.
Rubin, D. B. (1986), ‘Basic Ideas of Multiple Imputation for Nonresponse’, Survey Methodology,
Statistics Canada 12(1), 37–47.
Salehi-Isfahani, D., Hassine, N. B. & Assaad, R. (2014), ‘Equality of opportunity in educational achievement in the Middle East and North Africa’, The Journal of Economic Inequality
12(4), 489–515.
Schafer, J. L. (1999), ‘Multiple imputation: a primer’, Statistical methods in medical research
8(1), 3–15.
Schafer, J. L. & Olsen, M. K. (1998), ‘Multiple imputation for multivariate missing-data problems: A data analyst’s perspective’, Multivariate behavioral research 33(4), 545–571.
Singh, A. (2012), ‘Inequality of opportunity in earnings and consumption expenditure: The case
of Indian men’, The review of income and wealth 58(1), 79–106.
Teyssier, G. (2017), ‘Inequality of opportunity: New measurement methodology and impact on
growth’, Seventh ECINEQ Meeting, New-York City (mimeo) .
Von Hippel, P. T. (2005), ‘Teacher’s corner: How many imputations are needed? A comment on
Hershberger and Fisher (2003)’, Structural Equation Modeling 12(2), 334–335.
Von Hippel, P. T. (2009), ‘How to impute interactions, squares, and other transformed variables’, Sociological methodology 39(1), 265–291.
LATEST TITLES IN THE CSH-IFP WORKING PAPERS
Note: The USR3330 Working Papers Series has been renamed as CSH-IFP Working Papers in
2018. However the numbering continues uninterrupted.
Exploring Urban Economic Resilience : The Case of A Leather Industral Cluster in Tamil Nadu. - Kamala
Marius, G. Venkatasubramanian, 2017 (WP no. 9)
https://hal.archives-ouvertes.fr/hal-01547653
Contribution To A Public Good Under Subjective Uncertainty. - Anwesha Banerjee, Nicolas Gravel, 2019
(WP no. 10)
https://halshs.archives-ouvertes.fr/halshs-01734745
Vertical governance and corruption in urban India: The spatial segmentation of public food distribution - Frédéric Landy with the collaboration of Thomas François, Donatienne Ruby, Peeyush
Sekhsaria, 2018 (WP no. 11)
https://hal.archives-ouvertes.fr/hal-01830636
Is the preference of the majority representative? - Mihir Bhattacharya and Nicolas Gravel, 2019
(WP no. 12)
https://hal.archives-ouvertes.fr/hal-02281251
Evaluating Education Systems - Nicolas Gravel, Edward Levavasseur and Patrick Moyes, 2019
(WP no. 13)
https://hal.archives-ouvertes.fr/hal-02291128