
Inequality of Opportunity in Indian Society∗

2019

Using data from the National Sample Survey we estimate inequality of opportunity for India, in consumption expenditure, wage earning and education, on the basis of caste, sex, region and parental backgrounds as our circumstances. We use the widely used methods of non-parametric and parametric analysis to find that even in 2011-12, more than one-fourth of total inequality in wage and education is due to unequal circumstances. But as compared to the other two outcomes, we find inequality of opportunity in consumption to be relatively low. We further provide the opportunity tree for India using the recently introduced method of the regression tree analysis and find parental backgrounds as the most important circumstances for all outcomes. The opportunity tree also reveals a hierarchical order among the circumstances that are most relevant for the underlying unequal opportunity in the country. JEL Classification : D31, D63, I24

CSH-IFP Working Papers 14
USR 3330 “Savoirs et Mondes Indiens”

Inequality of Opportunity in Indian Society
Arnaud Lefranc and Tista Kundu

Institut Français de Pondichéry, Pondicherry
Centre de Sciences Humaines, New Delhi

The Institut Français de Pondichéry and the Centre de Sciences Humaines, New Delhi together form the research unit USR 3330 “Savoirs et Mondes Indiens” of the CNRS.

Institut Français de Pondichéry (French Institute of Pondicherry): Created in 1955 under the terms agreed to in the Treaty of Cession between the Indian and French governments, the IFP (UMIFRE 21 CNRS-MAE) is a research centre under the joint authority of the French Ministry of Foreign Affairs (MAE) and the French National Centre for Scientific Research (CNRS). It fulfills its mission of research, expertise and training in human and social sciences and ecology, in South and South-East Asia. Major research works focus on Indian cultural knowledge and heritage (Sanskrit language and literature, history of religions, Tamil studies etc.), contemporary social dynamics (in the areas of health, economics and environment) and the natural ecosystems of South India (sustainable management of biodiversity).
Institut Français de Pondichéry, 11, Saint Louis Street, P.B. 33, Pondicherry-605 001, India. Tel: (91 413) 2231609. Website: http://www.ifpindia.org/

Centre de Sciences Humaines (Centre for Social Sciences and Humanities): Created in 1990, the CSH (UMIFRE 20 CNRS-MAE) is a research centre jointly managed by the French Ministry of Foreign Affairs (MAE) and the French National Centre for Scientific Research (CNRS). Conveniently located in the heart of New Delhi, the Centre produces research in all fields of social sciences and humanities on issues of importance for India and South Asia.
The main themes studied by CSH researchers include territorial and urban dynamics, politics and social changes, economic growth and inequalities, globalization, migration and health.
Centre de Sciences Humaines, 2, Dr. Abdul Kalam Road, New Delhi-110 011, India. Tel: (91 11) 3041 0070. Website: http://www.csh-delhi.com/

© Institut Français de Pondichéry, 2020
© Centre de Sciences Humaines, 2020

Inequality of Opportunity in Indian Society∗
Arnaud Lefranc† and Tista Kundu‡
March 4, 2020

Submitted for consideration as a CSH-IFP working paper.

Abstract

Recent debates on distributive justice have started to prioritize inequality of opportunity, that is, inequality generated exclusively by circumstance factors beyond individual control. Using data from the National Sample Survey we estimate inequality of opportunity for India in consumption expenditure and wage earning, on the basis of caste, sex, region and parental backgrounds as our circumstances. Adopting the widely used methods of non-parametric and parametric analysis, we find that even in 2011-12, more than one-third of the total wage inequality can be attributed to differences in the ascribed social positions of an individual. Inequality of opportunity in consumption, on the other hand, is relatively low. Furthermore, we use the regression tree algorithm to find the hierarchical order among the circumstances and construct the opportunity tree for India, which the previous methods are unable to provide. In the fashion of machine learning, the opportunity tree identifies parental background as one of the most important circumstance factors behind the underlying unequal opportunity in the country, for either outcome.
But the effect of casteism is prominent as well: in interaction with region, it affirms a forward caste premium for most parts of the country, particularly for the regular salaried wage earners.

JEL Classification: D31, J71, O12
Keywords: Caste, Inequality of opportunity, Mean Log Deviation, Multiple imputation, Parental background, Regression tree

∗ The author thanks Daniel Mahler and Geoffrey Teyssier for their technical help as well as their comments and suggestions. We are grateful to Cristina Terra, Marta Mènendez and Nicolas Gravel for their helpful feedback. Comments from seminar participants at ESSEC Business School (Paris), Laboratoire THEMA (University of Cergy-Pontoise), Winter School on Inequality and Social Welfare Theory (IT13, University of Verona), University of Calcutta, Centre for Studies in Social Science Calcutta, Indian Statistical Institute (Kolkata), 15th Growth and Development conference (ISI, Delhi) and Centre de Sciences Humaines (New Delhi) are also gratefully acknowledged.
† Université de Cergy-Pontoise, France
‡ Corresponding author. Centre de Sciences Humaines, New Delhi, India.

1 Introduction

“..The service to India means the service of the millions who suffer. It means the ending of poverty and ignorance and disease and inequality of opportunity.” - Jawaharlal Nehru¹

Seventy years have passed since this speech was made at the stroke of midnight on the very first day of India's independence. Over this span, India has journeyed from an impoverished country to one of the emerging global economies. Especially since the late nineties, with a consistently high GDP growth rate of more than 7%, India has become the sixth largest economy in the world. Much has been accomplished, with significant improvement in the overall well-being of the country, but as much, if not more, remains to be done or even addressed.
Numerous studies have shown that the rapid growth of India has been accompanied by increasing inequality as well. However, very few studies have explored how much of the growing inequality is due to inequality of opportunity, that is, how much of this high inequality is generated by factors that are purely fatalistic and therefore beyond any human control.

India followed interventionist central planning for the first forty years after independence, followed by ‘neo-liberal’ economic reforms at the beginning of the 1990s. Since then, both the overall growth rate and inequality in India have grown almost simultaneously, making this a very relevant and active area of research. A sharp increase in consumption inequality along with a slower pace of poverty reduction has almost become a distinct feature of the Indian economy, especially in the twenty-first century². But for a society as stratified as India's, while there is a wealth of literature analyzing the problem of inequality and linking it to social mobility, labor market discrimination, urbanization or poverty, only a handful of studies analyze how much of this inequality is due to unequal opportunities arising from varying social and family backgrounds, for which no one can be held accountable. The present work aspires to quantify the degree of unequal opportunity in India by estimating how much of the inequality in consumption and wage is due to differences in caste, sex, region, parental education and occupation.

Traditionally, inequality had been assessed following a welfarist approach, where inequality in the final outcome was the main focus of analysis. The unequal distribution of any desirable outcome (e.g. income, education, standard of living, health etc.) is of primary concern for assessing social welfare. However, inequality can arise from an array of different factors, some of which are purely fatalistic to the individuals.
This heterogeneity in the inequality generating factors triggered a philosophical debate in the late twentieth century, which criticized the classical welfarist approach to inequality analysis as too consequentialist to take into account the multifaceted nature of the inequality generating process (Rawls 1971, Dworkin 1981b,a). The main point of the debate is that inequality arising from factors over which no individual has any control, like race, sex, ethnicity, religion, birthplace, parental and family background, should be of primary concern from an ethical standpoint and should therefore be considered unfair. On the other hand, inequality generated from unregulated lifestyle, lack of perseverance, inadequate skill formation or poor managing ability, in other words, from factors for which one can arguably be held responsible, is not unethical or unfair in an egalitarian society. This new approach of analyzing inequality by splitting it into a fair and an unfair part brings the question of individual responsibility into the domain of distributive justice, and it started to prioritize the analysis of inequality arising solely from factors that are beyond subjective responsibility (Arneson 1989, Cohen 1989).

¹ Excerpt from ‘Tryst with Destiny’, a speech delivered on the first day of independence, 15th August 1947, by Jawaharlal Nehru, the first Prime Minister of independent India.
² See Deaton & Dreze (2002), Himanshu (2007), Dev & Ravi (2007), for example. For recent updates on Indian inequality, see the India inequality report by Himanshu (2018).

Inspired by this philosophical debate on responsibility sensitive egalitarian justice, Roemer (1993) formulates inequality of opportunity as that part of inequality that is generated by factors beyond any individual control.
In the jargon of inequality of opportunity (IOP), all such factors that are outside the periphery of individual responsibility but are responsible for generating inequality are called circumstances. On the other hand, the inequality generating factors that the individual can presumably control are called efforts. In this dichotomous standpoint of effort versus circumstances, inequality of opportunity is that (unfair) part of inequality that has been generated only by the circumstance factors (Roemer 1998).

Methodologically, both non-parametric and parametric approaches are used in the literature to estimate the measure of IOP in a society. The backbone structure of these methods is attributed to Checchi & Peragine (2010) (for Italy) and Bourguignon, Ferreira & Menéndez (2007) (for Brazil), for the non-parametric and the parametric estimates respectively. Although the parametric estimates of IOP come at the cost of a specific functional form assumption between the outcome and the circumstance variables, the parametric approach is often recommended for studies with a broad range of circumstances. The non-parametric approach, in contrast, is more common in multi-country comparison studies that are limited to a set of circumstances comparable across countries. So far there is no consensus in the literature on prioritizing one approach over the other. But in either set up, to quantify the unfair part of inequality as a measure of IOP, the majority of the literature uses an index from the generalized entropy class of inequality indices, namely the mean log deviation. Using a slightly different non-parametric and parametric set up, Ferreira & Gignoux (2011) nevertheless showed that the estimates of IOP are significantly close regardless of the method adopted. This is the methodological set up that we will use for measuring the index of IOP in India³.
Two of the major shortcomings of the above mentioned approaches are that they are based on a pre-specified number of circumstances and often use all possible interactions of the chosen circumstances to estimate IOP, while in reality only some of the interactions may contribute substantially. However, neither the non-parametric nor the parametric set up can point out the relevant interactions. Besides, including all possible interactions also increases the total number of circumstance groups to compare, which may lead to an overestimated IOP as the number of observations per cell decreases. To address this problem, Brunori, Hufe & Mahler (2018) introduced a novel approach of analyzing IOP using regression tree analysis, which lets the algorithm choose the most relevant circumstances in a statistically significant way from the submitted set of circumstances and generates a visually interpretable opportunity tree in the hierarchical order of circumstances. Therefore, along with the non-parametric and parametric estimation of IOP, we adopt this approach in the present work as well, to provide the opportunity structure for contemporary India.

³ See Roemer & Trannoy (2013), Ramos & Van de Gaer (2012) for an extensive analysis of the major methodologies used in the literature. For some international estimates of IOP, see Brunori, Ferreira & Peragine (2013) (selected developed countries including some Nordic countries, selected Latin American, African, Middle-East and Asian countries), Ferreira & Gignoux (2011) (Latin American countries), Marrero & Rodrı̀guez (2011) (United States of America), Checchi, Peragine & Serlenga (2010) (European countries), Cogneau & Mesplè-Somps (2008) (African countries).

India epitomizes a historically very hierarchical social structure, where the centuries-old caste system is functional even in the twenty-first century.
For such a stratified country there is almost no work analyzing unequal opportunity in India, with two notable exceptions. Using the National Sample Survey data, Asadullah & Yalonetzky (2012) analyzed educational opportunity in different states of India due to differences in sex, religion and caste. However, being a state-level study, it naturally focuses more on inter-state differences in unequal opportunity in education than on the national estimate. Besides, due to the structure of the database, their study cannot take into account parental background as one of the circumstances, although it is repeatedly shown to be one of the major driving factors behind unequal opportunities in a number of developed and developing countries. With a different survey, Singh (2012) gives a national estimate of IOP in India for consumption and income that includes father’s educational and occupational background as two of the major circumstances. But due to the survey structure, the inclusion of parental background limits this study to Indian men only. Besides, neither of the above studies gives a recent picture of India, as the latest time frame in either work is 2004-05.

The scanty work on IOP in India leaves significant scope for further improvement. The aim of the present paper is to provide the latest estimates of IOP in India using both the non-parametric and parametric methodology, as well as to provide the opportunity structure for contemporary India adopting the recently introduced approach of regression tree analysis. In particular we choose two outcome variables to analyze, namely consumption expenditure and wage earning, and analyze IOP for a set of five circumstance variables comprising caste, sex, region, parental education and occupation. The present work contributes to the literature in several ways.
First, using the latest employment unemployment survey database of the National Sample Survey Organization, our study gives a rather recent picture of unequal opportunity in India during 2011-12. We found that even by 2012, more than one-third of the total wage inequality is due to differences in the chosen circumstances. This positions India among the countries with high inequality of opportunity in the global perspective.

Second, due to the structure of the National Sample Survey it is difficult to incorporate parental information into the analysis, as the survey questionnaire has no direct provision for reporting this information. Instead, parental attributes are only available for co-resident households where parents are enumerated along with their offspring. This immediately raises the question of selection bias due to co-residence. The present study overcomes this problem by imputing information on parental background for the general sample by the widely used technique of multiple imputation (Rubin 1986). We thereby produce estimates of IOP that take into account the important circumstances of parental background without limiting the study to co-resident households. In fact we found that ignoring parental backgrounds as circumstances results in considerable underestimation of IOP, as the loss of information due to omitting parental attributes cannot be captured well by the other social circumstances considered, like caste, sex or region. Besides, in spite of the prevalent evidence of casteism in India, differences on the basis of caste groups alone are found to be insufficient to capture the differences in economic opportunity arising from other sources like family background. The opportunity tree reconfirms the paramount importance of parental education and occupation.
Both consumption and earning opportunity are better for someone having at least one parent with above primary level schooling and a father engaged in a non-agricultural occupation. This is the third contribution of our paper, that is, to show how our circumstances are intertwined in generating unequal opportunity in the society.

The rest of the paper is organized as follows. Section 2 sketches out the methodological framework of the non-parametric, parametric and regression tree approaches. Section 3 introduces our data and a clear description of all our variables, along with details on our sample selection criteria. Section 4 describes our results in different subsections. After discussing the main non-parametric and parametric measures of IOP in India, we give a brief account of the relative importance of caste and other social backgrounds in comparison with the parental backgrounds. The opportunity structure for contemporary India is discussed next, separately for the two outcome variables. Section 5 concludes.

2 Theoretical and methodological background

In the analysis of inequality of opportunity, any social outcome is supposed to be generated by two broad classes of factors: factors that are beyond individual responsibility, or circumstances ($C$), and factors that are within individual control, or efforts ($e$). Therefore, borrowing from Ferreira & Peragine (2015), the simplified outcome generating process can be written as

$$y = f(C, e) \tag{1}$$

such that the outcome to be analyzed, $y$, can be determined from a finite set of circumstances, $C$, and efforts, $e$. From the standpoint of responsibility sensitive egalitarianism any outcome inequality generated by $C$ is ethically objectionable, whereas inequality arising from $e$ can be considered legitimate⁴. Any analysis of IOP therefore begins with a clear classification of the circumstance and the effort variables.
However, there is no fixed list of circumstance or effort variables to be taken into account, as they are subject to data availability and are rather determined in the social or political space, which varies between societies (Roemer & Trannoy 2013). Nevertheless, as is common in any empirical exercise, estimates of IOP crucially depend on the data structure, and partial observability of circumstance or effort factors severely limits the study. Data on effort factors in particular are even more limited in a large number of surveys. However, IOP is the amount of inequality generated by circumstances only, and efforts, the so called legitimate source of inequality, can themselves be determined by the existing social circumstances. Hence effort variables are often assumed to be a function of circumstances, so that the outcome generating process in equation (1) can be reformulated as a reduced form equation, $y = g(C)$, where the outcome is a function of circumstances only (Ferreira & Gignoux 2011).

⁴ Lefranc et al. (2009) introduced a third factor, that of luck, in the study of IOP, which we did not consider in the present work.

Of course, the higher the number of circumstances taken into account, the more realistic is the measure of IOP. But with the addition of new circumstances into the analysis, IOP will always increase as long as the added circumstances are not orthogonal to the outcome concerned. Since it is impossible for any survey to provide an exhaustive list of circumstances, Ferreira & Gignoux (2011) advise interpreting any resulting estimate of IOP as a lower bound of the true IOP in the society.

Unlike the traditional inequality approach, social welfare in the responsibility sensitive domain is not judged on the basis of total inequality in the outcome variable, $I\{y\}$. Rather, IOP is the measure of only that part of outcome inequality that is generated by the circumstance factors, $C$, exclusively.
So the main methodological challenge in quantifying IOP is to quarantine this unfair part of outcome inequality. This is usually done in the literature by constructing suitable counterfactual distributions, $y^{CF}$, such that by construction $y^{CF}$ is able to capture the variability in the outcome arising uniquely from differences in the circumstance variables, $C$. The absolute IOP in the society can then be measured by the inequality in the counterfactual distribution, $I\{y^{CF}\}$. However, since IOP is estimated as that part of total inequality which is unfair and morally objectionable, it is common practice in the literature to provide estimates of relative IOP as the share of unfair inequality in the total outcome inequality, $I\{y^{CF}\}/I\{y\}$. The construction of the counterfactual distributions, and hence the measurement of IOP, varies with the non-parametric or the parametric statistical model of analysis, as discussed below.

2.1 Non-parametric approach

The non-parametric method for the present analysis has been adopted from the work of Ferreira & Gignoux (2011). Consider a finite population, $i \in \{1, \dots, N\}$, characterized by $\{y_i, C_i\}$, standing for outcome and circumstances respectively. Assume that the vector $C_i$ consists of $J$ elements and that each element $j$ can take $x_j$ values or categories. Groups formed by all possible interactions of the circumstances are usually called types. In this framework, the population under study can thus be partitioned into a maximum of $\bar{K} = \prod_{j=1}^{J} x_j$ exhaustive and mutually exclusive types. From the viewpoint of IOP any inequality between types is ‘unfair’. To isolate this unfair inequality each of the types is represented by a ‘smoothed distribution’ of its mean outcome. Thus every individual in a type, $i \in k$, is assumed to be characterized by the type-mean outcome, $\mu_k$, for each $k = 1, \dots, \bar{K}$.
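The smoothing step described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function names `mld` and `nonparametric_iop` and the toy data are assumptions introduced here, and the mean log deviation is used as the inequality index $I$ because it is the index most of the cited literature relies on. Each individual's outcome is replaced by the mean of his or her type, and the inequality of that smoothed distribution is reported both in absolute terms and as a share of total inequality.

```python
import numpy as np

def mld(x):
    """Mean log deviation: average of ln(mean(x) / x_i); needs x > 0."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))

def nonparametric_iop(y, types):
    """Return (absolute, relative) IOP from the type-mean smoothed distribution."""
    y = np.asarray(y, dtype=float)
    types = np.asarray(types)
    smoothed = np.empty_like(y)
    for k in np.unique(types):
        members = types == k
        smoothed[members] = y[members].mean()  # everyone gets the type mean
    absolute = mld(smoothed)                   # inequality between types only
    return absolute, absolute / mld(y)         # share of total inequality

# Hypothetical toy data: two types with three individuals each.
y = np.array([10.0, 20.0, 30.0, 40.0, 60.0, 80.0])
types = np.array(["a", "a", "a", "b", "b", "b"])
theta_a, theta_r = nonparametric_iop(y, types)
print(f"absolute IOP = {theta_a:.4f}, relative IOP = {theta_r:.4f}")
```

In survey applications the type means and the index would normally be computed with sampling weights, which this sketch leaves out for brevity.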
Therefore the counterfactual distribution that quarantines the inequality generated exclusively from differences in types is represented by $y^{CF} = \{\mu_1, \dots, \mu_{\bar{K}}\}$. The absolute and relative measures of IOP can then be estimated as⁵

$$\theta_a^{NP} = I(\{\mu_k\}) \tag{2}$$

$$\theta_r^{NP} = \frac{I(\{\mu_k\})}{I(\{y_i\})} \tag{3}$$

where $I(\{x\})$ denotes inequality in the distribution of $x$. Following the extant literature, $I(\cdot)$ is measured by the index of Mean Log Deviation (MLD)⁶.

⁵ NP stands for Non-parametric, r for relative measure and a for absolute measure.

2.2 Parametric approach

The parametric approach in the present work has been adopted from Ferreira & Gignoux (2011) as well. It also essentially estimates IOP from the mean outcome conditional on types, using OLS estimates, but differs from the non-parametric set up in its construction of the counterfactual distribution that isolates the ethically unfair part of inequality. The parametric set up usually assumes a log-linear relationship between the outcome and the circumstance/effort variables. So the income generating process can be written as

$$\ln y_i = \alpha C_i + \beta e_i + u_i \tag{4}$$

However, as mentioned before, the effort factors can fairly be assumed to be a function of circumstances, as below

$$e_i = \gamma C_i + v_i \tag{5}$$

with $u_i$ and $v_i$ being the random errors. Hence, from the structural model (4) and (5), the reduced form income generating process can be summarized as

$$\ln y_i = \alpha C_i + \beta(\gamma C_i + v_i) + u_i = (\alpha + \beta\gamma) C_i + (\beta v_i + u_i) = \Psi C_i + \varepsilon_i \tag{6}$$

From the OLS estimates of equation (6), $\hat{\Psi}$, IOP is then measured in comparison to a hypothesized distribution, $\{\tilde{y}_i\}$, that eliminates any differences in individual circumstances, as

$$\tilde{y}_i = \exp[\bar{C}_i \hat{\Psi} + \hat{\varepsilon}_i] \tag{7}$$

where $\bar{C}_i$ is the mean of the circumstance variables across the population. Thus equation (7) eliminates the differences in circumstances by replacing them with their mean values, and the associated inequality, $I(\{\tilde{y}_i\})$, is therefore segregated as fair, by construction.
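Equations (4) to (7) can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function `parametric_iop`, the simulated data and the use of `numpy.linalg.lstsq` for the OLS step are all choices made here. It regresses $\ln y_i$ on the circumstances, rebuilds $\tilde{y}_i$ with every circumstance set to its sample mean while keeping each individual's residual, and then nets the inequality of $\tilde{y}$ (the part treated as fair) out of the total, using the MLD as the index $I$.

```python
import numpy as np

def mld(x):
    """Mean log deviation: average of ln(mean(x) / x_i); needs x > 0."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))

def parametric_iop(y, C):
    """Return (absolute, relative) IOP from the reduced form ln y = Psi*C + eps."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(C, dtype=float)])
    log_y = np.log(y)
    psi, *_ = np.linalg.lstsq(X, log_y, rcond=None)  # OLS estimate of Psi
    resid = log_y - X @ psi                          # estimated epsilon_i
    # Equation (7): circumstances replaced by their means, residuals kept.
    y_tilde = np.exp(X.mean(axis=0) @ psi + resid)
    total = mld(y)
    absolute = total - mld(y_tilde)                  # unfair part of inequality
    return absolute, absolute / total

# Simulated data (hypothetical): one binary circumstance shifting log outcome.
rng = np.random.default_rng(0)
n = 5000
c = rng.integers(0, 2, size=n)
y = np.exp(1.0 + 1.0 * c + rng.normal(0.0, 0.3, size=n))
theta_a_p, theta_r_p = parametric_iop(y, c)
print(f"absolute IOP = {theta_a_p:.4f}, relative IOP = {theta_r_p:.4f}")
```

With categorical circumstances such as caste or region, the columns of `C` would be dummy variables, one per category minus a reference category.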
The measure of absolute IOP can then be estimated from the counterfactual distribution, $y^{CF} = (\{y_i\} - \{\tilde{y}_i\})$, which isolates the outcome variations generated by differences in individual circumstances only. So the relative share of IOP in the total inequality is given by⁷

$$\theta_r^{P} = \frac{I(\{y_i\}) - I(\{\tilde{y}_i\})}{I(\{y_i\})} \tag{8}$$

⁶ $\mathrm{MLD}(x) = \frac{1}{N}\sum_{i=1}^{N} \ln\frac{\bar{x}}{x_i}$
⁷ Where P and r in the superscript and subscript stand for parametric and relative measure respectively.

Similar to the non-parametric approach, we use the same index of MLD for the parametric estimates of IOP as well.

2.3 Regression tree approach

Circumstances by definition are all possible factors that are beyond individual responsibility, and it is practically impossible for any data set to capture all such factors in a single survey or even in multiple surveys. Research on IOP is therefore always restricted to a subset of the total set of circumstances. But as long as the omitted circumstances have a non-trivial effect in predicting the outcome variable, the addition of each such circumstance will increase the estimate of IOP by virtue of a finer partitioning of the population. Clearly, the more circumstances are taken into account, the more realistic is the estimate of IOP. However, the addition of new circumstances also comes at a cost. Since this finer sample partitioning leaves fewer observations for each type, there is a chance of overestimating IOP. The regression tree analysis, introduced into this literature by Brunori et al. (2018), attempts to allay this issue in the fashion of machine learning.

Once again assume that for individual $i$ the circumstance vector $C_i$ consists of $J$ elements, $C_i \in \{C_i^1, \dots, C_i^J\}$, each of which can take $x_j$ values, where $j \in \{1, \dots, J\}$. Unrestricted partitioning will then divide the population into $\bar{K} = \prod_{j=1}^{J} x_j$ types, considering all possible interactions among the circumstances.
However, for a large number of $C_i$ and/or $x_j$ variables, the observations in all or some of the $\bar{K}$ cells may become too few to allow the researcher to use all available types, especially when the sample size is relatively small. Besides, in the case of unrealistic, vacuous interactions, some cells may have no observations at all. Since there is no way to point out the relevant interactions in either the non-parametric or the parametric modeling, the conventional resort is either to regroup the circumstances into broader categories (fewer $x_j$), or to sacrifice some of the circumstances (fewer $C_i$), or both. In the regression tree approach, instead, the researcher submits the full set of available circumstances, $C_i$, to the program and lets the algorithm choose the relevant partitioning of the sample under study in a non-arbitrary way, by recursive binary splitting to be precise.

The recursive binary splitting is a type of permutation test, because it rearranges the labels on the observed data set multiple times and computes the test statistic (and p-value) for each of these rearrangements. It starts by dividing the full sample into two distinct groups based on one circumstance factor and then continues in the same way for each split, potentially based on another circumstance, into more subgroups and so on. The criterion for the selection of splitting circumstances depends on the type of regression tree used. Brunori et al. (2018) use the conditional inference tree algorithm to determine the splitting criterion as follows. The algorithm runs in two stages:

• Stage - I: Selecting the initial splitting circumstance
  – It starts with the simultaneous testing of the $J$ partial hypotheses, $H_0^{C^j}: D(Y \mid C^j) = D(Y)$ for $j \in \{1, \dots, J\}$. Notice that this is precisely a test of the existence of IOP, to see if any circumstance has any effect on the outcome.
  – Adjusted p-values, $p_{adj}^{C^j}$, are then computed with the standard adjustment for multiple hypothesis testing⁸, and the algorithm identifies the circumstance, $C^*$, with the highest degree of association, that is, the circumstance with the minimum p-value, $C^* = \{C^j : \arg\min_j p_{adj}^{C^j}\}$⁹.
  – The algorithm stops if the p-value associated with $C^*$ is greater than some pre-specified significance level, $\alpha$¹⁰. Hence, if $p_{adj}^{C^*} > \alpha$, the null of equality of opportunity for the society cannot be rejected at the $\alpha$ level of significance. Otherwise, the circumstance $C^*$ is selected as the initial splitting variable.
• Stage - II: Growing the opportunity tree
  – Once $C^*$ is selected, it is split by the binary split criterion to grow the tree. For each possible binary partition, $s$, involving $C^*$, the entire sample can be split into two distinct parts as $Y_s = \{Y_i : C_i^* < x_j\}$ and $Y_{-s} = \{Y_i : C_i^* \geq x_j\}$.
  – For each binary split, $s$, the goodness of split is tested by testing the discrepancy between $Y_s$ and $Y_{-s}$¹¹. The split, $s^*$, with the maximum discrepancy, that is, with the minimum p-value, is then selected as the optimum binary split point, based on which the sample is partitioned into two sub-samples, constructing the initial two branches of the opportunity tree.
  – The entire algorithm is then repeated for each branch separately, to construct the full opportunity tree.

3 Data, variables and sample selection

3.1 Data

For the present analysis of inequality of opportunity in India we have taken data from the National Sample Survey (NSS). This is the biggest nationally representative micro level database for India, collected by the National Sample Survey Organization (NSSO), India. Among the many national level surveys conducted by NSSO we have taken the Employment Unemployment Survey in particular. This survey is conducted over one year, every five years, covering the whole country except some remote inaccessible areas¹².
To focus on the recent scenario in India, we have taken the latest employment-unemployment survey round of NSS, conducted in the year 2011-12¹³. This round surveys 100,000 households, enumerating about 0.4 million individuals. India is predominantly rural even to date, with a rural-urban ratio of around 70 : 30 on average.

⁸ The adjustment is the Bonferroni correction, $p_{adj}^{C^j} = 1 - (1 - p^{C^j})^{J}$ (Brunori et al. 2018, p. 8).
⁹ To test the association between the outcome variable and the covariates, the linear statistic form, along with its mean and variance, is provided in Hothorn, Hornik & Zeileis (2006), from which the relevant test statistic and p-value can be formulated.
¹⁰ Like Brunori et al. (2018) we also choose $\alpha = 0.01$.
¹¹ This is tested by the two sample test statistics provided in Hothorn et al. (2006). The entire algorithm can be executed by an R package developed by the same authors.
¹² So conflict areas of Ladakh & Kargil districts of Jammu & Kashmir, some remote interior villages of Nagaland, a few unreachable areas of Andaman & Nicobar Islands and those villages recorded as uninhabited by the respective population census are kept out of these surveys.
¹³ This means we have taken the Schedule 10 survey of NSS, for round 68 (2011-12).

Initially we have to drop about 1000 observations to clean for valid age, sex, sector, caste specification, marital status and some other criteria. NSS provides details on several household and individual characteristics. Some of the major household level variables include household size, religion, caste and consumption expenditure, whereas age, sex, education, occupation and many other demographic characteristics are recorded for each member of the household. However, not everybody reported as ‘employed’ has information on income; rather, wage earning is selectively reported in the NSS data only for the regular and the casual wage earners who are not self-employed.
Another possible drawback in the structure of the NSS database is that it does not report information on parental background directly for every individual. Rather, this crucial information is only available for households where the offspring is enumerated along with his/her parents.

3.2 Definition of variables

3.2.1 Circumstance variables

For the present analysis, we have chosen a set of five circumstance factors: caste, sex, region of residence, parental education and father’s occupation. We label the first three as social backgrounds, while parental education and father’s occupation constitute parental backgrounds. With all possible interactions of these five circumstance variables, we have a total of 324 types.

The caste system in India is a centuries-old hierarchical social structure based on occupation. Over time, however, the historical occupational division became hereditary: children always inherit the caste of their father, and it cannot be changed during their lifetime. There are thousands of castes in the country, which are regrouped into fewer caste categories by the Constitution of India for the purpose of caste-based affirmative policy, or reservations. We consider three caste categories in our analysis. The lower caste category consists of the Scheduled Castes and the Scheduled Tribes together (SC/ST). They are historically the most disadvantaged caste groups in India and have held reservation status since 1950. Around the mid-eighties, the socially and economically backward castes among the non-SC/STs were further categorized as the Other Backward Classes (OBC), who have been entitled to certain reservation quotas in higher education and Government jobs since the beginning of the nineties. Indian nationals who do not belong to any of the above-mentioned caste categories are formally called General category individuals and are excluded by rule from any caste-based affirmative policies.
OBCs can be thought of as the middle-level caste category: they are usually a little more advantaged than the historically disadvantaged SC/ST, but have less economic advantage than the forward General caste category.

Considering the bulk of literature on gender discrimination in India, we take two categories of sex, male and female, as our next social circumstance.

To consider region as one of our circumstances, we have to take the region of residence, although the ideal circumstance factor would be the birth region. Since information on birth place is unavailable, we use the present region of residence as a proxy for birth region, which is not a far-fetched assumption given the low rate of inter-state and inter-district migration in India reported by the recent migration survey of NSSO (2008). To further minimize migration-related contamination, we take six broad regional categories for our analysis: North, East, Central, North-East, South and West¹⁴.

Our next batch of circumstances consists of parental background, which includes two kinds of parental attributes: parental education and occupation. By combining father’s and mother’s education, we take three categories of parental education: (i) both parents have no formal schooling, (ii) at least one parent has primary or below-primary schooling (that is, the other parent has either the same level of schooling or less) and (iii) at least one parent has above-primary schooling. It is worth mentioning that ‘no formal schooling’ is not equivalent to illiterate parents, as they may have been exposed to informal adult literacy programs without ever having experienced formal schooling. Because information on mother’s occupation is considerably scarce, we take three categories of father’s occupation as a proxy for parental occupation.
The occupational categories are father’s employment in (i) white collar jobs, (ii) blue collar jobs and (iii) agricultural occupations. The white collar category includes all sorts of professional, executive and managerial jobs, whereas sales and service workers fall in the domain of blue collar workers. The agricultural category includes horticulture, fishing and hunter-gatherers as well.

3.2.2 Outcome variables

The analysis of IOP in India is executed for two different outcome variables: consumption and wage. Both are continuous variables expressed in Indian Rupees (INR). Consumption is measured as the monthly per capita consumption expenditure (MPCE). This is the total monthly expenditure on certain durable and non-durable goods incurred by the household over the thirty days prior to the date of the survey. The data is therefore reported at the household level, and we divide it by the respective household size to get individual-level values. The goods on which expenditure is to be reported are those that the respective survey considers the most important. Borrowing from Hnatkovska, Lahiri & Paul (2012), we use the real MPCE as our outcome variable, obtained by dividing MPCE by the state-level absolute poverty lines¹⁵.

Our second outcome variable is wage earning, which is reported only for regular and casual wage earners. The wage data is therefore not available for the large group of self-employed individuals, who constitute nearly 40% of the working adults in India. Unlike MPCE, wage is reported as the weekly wage received or receivable for multiple activities by each regular/casual earning member of the household over the week prior to the survey. The main reason for reporting wage on a weekly basis is that many of the Government or non-Government public work programs in India, which employ a huge number of rural casual laborers, are transitory in nature.
However, we consider the wage corresponding to the major activity, that is, the one pursued for the maximum number of days over the reference week. When an equal number of days is spent on more than one activity, we prioritize those having valid wage entry and occupation information. In particular, we take the daily real wage as our outcome variable, dividing the total weekly wage by the number of days engaged in that major activity. As with MPCE, the corresponding real wages are obtained by dividing by the state-level absolute poverty lines.

¹⁴ Statewise composition: Jammu & Kashmir, Himachal Pradesh, Punjab, Haryana and Uttarakhand - North; Bihar, Jharkhand, Orissa, West Bengal - East; Uttar Pradesh, Rajasthan, Madhya Pradesh, Chhattisgarh - Central; Sikkim, Arunachal Pradesh, Assam, Nagaland, Meghalaya, Manipur, Mizoram, Tripura - North-East; Karnataka, Andhra Pradesh, Tamil Nadu, Pondicherry, Kerala, Lakshadweep - South; and Gujarat, Daman & Diu, Dadra & Nagar Haveli, Maharashtra, Goa - West.
¹⁵ We use poverty lines that can account for the differences in standard of living across the states of India. Besides, the measure of the absolute poverty line is provided by the Planning Commission of India using data collected by the same survey, the National Sample Survey, that we use for the present analysis.

3.3 Sample selection

As mentioned before, NSS does not provide information on parental attributes for every individual, which limits this data to co-resident households in which both offspring and parents are respondents. Given the instrumental role of parental backgrounds in the analysis of unequal opportunities for a number of countries, a study on India would remain incomplete if we did not consider them. Given the data structure, the biggest challenge in the sample selection procedure is therefore how best to incorporate the valuable information on parental backgrounds in our analysis of IOP in India.
Studies for which parental information is important, such as analyses of inter-generational mobility or inequality of opportunity, usually deal with this issue when using the NSS database either by restricting the analysis to co-resident households (e.g. Hnatkovska, Lahiri & Paul (2013) for inter-generational mobility analysis) or by sacrificing the parental background data (e.g. Asadullah & Yalonetzky (2012) for educational opportunity analysis). As mentioned before, we rule out the second option considering the importance of parental attributes in IOP. However, to analyze IOP we want our sample to be restricted to working adults who have reportedly finished their education. Given that, the other option for including parental attributes is to limit our analysis to households with adult inter-generational co-residence. Although adult parent-child co-residence is not an uncommon social pattern in India, it may raise the issue of selectivity bias. So, to provide estimates of IOP in India with a nationally representative sample, we impute the parental attributes for our sample using the technique of multiple imputation.

Our sample therefore consists of working adults who are aged between 18 and 45 years, are not currently enrolled in any educational institution, are from male-headed households (with that male as the only head of the household) and have valid information on education and occupation, both for themselves and for their parents¹⁶. For estimating IOP in wage, we further restrict our sample to those who additionally provide valid data on wage.

The theory of multiple imputation was introduced by Rubin (1976, 1986) for dealing with the problem of missing data due to non-response in large survey data sets. Although most popular in statistical and medical research, the use of multiple imputation to handle missing values is expanding in economics as well, especially in survey-based econometric analysis¹⁷.
In particular, Teyssier (2017) showed the efficacy of multiple imputation for imputing parental information for a data set on Brazil, for which this information is also available without the co-residence issue. We want to impute two parental attributes in particular, parental education and father’s occupation, both of which are treated as categorical variables in our estimation of IOP.

¹⁶ We exclude multi-headed and female-headed households in India, as they are rare and subject to special constraints. Over 90% of heads are male and 99% of households are single-headed.
¹⁷ For applications of the multiple imputation technique in poverty and inequality analysis, see Alon (2009) and JongSung & Khagram (2005), for example. Salehi-Isfahani, Hassine & Assaad (2014) and Teyssier (2017) provide estimates of IOP using multiply imputed circumstances.

We first form our sample as per the sample selection criteria mentioned above, except the criteria related to parental attributes. We can now think of this sample as the union of two exhaustive and mutually exclusive parts: the ‘response’ and the ‘non-response’ part. While the ‘response’ part has valid information on parental background, this crucial information is missing for the other part. The exercise of multiple imputation is to use information from the ‘response’ part to impute values for the ‘non-response’ part, using all possible auxiliary information provided by the data set that is non-missing for both parts. In our case the ‘response’ part consists of the co-resident data points for which parental background is observed¹⁸. Table 8 in Appendix A reports the summary statistics of the ‘response’ and the ‘non-response’ sub-samples. It shows that co-residence does not seem to make a marked difference in terms of caste, occupation and rural-urban composition. But notice that the samples of the ‘response’ part, as expected, are relatively younger.
Hence the justification for taking relatively younger adults (18-45 years) for our analysis, so as to keep parity between the ‘response’ and the ‘non-response’ parts. The two parental variables of concern, parental education and father’s occupation, are then estimated for the ‘response’ part by a suitable imputation model (an ordered logistic regression, in our case), using a broad range of predictors, including household, individual and some survey-related characteristics, that are strictly non-missing for both the ‘response’ and the ‘non-response’ parts¹⁹. Parental attributes for the ‘non-response’ part are then imputed from simulated draws from the posterior distribution of these estimates. As the name suggests, the imputation of the missing values is repeated multiple times, generating multiple ‘completed’ data sets in which none of the attributes is missing any longer. We adopt the sequential regression multiple imputation algorithm of Raghunathan, Lepkowski, Van Hoewyk & Solenberger (2001) and use 20 imputations. Both the non-parametric and the parametric measures of IOP are then computed separately on each of the ‘completed’ data sets and combined by Rubin’s rule (Rubin 1986) to give the final measures of IOP.

¹⁸ In particular, we take our ‘response’ part to consist of samples who are living with their parents, with the father as the household head. However, a co-resident household may include other members with information on parents as well. Two cases in particular are excluded. First, we did not take grandchildren of the household head, for simplicity. Second, households where the adult working child shares the headship and is living with one of his/her parents should also be taken into account, but could not be, because in this case NSSO reports father/mother/father-in-law/mother-in-law by a single code, making it impossible to extract information on biological parents. However, these two cases together do not exclude more than 10% of the sample, as far as adults are concerned.
¹⁹ This includes household characteristics like household size, caste, sector (rural/urban), religion and consumption expenditure, and the offspring’s characteristics like their age, relation to head, marital status, region of residence, sex, occupation and education, along with some other survey-specific attributes. Further details of our imputation model are provided in Appendix A.

However, the exercise of multiple imputation is not meant to ‘create’ the missing values in a deterministic fashion, but rather to capture the additional features of the ‘response’ part for use in the final analysis. Two important criteria for a successful imputation are therefore that the imputation model should provide good estimates of the missing parental attributes from a set of non-missing variables, and that the relation between them should remain the same for the ‘non-response’ part as well. While the former can be tested by imputation model diagnostics, the latter can, given the data set, at best be reasonably assumed (Marchenko & Eddings 2011). In particular, the second criterion requires that the probability of missing data does not depend on any unobservable factor, so that the missing values can be imputed successfully from the imputation model (Allison 2000). Our imputation exercise, and eventually our measures of IOP, also rely on this assumption, the so-called assumption of ‘missing at random’ (MAR)²⁰. Summary statistics of our sample, as well as of our sub-sample for the wage analysis (wage sample), are given in Table 1.
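The pooling step applied to the ‘completed’ data sets can be illustrated with a minimal sketch of Rubin’s combination rules. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the paper; the sketch assumes that each imputed data set yields one point estimate together with an estimate of its sampling variance.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: one point estimate per completed data set
    variances: the sampling variance of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching inputs")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical IOP estimates from three completed data sets (illustrative only).
pooled_iop, total_var = pool_rubin([0.39, 0.40, 0.38], [0.0004, 0.0004, 0.0004])
print(round(pooled_iop, 3), total_var)
```

The pooled point estimate is simply the mean over imputations; the factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations.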
                              age            hhsize       %rural        %married      %noschool     %agri         %wage   N
Working sample 2011-12 [68]   32.74 (0.05)   5.0 (0.01)   0.72 (0.00)   0.82 (0.00)   0.24 (0.00)   0.45 (0.00)   0.48    90574
Wage sample 2011-12 [68]      32.24 (0.07)   4.7 (0.01)   0.65 (0.00)   0.79 (0.00)   0.24 (0.00)   0.33 (0.01)   1.0     41619

Table 1: Work sample summary statisticsᵃ
ᵃ Standard errors are in parentheses and the round number is in square brackets. ‘age’ and ‘hhsize’ report the mean age and household size of our sample. %rural, %married, %noschool, %agri and %wage report the share of the rural sample, married individuals, samples without any formal schooling, samples engaged in agricultural jobs and samples who further have information on wage data, respectively. The last column (N) reports the respective sample size.

Table 1 reports the mean age, household size (hhsize), share of the rural sector, share of married samples, share of individuals without any formal schooling (noschool) and share of the population engaged in agriculture (agri) in our working sample, along with the respective sample size. First of all, similar to the general picture of the whole country, our sample is predominantly rural, with a substantial population in agricultural occupations. However, even in 2012, nearly one-fourth of our sample has never experienced formal schooling. The last but one column reports the share of our working sample that has information on wage data. It shows that more than half of our working sample does not have information on wage data, which explains the massive reduction in sample size for our wage sample. The lower panel of Table 1 shows that regular and casual wage earners are usually less rural and less agricultural. Table 2 gives the circumstance-specific composition for each of our five circumstance variables (caste, sex, region, parental education, father’s occupation).
²⁰ Since we can never actually test whether the missingness depends on some unobservable factor not provided by the data set, we have to assume MAR. However, since adult inter-generational co-residence is a rather prevalent social pattern for most parts of India, it is quite reasonable to assume that the missingness of parental attributes does not depend on hidden unobservable factors beyond the provision of the survey. Another assumption, that of ‘missing completely at random’ (MCAR), is also mentioned in the literature; it assumes that the probability of missingness is purely random, unrelated to any observed or unobserved characteristic. This is rarely the case for any survey data, and the same holds for NSS, because co-residence is clearly more probable for younger males and less so for females (due to female migration upon marriage). However, a number of studies suggest that the assumption of MAR is good enough for a reasonable imputation (Rubin 1976, Little 1988, Allison 2000, Raghunathan et al. 2001). Appendix A provides further details of our imputation algorithm and diagnostics.

Circumstance →           Caste    Sex     Region   Parental education   Father’s occupation
Share of →               SC/ST    male    north    no schooling         agriculture
Working sample 2011-12   29.7%    82.1%   6.9%     46.9%                54.3%
Wage sample 2011-12      34.1%    83.9%   7.8%     46.4%                48.1%

Table 2: Circumstance specific summary statisticsᵃ
ᵃ Each column shows the percentage share of our samples who are SC/ST, males, residents of the Northern region, have both parents without any formal schooling and have agricultural fathers, respectively.

Due to low female labor force participation, notice that both our working and wage samples are rather male dominated, with an even higher proportion of males in the wage sample²¹. Besides, in line with the national population distribution, Northern India accounts for a relatively small number of samples. Although our wage sample has relatively more lower caste individuals, the caste composition of either sample is close to the national proportion.
Nearly 30% of our working sample is from the destitute caste groups of SC/ST, which is similar to the caste proportions in the country as a whole. About 46-47% of both our working and wage samples have parents neither of whom has any formal schooling experience. Besides, about half of the samples are from agricultural households, where fathers are engaged in agro-based occupations (54.3% of the working sample and 48.1% of the wage sample).

4 Results and discussion

4.1 Measures of IOP in India

To quantify the degree of unequal opportunity in Indian society for consumption and wage, we adopt both the non-parametric and the parametric approaches, for at least two good reasons. First, it serves as a robustness check on our measures of IOP. With the same set of circumstances, the amount of unfair inequality should not vary much between the non-parametric and the parametric set-up. Second, most of the international measures of IOP have used either or both of these methods. Estimating IOP for India under both approaches will therefore be helpful for international comparisons. Following the extant literature, both inequality and IOP are always measured by the index of mean log deviation (MLD). Besides, both the non-parametric and the parametric measures of IOP are based on all possible interactions of the full set of circumstances, viz. caste, sex, region, parental education and occupation, leaving us a total of 324 types to compare²².

Table 3 reports the relative IOP as well as the measure of total inequality, for MPCE (consumption) and wage. The first row reports the amount of total inequality measured by the MLD, for each of the outcome variables separately. In line with the recent trend in the Indian economy, which shows a very sharp increase in consumption inequality, we also find a slightly higher value of MLD for MPCE than for casual/regular wage. But wage outweighs MPCE by a very large extent when the variable of interest is IOP and not total inequality.
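As an illustration of how a relative IOP figure of this kind can be obtained, the following is a minimal sketch of the mean log deviation and of a non-parametric between-type measure. It assumes the ex-ante approach in which every individual's outcome is replaced by the mean outcome of his or her type, and it ignores survey weights; the function names and the toy data are hypothetical and are not the authors' code.

```python
import math
from collections import defaultdict


def mld(values):
    """Mean log deviation: the average of ln(mean / y_i) over all observations."""
    mu = sum(values) / len(values)
    return sum(math.log(mu / y) for y in values) / len(values)


def relative_iop(outcomes, types):
    """Share of total MLD that is attributable to differences between type means."""
    by_type = defaultdict(list)
    for y, t in zip(outcomes, types):
        by_type[t].append(y)
    type_mean = {t: sum(ys) / len(ys) for t, ys in by_type.items()}
    # Smoothed distribution: each outcome is replaced by the mean of its type.
    smoothed = [type_mean[t] for t in types]
    return mld(smoothed) / mld(outcomes)


# Toy data: a type here is any combination of circumstances, e.g. (caste, sex).
wages = [1.0, 1.5, 3.0, 4.5]
labels = [("SC/ST", "F"), ("SC/ST", "F"), ("Gen", "M"), ("Gen", "M")]
print(relative_iop(wages, labels))
```

With the paper's five circumstances, the label of each individual would be one of the 324 possible combinations; the ratio returned is the relative measure, which is read as a percentage share after multiplying by 100.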
²¹ In the chosen age group (18-45 yrs.), about 30% of females are currently employed, while more than 65% are reported as not in the labor force because they attend to domestic duties, during 2011-12.
²² The 324 types correspond to the interaction of caste (3) × sex (2) × region (6) × parental education (3) × father’s occupation (3), where the number of categories for each circumstance is in parentheses.

                                            MPCE      Wage
Survey year                                 2011-12   2011-12
Inequality                                  0.28527   0.25101
Measures of relative IOP   Non-parametric   0.11172   0.39310
                           Parametric       0.10661   0.37747

Table 3: Measures of inequality of opportunity in Indiaᵃ
ᵃ All IOP measures are relative measures of IOP and therefore report the percentage share of IOP in total inequality when multiplied by 100. So the non-parametric estimate of IOP in wage for 2011-12 reflects that 39.3% of wage inequality is due to unequal circumstances in that survey year.

The last two rows of Table 3 report the non-parametric and the parametric measures of relative IOP respectively, using all possible interactions of the chosen circumstances. So the non-parametric IOP for wage says that 39.3% of total wage inequality is due to differences in the chosen set of circumstances during the survey year 2011-12, and is therefore strictly unfair from an ethical perspective. Similar to Ferreira & Gignoux (2011), we find the non-parametric measure for each outcome to be always a little higher than the corresponding parametric measure of IOP. However, for each outcome variable the measures of IOP are close under both statistical set-ups (non-parametric and parametric), indicating that our results are robust to the method adopted.

Among the two outcome variables considered, Table 3 shows that the share of ethically unfair inequality is relatively low for MPCE. About 11% of consumption inequality is due to unequal opportunities arising from differences in the chosen circumstances.
The degree of consumption IOP in India is still a bit higher than in most developed countries and in fact positions India closer to the Sub-Saharan African countries (Cogneau & Mesplé-Somps 2008). The same cannot be said for wage, though. During 2011-12, about 37-39% of wage inequality in India is conditioned by unequal social and parental backgrounds. At least in terms of wage IOP with a comparable set of circumstances, India seems worse than Brazil, which has been found to be one of the most opportunity-unequal countries in Latin America (Ferreira & Gignoux 2011).

Although consumption and wage are often analyzed side by side in development studies as two comparable measures of standard of living, this is not the case for the present analysis. This is because the NSS data does not report these two variables in a comparable format, and we can point out at least three major sources of variation in the reporting of the consumption and the wage data in our database. First, MPCE is household-level data reported as the total expenditure of the household and is therefore unable to capture any intra-household differences. Wage, on the other hand, is likely to vary more, as it is reported not only for every regular/casual earning member of the household but also for multiple activities. Second, MPCE is recorded over a longer recall period of a month, whereas, due to the transitory nature of many casual wage-earning jobs, wage is reported for the reference week prior to the date of the survey. Together, a shorter recall period and a finer reporting unit make the wage data more variable and more responsive to changes in individual circumstance factors. Finally, wage and consumption are estimated on different samples, and the same set of circumstances may have a different effect on different samples. In particular, a large body of self-employed individuals is excluded from the wage analysis only.
4.2 Effect of caste in comparison with parental background

India is one of the very few countries where the centuries-old caste system is well embedded even to date. The origin of the caste system is found in the ancient Hindu texts, where society was divided into a hierarchical occupational structure. Upper castes were supposed to be engaged in occupations regarded as more pure in nature, like worshiping deities or serving the country as soldiers, whereas the major occupation of the lower caste categories was to serve the upper caste ‘masters’. Caste in this way became hereditary: it is identified at birth and cannot be changed during one’s lifetime. Although that makes caste a classic circumstance factor in the context of IOP, it is certainly not the only source of hierarchy in Indian society and may have its effect through many channels. The purpose of the present section is not to explore these different channels, but rather to show the relative importance of caste as a circumstance factor, as compared to parental background and other social backgrounds, in the context of estimating IOP for India.

Table 4 reports the non-parametric relative measures of IOP with different sets of circumstances. The first row gives the non-parametric IOP with the full set of circumstances and is therefore the same as the non-parametric measures in Table 3. From the second row onward we provide the associated estimates of IOP after omitting one or more of the circumstances from our analysis. The second row reports the index of non-parametric relative IOP after caste is omitted from our set of circumstances. Similarly, the third row estimates IOP without taking any parental attributes (parental education and father’s occupation) as circumstances, and the last row reports the same when all circumstances other than caste are omitted from the analysis.
However, unless the omitted circumstances are completely orthogonal to the outcome concerned, IOP will always increase with the addition of new circumstances. This is why Ferreira & Gignoux (2011) suggested interpreting the resulting IOP estimates as a lower bound of the true IOP in the society, because no study can ever take into account the complete, exhaustive set of circumstances. Therefore, as expected, IOP decreases as we move down Table 4 from a larger to a smaller number of circumstances.

                                         Relative IOP
                                         MPCE    Wage
caste+sex+region+parental backgrounds    0.112   0.393
sex+region+parental backgrounds          0.099   0.363
caste+sex+region                         0.047   0.161
caste                                    0.014   0.079

Table 4: Effect of omitted circumstances in the measure of IOPᵃ
ᵃ ‘Parental background’ is an abbreviation for the circumstances related to parents and therefore includes parental education and father’s occupation. The measures of IOP are the non-parametric relative estimates.

Notice that, compared to the first row with the full set of circumstances, IOP decreases for both the second and the third row of Table 4, but the fall is larger for the latter. Even after omitting caste, earning IOP in India is over 36%, and consumption IOP too decreases only marginally. On the other hand, after omitting parental backgrounds from the analysis, only about 16% of total inequality in wage earning is deemed unfair on account of IOP. For either outcome, IOP more than doubles when parental background is added as a circumstance along with the social backgrounds (caste, sex, region), whereas it decreases only marginally when caste alone is omitted from the analysis. This implies that the omitted effect of caste can be captured to a large extent by the other social and parental attributes considered.
But even after controlling for caste, sex and region, differences in parental background have a non-trivial additional effect in generating unequal opportunities for all the outcome variables. Hence the necessity of multiple imputation of information on parental backgrounds, as the social attributes alone are not sufficient to take into account the discriminatory effect of parental backgrounds. In fact, with caste as the only circumstance variable, IOP in India is even lower than in some developed countries. However, a comparison in this regard is not really appropriate, as most international studies quantifying IOP involve at least one circumstance regarding parental information. Nevertheless, the low estimates of IOP in the last row of Table 4 do not indicate that caste has no role to play in generating unequal opportunities in Indian society; rather, they indicate that caste alone cannot adequately capture the differences in other circumstances, especially parental backgrounds.

4.3 Opportunity tree for contemporary India

Both the non-parametric and the parametric approach use a fixed model specification for analyzing IOP, where all the circumstances are given equal importance while estimating the resulting measures of IOP in India. However, it is possible that caste matters more in some parts of the country for certain family backgrounds, or that earning opportunity is always lower with less educated parents but even more so when the father is an agricultural worker. Neither the non-parametric nor the parametric measures can answer this question in the context of IOP. So, to investigate how our circumstances intertwine, we adopt the regression tree approach that has been recently introduced in the literature by Brunori et al. (2018).

Because of our data structure, we have to impute the information on parental backgrounds throughout our analysis.
Although we computed the non-parametric and parametric estimates on the multiply imputed data sets for more precision, it is difficult to do the same for the regression tree analysis as far as the drawing of the opportunity tree is concerned. Since each imputed data set may generate a slightly different opportunity tree, depending on the imputed values of parental education and occupation, the interpretation of multiple opportunity trees for a single outcome variable becomes rather complicated. We therefore pick a randomly chosen imputed data set and draw the opportunity tree for that single imputed data set, separately for each of our outcome variables.

All the opportunity trees are drawn on the basis of the same set of circumstances as considered for the non-parametric and parametric analysis. The opportunity trees for all outcome variables are therefore drawn on the basis of (i) three categories of caste: General [Gen], Other Backward Classes [OBC] and Scheduled Castes/Scheduled Tribes [SCST]; (ii) two categories of sex: male [M] and female [F]; (iii) six categories of region: North [N], East [E], Central [C], North-East [NE], South [S], West [W]; (iv) three categories of parental education: none of the parents has any formal schooling [No], at least one has primary or below-primary schooling (considered as medium education) [Med] and at least one has above-primary schooling (considered as high education) [High]; and (v) three categories of father’s occupation: white collar [WC], blue collar [BC] and agriculture [Agr]. The abbreviations in square brackets are used to label the corresponding categories in the opportunity trees (Figures 1, 2). We submit this full set of circumstances to the program and let the algorithm choose the most relevant ones to draw the opportunity tree, where the initial node represents the most important circumstance for the respective outcome.
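How the algorithm picks the initial node can be sketched in a few lines. The sketch below covers only the selection stage (adjusted p-values, choice of the minimum, stopping rule at α = 0.01) and takes the raw p-values as given; in the actual procedure these come from the association tests of Hothorn et al. (2006), which are not reproduced here. The function name and the p-values are hypothetical, and the exponent in the adjustment is assumed to be the number of circumstances tested.

```python
def select_splitting_circumstance(raw_p_values, alpha=0.01):
    """Return (selected circumstance or None, adjusted p-values).

    raw_p_values: dict mapping each circumstance to the p-value of its
    association test with the outcome (assumed to be computed elsewhere).
    """
    k = len(raw_p_values)  # number of circumstances tested (assumed exponent)
    adjusted = {name: 1 - (1 - p) ** k for name, p in raw_p_values.items()}
    best = min(adjusted, key=adjusted.get)  # strongest association
    if adjusted[best] > alpha:
        # Equality of opportunity cannot be rejected: do not split this node.
        return None, adjusted
    return best, adjusted


# Hypothetical raw p-values for one node of the tree (illustrative only).
p_raw = {"caste": 0.002, "sex": 0.04, "region": 0.0005,
         "parental_education": 0.0001, "father_occupation": 0.0003}
chosen, p_adj = select_splitting_circumstance(p_raw)
print(chosen)
```

Once a circumstance is selected, the same function would be applied again inside each of the two branches created by the best binary split, which is what produces the hierarchy of circumstances visible in the tree.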
Unlike in the non-parametric and parametric approaches, types in the regression tree are not all possible combinations of the circumstances; rather, each terminal node of the tree corresponds to a different type and is represented by the mean outcome of that type. IOP is then measured as the inequality between these type-mean outcomes. The major difference from the non-parametric and parametric analysis is that the regression tree traces out the most important interactions among the circumstances in a statistically significant way and estimates IOP only on the basis of the limited number of interactions that the program selects as the most relevant. The opportunity tree is therefore able to produce an estimate of IOP that avoids the risk of over-fitting arising from an unregulated number of interactions. Indeed, Table 5 shows that in 2011-12 IOP in consumption is less than 7% and IOP in wage is about 32% when estimated using the regression tree algorithm23.

Table 5: Different estimations of IOP (2011-12)a
Measures of relative IOP    MPCE     Wage
Regression tree             0.068    0.318
Parametric                  0.107    0.377
Non-parametric              0.112    0.393
a All IOP estimates are measured by the index of mean log deviation on multiply imputed data sets.

The opportunity trees for MPCE and wage are presented in Figures 1 and 2, respectively. First of all, some common patterns across the two outcome variables are immediately noticeable. For both MPCE and wage, parental background turns out to be the most important circumstance, followed by the region of residence. However, while parental education is the most decisive circumstance for unequal earning opportunity, it is father’s occupation that is crucial for MPCE. Also, with some exceptions, the other social backgrounds of caste and sex become relevant only at a later stage for either outcome.
Whenever they do matter, however, females and relatively backward caste categories are mostly on the back foot. The only exception is North-East India, where the deprived castes of SC/ST have better earning opportunity than their upper caste peers, as reflected in Figure 2. This brings out the special feature of the tribal hub of the North-East region, which has the highest concentration of SC/ST in the country.

23 Notice that although we draw the respective opportunity trees on the basis of a single randomly chosen imputed data set, the same is not done for quantifying IOP under the regression tree approach. As in the non-parametric and parametric analysis, IOP in the regression tree analysis is measured using all 20 imputed data sets and by the index of mean log deviation.

Although the geographical region of residence (zone) is as important for MPCE as it is for wage earning, the advantaged group with respect to this circumstance differs across the outcomes. There is relatively less consumption opportunity for working Indian adults who reside in the East and Central regions, even more so if they are from the lower caste categories. As far as earning opportunity is concerned, however, non-self-employed wage earners living in the North-East are actually better off than those in the rest of the country, and even more so if they are from the deprived caste groups of SC/ST and belong to a non-agricultural family. Although having the largest concentration of SC/ST (particularly ST) may impart different caste dynamics in the North-East, this may not be representative of the overall national scenario, as the wage analysis is limited to non-self-employed workers, comprising both regular and casual workers.
The common feature of these workers is that both are paid by an external agent; but while regular workers are paid a regular monthly salary, casual workers are paid on transient, public-work-based projects. The wage analysis therefore excludes a large share of self-employed workers, and hence a significant portion of SC/ST who earn their livelihood from farming or gathering on their own land. Further, since 1950 SC/STs have benefited from a caste-based reservation quota for most regular jobs, whereas the forward general castes have not. Given their higher concentration, this may contribute to the better earning opportunity of SC/ST in this region as compared to the upper castes there.

Figure 1: MPCE (2011-12)a
a ‘n’ and ‘y’ denote the sample size and the mean MPCE in INR (Indian Rupee), respectively, for the corresponding terminal node. Parent edu, Father occu and zone represent the circumstances of parental education, father’s occupation and region of residence, respectively.

Figure 2: Wage (2011-12)a
a ‘n’ and ‘y’ denote the sample size and the mean (daily) wage in INR (Indian Rupee), respectively, for the corresponding terminal node. Parent edu, Father occu and zone represent the circumstances of parental education, father’s occupation and region of residence, respectively.

5 Concluding remarks

In this paper we estimate the amount of IOP for India in consumption expenditure and wage earning, using the latest employment-unemployment survey of NSS, for the year 2011-12. We consider a set of five circumstance factors comprising caste, sex, region, parental education and father’s occupation. Using the most widely used methodologies for estimating IOP, we found that 39% of wage inequality is due to unequal opportunities that come from belonging to different caste, sex, region or parental backgrounds, over which nobody has any control.
This is higher than in some of the most opportunity-unequal countries in Latin America. However, due to the selective reporting of wage data in NSS, our wage analysis is limited to the non-self-employed regular or casual workers of the country and excludes a substantial portion of self-employed working adults. On the other hand, both the non-parametric and parametric methods estimate that the share of unfair inequality in consumption is around 11%. But consumption, being reported as the total monthly household consumption expenditure, may not respond well to differences in individual circumstances, so IOP in consumption may be underestimated. Due to the structure of NSS, information on parental attributes is provided only for ‘co-resident’ households, where the adult working child is enumerated along with his/her parents living in the same house. To incorporate parental background information we therefore adopt the statistical technique of multiple imputation, and are thus able to provide estimates of IOP in India without either restricting our sample to the selected households with adult intergenerational co-residence or dropping the most important circumstance variables, parental backgrounds, from the analysis. The other social circumstances, like caste, sex and region, are non-missing for the entire sample. We further found that the degree of IOP is substantially underestimated if parental backgrounds are omitted from the set of circumstances, whereas this is not the case when caste is omitted. In fact, IOP in India is estimated to be even lower than in some of the developed countries when the social circumstances alone (caste, sex, region) are taken into account. In addition, we found that in spite of ample evidence of caste discrimination in Indian society, taking caste as the only circumstance factor is not enough as far as quantifying IOP is concerned.
The hierarchical division of caste is therefore not able to capture well the differences in other omitted circumstances, especially those of parental backgrounds. As in the extant literature, both our non-parametric and parametric measures of IOP are based on all possible interactions of the circumstances, while in reality some of them may be more relevant than others. To explore the intertwining of our circumstances we further provide the opportunity structure for India using the recently introduced approach of regression tree analysis. We found parental education to be the most important circumstance for wage, whereas it is the occupational category of the father that seems to be the most important source of unequal opportunity in consumption. Irrespective of the outcome, however, individuals from agricultural family backgrounds are always worse off. Although in most cases the social backgrounds of caste or sex come at a later stage in the circumstance hierarchy, the premium for being male or a member of a forward caste is prominent even in 2012. The opportunity tree also brings forth the special case of the tribal part of India, the North-Eastern region, where the most historically disadvantaged caste categories of SC/ST have better earning opportunity than the upper castes there, which is never the case in the rest of the country.

Appendices

A Multiple imputation

A.1 The algorithm of multiple imputation of chained equation

To impute parental education and father’s occupation, we adopt a multivariate imputation approach, in particular the sequential regression multiple imputation algorithm of Raghunathan et al. (2001). This algorithm draws the imputed values through a series of univariate regressions or, equivalently, through a series of chained equations, and hence is also called multiple imputation of chained equations (MICE). The underlying imputation model specification takes all the variables as predictors except the one to be imputed.
First, the variables to be imputed are ordered from the fewest to the most missing values. Imputation starts with the variable that has the fewest missing values, using predictors without any missing value. The next variable in the ordering (with the second fewest missing values) is then imputed using the non-missing predictors as well as the imputed values of the first variable. The process continues until the variable with the most missing values is imputed. Further, each imputation consists of multiple cycles or iterations to obtain a more stable set of imputed values, from which the final vector of imputed values is drawn for the entire working sample. The algorithm is detailed in Raghunathan et al. (2001)24. For two imputed variables, the regression sequence is as follows. Let $X_1$ and $X_2$ be the variables to be imputed, let $Z$ denote the vector of fully observed variables, and let $X_1$ be the variable with the fewest missing values (which in our case is parental education for all rounds). In the first cycle, $X_1$ is regressed on $Z$ (i.e. $X_1 \to Z$) and the missing values in $X_1$ are imputed by simulated draws from the posterior distribution of $X_1$. Then $X_2$ is regressed on $Z$ along with the imputed values of $X_1$ (i.e. $X_2 \to X_1^m, Z$), and imputed values of $X_2$ are drawn similarly. In the cycles thereafter, each of $X_1$ and $X_2$ is regressed on the fully observed variables along with the previously imputed variables. Thus, in the second cycle, the prediction sequence is ($X_1 \to X_2^m, Z$), ($X_2 \to X_1^m, Z$), and so on. The cycles are continued (often up to 10 to 20 iterations) to converge to a set of stable imputed values $\{X_1^1, X_2^1\}$, which constitutes the first imputed data set. The entire process, with the same number of iterations, is then repeated $M$ times to produce $M$ copies of the imputed data set, with imputed variables $\{(X_1^1, X_2^1), \ldots, (X_1^M, X_2^M)\}$.
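The control flow of the sequence described above (ordering by missingness, a first cycle that uses only what is already filled in, later cycles that use every other variable, and M repetitions) can be sketched as follows. This is a minimal sketch, not the procedure used for the estimates: the draw step is a random hot-deck within cells of the predictors, a hypothetical stand-in for simulated draws from a fitted regression model, and the function name chained_imputation, its parameters and the toy records are assumptions made for illustration. The last lines average an estimate across the completed data sets, which is how the estimates are combined in what follows.

```python
import random


def chained_imputation(data, targets, complete, n_imputations=3, n_cycles=5, seed=0):
    """Return n_imputations completed copies of `data` (a list of dicts).

    Missing entries are None. `complete` lists fully observed variables.
    The draw is a hot-deck within predictor cells (an assumption standing
    in for a draw from a regression model)."""
    rng = random.Random(seed)
    # Order the variables to impute from fewest to most missing values.
    order = sorted(targets, key=lambda t: sum(r[t] is None for r in data))
    missing = {t: [i for i, r in enumerate(data) if r[t] is None] for t in order}
    completed_sets = []
    for _ in range(n_imputations):
        rows = [dict(r) for r in data]  # work on a copy
        for cycle in range(n_cycles):
            for k, t in enumerate(order):
                if cycle == 0:
                    # First cycle: complete variables plus targets already filled.
                    predictors = complete + order[:k]
                else:
                    # Later cycles: complete variables plus all other targets.
                    predictors = complete + [u for u in order if u != t]
                miss = set(missing[t])
                observed = [r for i, r in enumerate(rows) if i not in miss]
                for i in missing[t]:
                    cell = [r[t] for r in observed
                            if all(r[p] == rows[i][p] for p in predictors)]
                    # Fall back to all observed values if the cell is empty.
                    pool = cell or [r[t] for r in observed]
                    rows[i][t] = rng.choice(pool)
        completed_sets.append(rows)
    return completed_sets


# Invented toy records: zone is complete, the two parental variables are not.
data = [
    {"zone": "N", "parent_edu": "High", "father_occ": "WC"},
    {"zone": "N", "parent_edu": "High", "father_occ": None},
    {"zone": "N", "parent_edu": None, "father_occ": None},
    {"zone": "E", "parent_edu": "No", "father_occ": "Agr"},
    {"zone": "E", "parent_edu": None, "father_occ": "Agr"},
    {"zone": "E", "parent_edu": "No", "father_occ": None},
]

completed = chained_imputation(data, ["father_occ", "parent_edu"], ["zone"])

# Estimate a quantity on each completed data set, then average the estimates.
shares = [sum(r["parent_edu"] == "No" for r in rows) / len(rows) for rows in completed]
pooled_share = sum(shares) / len(shares)
```

The hot-deck draw keeps the example free of any regression code; with a real imputation model, only the line that builds `pool` and draws from it would change.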
The non-parametric and parametric measures of IOP are then estimated for each of these $M$ imputed data sets, and the final estimate of IOP is the average of the estimates across all the imputed data sets [Rubin’s rule (Rubin 1986)]. Notice that after the first cycle all the missing values are imputed. If the missing pattern is monotone, that is, if $X_2$ is missing whenever $X_1$ is missing, there is no need for further iteration: only cycle one is repeated $M$ times to produce multiple copies of the imputed data set. In that case the prediction sequence is ($X_1 \to Z$); ($X_2 \to X_1^m, Z$). Since $X_1$ is missing only when $X_2$ is also missing, this sequence is enough to draw sensible imputed values for both variables (Raghunathan et al. 2001). When the missing pattern is arbitrary, iteration is needed to obtain a stable set of imputed values, each repeatedly predicted from old and newly imputed values.

24 Also see Royston et al. (2011), Azur et al. (2011).

A.2 Imputation model and diagnostics

The variables to be imputed in our case are parental education and father’s occupation, where the former is generated by combining father’s and mother’s education25. To reduce the imputation burden, we impute the combined parental education instead of imputing father’s and mother’s education separately (much in the spirit of ‘transform then impute’ (Von Hippel 2009)). We estimate an ordered logistic regression as our imputation model, predicting parental background from a broad range of covariates that are not missing for the entire work sample.
Following the literature (Rubin 1986, Little 1988, Schafer 1999), we include three broad sets of covariates: (i) the analysis model variables (caste, sex and zone, along with all their possible interactions), (ii) the auxiliary variables (household size, consumption expenditure, sector and religion, along with the child’s age, age squared, education, occupation, sex, marital status and relation to head) and (iii) the survey-specific variables (sub-round, second stage stratum, first stage units26). Following Teyssier (2017), who used MI for the same purpose of imputing parental background for Brazil, we include the sample weight as a predictor as well (along with the normal use of sample weights in the logit model). In addition, the child’s wage and its interaction with age are also considered for the wage sample imputation. The imputation model makes no claim of causality, but it should fit the data well. With highly significant model chi-square statistics for all rounds, Table 6 does not indicate that our chosen imputation model is a poor fit for any of the imputed variables.

Table 6: Imputation model checka
                          Likelihood ratio chi-square [p-value]        Pseudo R2
Sample (year)             Parental education   Father’s occupation     Parental education   Father’s occupation
Work sample (2011-12)     2978.2 [0.000]       4118.6 [0.000]          0.181                0.418
Wage sample (2011-12)     1632.9 [0.000]       1779.4 [0.000]          0.215                0.388
a We report McFadden R2 in particular.

Around 70% of our working sample have missing information on parental background that we needed to impute. Multiple imputation is a simulation-based algorithm, and hence the power and precision of the multiply imputed values are likely to increase with the number of imputations, especially when the proportion of missing data is large. So far in the literature, there is no unequivocal rule for choosing an optimum number of imputations.
However, even with a high fraction of missing information, the literature often recommends a modest number of imputations as sufficient to generate statistically sound imputed values (Rubin 1986, Schafer 1999)27. As shown by Rubin (1986), the relative efficiency of a finite number of imputations compared with an infinite number is $(1 + \gamma/m)^{-1/2}$, where $\gamma$ and $m$ are the fraction of missing information and the number of imputations, respectively28. In case of 70% missing information ($\gamma = 0.7$), the relative large-sample efficiency is already 0.96 with 10 imputations, and increases to 0.98 with 20 imputations. Since, in case of large degrees of freedom, each additional imputation adds little to the efficiency of the estimated parameter (Schafer & Olsen 1998), we choose to do 20 imputations, each generated from a simulated draw of 20 iterations. We therefore have a total of 20 imputed data sets.

25 In case of single-parent households, which constitute about 8% of the co-resident sample, parental education is the education of the single parent.
26 NSSO adopts a complex stratified sampling procedure, with households as the first stage units and individuals as the ultimate stage units. It further divides the survey year into four sub-rounds of three months each. The second stage stratum is a middle-level stratification made by NSSO on the basis of household affluence, to make sure that the final selected households are not restricted to any specific economic class.
27 Besides, in case of a complex imputation model with a large number of variables and a large sample, even a single imputation takes hours to complete, and more so if it is iterative. The computational effort associated with a higher number of imputations in these cases is often too high for a marginal increase in efficiency to justify increasing the number of imputations (Allison 2003, Von Hippel 2005, Azur et al. 2011).
28 Missing information, strictly speaking, is not the same as the number of missing data points. With high correlation between the missing variables and the observed covariates, $\gamma$ is actually lower than the percentage of missing values (Graham et al. 2007). However, they are the same in the simplest setting.

However, “a naive imputation is worse than doing nothing” (Little 1988, p. 288), so the imputed values need to be checked. For a randomly chosen imputation, Table 7 reports the distribution of the imputed variables in the observed data set (‘response’), the imputed data set (‘non-response’) and the completed data set (‘response’ + ‘non-response’), for both our final working sample and the wage sub-sample. At a glance, father’s occupation seems to have been imputed better, for it has a similar distribution across all the data sets, whereas more parents are recorded as having no formal education in the imputed data set. But that does not mean a faulty imputation of parental education; in fact, the difference in its distribution is indicative of a rather sensible imputation. The non-co-resident sample, who are on average about 10 years older than the co-resident ones, are expected to have older parents. Given the substantial educational improvement over time for all generations, as reflected in Tables 8 and 9, older parents are more likely to be deprived of formal education, exactly as they are imputed. On the other hand, Table 8 also shows that the occupational composition of the samples does not differ markedly by co-residence status. Given low occupational mobility in India, this is likely to be true for parents as well29. Besides, as a robustness check, we found that the pattern of the distributions of the imputed values is similar for many other imputed data sets as well.
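The relation between the three columns of Table 7 can be checked with simple arithmetic: the completed data set pools the observed (co-resident) and imputed (non-co-resident) parts, so each completed share should lie between the other two, close to their size-weighted average. The sketch below does this for the work sample. It rests on assumptions: it reads the Table 7 values in the order observed, imputed, completed, takes the unweighted sample sizes from Table 8, and ignores survey weights, so agreement can only be approximate. The function name completed_share is hypothetical.

```python
def completed_share(obs_share, imp_share, n_obs, n_imp):
    """Share in the completed data as a size-weighted mix of the two parts."""
    return (obs_share * n_obs + imp_share * n_imp) / (n_obs + n_imp)


# Unweighted sizes of the co-resident and non-co-resident parts (Table 8).
N_OBS, N_IMP = 30982, 59592

# Work sample shares from Table 7, read as (observed, imputed, completed).
work_sample = {
    "No schooling": (0.305, 0.379, 0.354),
    "Below primary": (0.280, 0.263, 0.269),
    "Above primary": (0.415, 0.358, 0.378),
    "White collar": (0.198, 0.194, 0.195),
    "Blue collar": (0.353, 0.408, 0.393),
    "Agricultural": (0.449, 0.398, 0.412),
}

# Gap between the size-weighted mix and the completed share reported in Table 7.
gaps = {
    category: completed_share(obs, imp, N_OBS, N_IMP) - comp
    for category, (obs, imp, comp) in work_sample.items()
}
```

Under these assumptions every gap is below half a percentage point in absolute value, which is what one would expect if the remaining differences come from the survey weights.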
29 Also note from Table 9 that in 2011-12, 56% of the co-resident sample have their fathers working in the agricultural sector, while 45% of them are in agricultural jobs themselves (Table 8).

Table 7: Imputation diagnosticsa
Survey year: 2011-12                  obs.     imp.     comp.
Work sample imputation diagnostics
 Parental education
  No schooling                        0.305    0.379    0.354
  Below primary                       0.280    0.263    0.269
  Above primary                       0.415    0.358    0.378
 Father’s occupation
  White collar                        0.198    0.194    0.195
  Blue collar                         0.353    0.408    0.393
  Agricultural                        0.449    0.398    0.412
Wage sample imputation diagnostics
 Parental education
  No schooling                        0.326    0.379    0.363
  Below primary                       0.270    0.247    0.254
  Above primary                       0.404    0.374    0.383
 Father’s occupation
  White collar                        0.181    0.124    0.138
  Blue collar                         0.473    0.489    0.485
  Agricultural                        0.346    0.387    0.377
a ‘obs.’, ‘imp.’ and ‘comp.’ stand for the observed, imputed and completed data set, respectively. For reporting the imputed and the completed data set, we choose one imputation at random (among 20 imputations).

B Additional tables and figures

Table 8: Summary statistics: working sample, response part and non-response parta
Sample (2011-12)                      age    hhsize  %male  %rural  %SC/ST  %married  %noschool  %agri  %wage  N
Working sample (total)                32.8   5.0     0.82   0.72    0.30    0.82      0.24       0.45   0.48   90574
Non-response part (non-co-resident)   35.5   4.4     0.77   0.71    0.31    0.96      0.31       0.46   0.49   59592
Response part (co-resident)           26.5   6.5     0.93   0.72    0.26    0.50      0.10       0.42   0.45   30982
a The response part corresponds to the co-resident sample, for which parental information is provided in the data set, whereas the non-response part is the non-co-resident sample, for which parental backgrounds need to be imputed. The working sample is the union of the response and the non-response part. ‘age’ and ‘hhsize’ report the mean age and household size of the respective sample.
%male, %rural, %SC/ST, %married, %noschool, %agri and %wage report the shares of males, rural inhabitants, SC/STs, married individuals, individuals without any formal schooling, individuals engaged in agricultural jobs and individuals for whom wage data are available, respectively. The last column (N) reports the respective sample size.

Table 9: Co-resident sample summary of parentsa
Co-resident parents, 2011-12 [68]
age father            54.5 (0.09)
age mother            49.5 (0.09)
%noschool father      0.42 (0.01)
%noschool mother      0.68 (0.01)
%noschool both        0.40 (0.01)
edu year child        7.7 (0.05)
edu year father       4.2 (0.05)
edu year mother       2.5 (0.03)
%dom duty mother      0.72 (0.01)
%agri father          0.56 (0.01)
a Standard errors are in parentheses and the survey round is in square brackets. In particular, ‘noschool father/mother’ indicates fathers/mothers who are deprived of any formal schooling, whereas ‘noschool both’ means that neither parent has any formal schooling. ‘edu year’ abbreviates years of education. ‘%dom duty mother’ denotes the share of mothers who reported not being in the labor market in order to attend to domestic duties, and ‘%agri father’ is the share of fathers engaged in agriculture-related jobs.

Table 10: Reduced form OLS: for MPCE and Wagea
Ref: General OBC SC/ST Ref: Primary plus Primary or below No schooling Ref: White collar Blue collar Agricultural Ref: North East Central North-East South West
MPCE Wage
-0.058 (0.00) -0.063 (0.00) -0.136 (0.00) -0.146 (0.00) -0.067 (0.00) -0.102 (0.00) -0.274 (0.00) -0.409 (0.00) -0.075 (0.00) -0.226 (0.00) -0.094 (0.00) -0.303 (0.00) -0.359 (0.00) -0.403 (0.00) -0.179 (0.00) -0.196 (0.00) -0.281 (0.00) -0.229 (0.00) -0.267 (0.00) -0.022 (0.39) -0.055 (0.02) -0.299 (0.00) 0.010 (0.38) 5.38 (0.00) -0.316 (0.00) 3.90 (0.00)
Ref: Male Female Intercept
a Standard errors are in parentheses. (***, **, *) correspond to the 1%, 5% and 10% levels of significance, respectively.

References

Allison, P. D. (2000), ‘Multiple imputation for missing data: A cautionary tale’, Sociological methods & research 28(3), 301–309.
Allison, P. D. (2003), ‘Missing data techniques for structural equation modeling’, Journal of abnormal psychology 112(4), 545.
Alon, S. (2009), ‘The evolution of class inequality in higher education: Competition, exclusion, and adaptation’, American Sociological Review 74(5), 731–755.
Arneson, R. (1989), ‘Equality of opportunity and welfare’, Philosophical Studies 56, 77–93.
Asadullah, M. N. & Yalonetzky, G. (2012), ‘Inequality of educational opportunity in India: Changes over time and across states’, World Development 40(6), 1151–1163.
Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. (2011), ‘Multiple imputation by chained equations: what is it and how does it work?’, International journal of methods in psychiatric research 20(1), 40–49.
Bourguignon, F., Ferreira, F. H. & Menéndez, M. (2007), ‘Inequality of opportunity in Brazil’, Review of income and wealth 53(4), 585–618.
Brunori, P., Ferreira, F. & Peragine, V. (2013), Inequality of Opportunity, Income Inequality, and Economic Mobility: Some International Comparisons, Paus E. (eds) Getting Development Right; Palgrave Macmillan, New York.
Brunori, P., Hufe, P. & Mahler, D. G. (2018), ‘The roots of inequality: Estimating inequality of opportunity from regression trees’.
Checchi, D. & Peragine, V. (2010), ‘Inequality of opportunity in Italy’, Journal of economic inequality 8, 429–450.
Checchi, D., Peragine, V. & Serlenga, L. (2010), ‘Fair and unfair income inequalities in Europe’, IZA discussion paper No. 5025.
Cogneau, D. & Mesplé-Somps, S. (2008), ‘Inequality of opportunity for income in five countries of Africa’, John Bishop, Buhong Zheng (ed.); Inequality and Opportunity: Papers from the Second ECINEQ Society Meeting, Emerald Group Publishing Limited 16, 99–128.
Cohen, G. A. (1989), ‘On the currency of egalitarian justice’, Ethics 99, 906–944.
Deaton, A. & Dreze, J. (2002), ‘Poverty and Inequality in India: A Re-Examination’, Economic and Political Weekly 37(36), 3729–3748.
Dev, S. M. & Ravi, C. (2007), ‘Poverty and Inequality: All-India and States, 1983-2005’, Economic and Political Weekly 42(6), 509–521.
Dworkin, R. (1981a), ‘What is equality? Part 2: Equality of resources’, Philosophy & public affairs 10, 283–345.
Dworkin, R. (1981b), ‘What is equality? Part 1: Equality of welfare’, Philosophy & public affairs 10, 185–246.
Ferreira, F. H. & Gignoux, J. (2011), ‘The measurement of inequality of opportunity: theory and an application to Latin America’, The review of income and wealth 57(4).
Ferreira, F. H. & Peragine, V. (2015), ‘Equality of Opportunity: Theory and evidence’, Policy research working paper (WPS 7217, Washington, D.C: World Bank Group).
Graham, J. W., Olchowski, A. E. & Gilreath, T. D. (2007), ‘How many imputations are really needed? Some practical clarifications of multiple imputation theory’, Prevention science 8(3), 206–213.
Himanshu (2007), ‘Recent Trends in Poverty and Inequality: Some Preliminary Results’, Economic and Political Weekly 42(6), 497–508.
Himanshu (2018), ‘Widening Gaps: India Inequality Report 2018’, Oxfam India.
Hnatkovska, V., Lahiri, A. & Paul, S. B. (2012), ‘Caste and labor mobility’, American Economic Journal: Applied Economics 4(2).
Hnatkovska, V., Lahiri, A. & Paul, S. B. (2013), ‘Breaking the Caste Barrier: Intergenerational Mobility in India’, Journal of human resources 48(2), 435–473.
Hothorn, T., Hornik, K. & Zeileis, A. (2006), ‘Unbiased recursive partitioning: A conditional inference framework’, Journal of Computational and Graphical statistics 15(3), 651–674.
Jong-Sung, Y. & Khagram, S. (2005), ‘A comparative study of inequality and corruption’, American sociological review 70(1), 136–157.
Lefranc, A., Pistolesi, N. & Trannoy, A. (2009), ‘Equality of opportunity and luck: definitions and testable conditions, with an application to income in France (1979-2000)’, Journal of public economics 93, 1189–1207.
Little, R. J. (1988), ‘Missing-data adjustments in large surveys’, Journal of Business & Economic Statistics 6(3), 287–296.
Marchenko, Y. V. & Eddings, W. (2011), ‘A note on how to perform multiple-imputation diagnostics in Stata’, College Station, TX: StataCorp.
Marrero, G. A. & Rodríguez, J. G. (2011), ‘Inequality of opportunity in the United States: trends and decomposition’, Research on Economic Inequality 19, 217–216.
NSSO (2008), ‘NSS Report No. 533: Migration in India: July, 2007-June, 2008’, National Sample Survey Organization, Ministry Of Statistics and Program Implementation (MOSPI), Govt. of India.
Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J. & Solenberger, P. (2001), ‘A multivariate technique for multiply imputing missing values using a sequence of regression models’, Survey methodology 27(1), 85–96.
Ramos, X. & Van de Gaer, D. (2012), ‘Empirical approaches to inequality of opportunity: Principles, measures and evidence’, IZA discussion paper no. 6672.
Rawls, J. (1971), A theory of justice, Cambridge: Harvard University Press.
Roemer, J. (1993), ‘A Pragmatic Theory of Responsibility for the Egalitarian Planner’, Philosophy & Public Affairs 22, 146–166.
Roemer, J. (1998), Equality of Opportunity, Harvard University Press, Cambridge, MA.
Roemer, J. E. & Trannoy, A. (2013), ‘Equality of Opportunity’, Cowles foundation discussion paper no. 1921.
Royston, P., White, I. R. et al. (2011), ‘Multiple imputation by chained equations (mice): implementation in Stata’, J Stat Softw 45(4), 1–20.
Rubin, D. B. (1976), ‘Inference and missing data’, Biometrika 63(3), 581–592.
Rubin, D. B. (1986), ‘Basic Ideas of Multiple Imputation for Nonresponse’, Survey Methodology, Statistics Canada 12(1), 37–47.
Salehi-Isfahani, D., Hassine, N. B. & Assaad, R. (2014), ‘Equality of opportunity in educational achievement in the Middle East and North Africa’, The Journal of Economic Inequality 12(4), 489–515.
Schafer, J. L. (1999), ‘Multiple imputation: a primer’, Statistical methods in medical research 8(1), 3–15.
Schafer, J. L. & Olsen, M. K. (1998), ‘Multiple imputation for multivariate missing-data problems: A data analyst’s perspective’, Multivariate behavioral research 33(4), 545–571.
Singh, A. (2012), ‘Inequality of opportunity in earnings and consumption expenditure: The case of Indian men’, The review of income and wealth 58(1), 79–106.
Teyssier, G. (2017), ‘Inequality of opportunity: New measurement methodology and impact on growth’, Seventh ECINEQ Meeting, New-York City (mimeo).
Von Hippel, P. T. (2005), ‘Teacher’s corner: How many imputations are needed? A comment on Hershberger and Fisher (2003)’, Structural Equation Modeling 12(2), 334–335.
Von Hippel, P. T. (2009), ‘How to impute interactions, squares, and other transformed variables’, Sociological methodology 39(1), 265–291.

LATEST TITLES IN THE CSH-IFP WORKING PAPERS
Note: The USR3330 Working Papers Series has been renamed as CSH-IFP Working Papers in 2018. However the numbering continues uninterrupted.
Exploring Urban Economic Resilience: The Case of A Leather Industrial Cluster in Tamil Nadu. - Kamala Marius, G. Venkatasubramanian, 2017 (WP no. 9) https://hal.archives-ouvertes.fr/hal-01547653
Contribution To A Public Good Under Subjective Uncertainty. - Anwesha Banerjee, Nicolas Gravel, 2019 (WP no. 10) https://halshs.archives-ouvertes.fr/halshs-01734745
Vertical governance and corruption in urban India: The spatial segmentation of public food distribution - Frédéric Landy with the collaboration of Thomas François, Donatienne Ruby, Peeyush Sekhsaria, 2018 (WP no. 11) https://hal.archives-ouvertes.fr/hal-01830636
Is the preference of the majority representative? - Mihir Bhattacharya and Nicolas Gravel, 2019 (WP no. 12) https://hal.archives-ouvertes.fr/hal-02281251
Evaluating Education Systems - Nicolas Gravel, Edward Levavasseur and Patrick Moyes, 2019 (WP no. 13) https://hal.archives-ouvertes.fr/hal-02291128