DISCUSSION PAPER SERIES
IZA DP No. 11084
Assortative Mating and
Earnings Inequality in France
Nicolas Frémeaux
Arnaud Lefranc
OCTOBER 2017
Nicolas Frémeaux
Université Paris 2
Arnaud Lefranc
University of Cergy-Pontoise, THEMA and IZA
Any opinions expressed in this paper are those of the author(s) and not those of IZA. Research published in this series may
include views on policy, but IZA takes no institutional policy positions. The IZA research network is committed to the IZA
Guiding Principles of Research Integrity.
The IZA Institute of Labor Economics is an independent economic research institute that conducts research in labor economics
and offers evidence-based policy advice on labor market issues. Supported by the Deutsche Post Foundation, IZA runs the
world’s largest network of economists, whose research aims to provide answers to the global labor market challenges of our
time. Our key objective is to build bridges between academic research, policymakers and society.
IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper
should account for its provisional character. A revised version may be available directly from the author.
IZA – Institute of Labor Economics
Schaumburg-Lippe-Straße 5–9
53113 Bonn, Germany
Phone: +49-228-3894-0
Email:
[email protected]
www.iza.org
ABSTRACT
Assortative Mating and
Earnings Inequality in France*
This paper analyzes economic assortative mating and its contribution to inequality in
France. We first provide descriptive evidence on the statistical association in several
socio-economic attributes of partners among French couples (annual earnings, potential
earnings, education, occupation). Second, we assess the contribution of assortative mating
to earnings inequality between couples. Contrary to previous estimates, we account for
possible biases in the estimation of assortative mating arising from sample-selection into
the labor force. We also provide a new method for assessing the contribution of assortative
mating to inequality in couple’s potential earnings. Our results indicate a strong degree
of assortative mating in France. The correlation coefficient for education is above 0.6.
The correlation in earnings is lower but sizable: around 0.17 for annual earnings, when
including zeroes; around 0.35 for full-time equivalent earnings and up to 0.49 when
using multi-year average earnings. We show that assortative mating tends to increase
inequality among couples, compared to random mating. For annual earnings, the effect is
non-negligible and accounts for 3 to 9% of measured inequality. The effect of assortative
mating on household potential earnings is much larger and amounts to 10 to 20% for
observed inequality.
JEL Classification: J12, J22, D31
Keywords: assortative mating, inequality, earnings, labor supply, France
Corresponding author:
Arnaud Lefranc
Université de Cergy-Pontoise
33 boulevard du Port
95011 Cergy-Pontoise
France
E-mail:
[email protected]
* This research was developed under the auspices of the Labex MME-DII Center of Excellence (grant ANR11LBX-0023-01), the support of which is gratefully acknowledged.
1 Introduction
An abundant sociological literature has provided evidence of a high correlation of educational and social attributes within couples, in most developed countries (e.g. Mare 1991,
Blossfeld and Timm 2003). In comparison, available evidence on the extent of assortative
mating according to economic characteristics is much more limited. Investigating the degree of homogamy in modern societies is however crucial for at least three reasons. First,
the propensity to mate into homogeneous couples might amplify existing earnings inequality between individuals. Although several papers have recently investigated this issue,1
the extent to which assortative mating contributes to economic inequality between couples
remains largely unknown. Second, as discussed in Becker (1973) and Zhang and Liu (2003),
observed assortative mating patterns might shed light on the nature of intra-household production and allocation decisions. Lastly, to the extent that it shapes household resources,
assortative mating will largely condition child upbringing decisions and might contribute
to the intergenerational transmission of inequality (e.g. Becker and Tomes 1979, Black and
Devereux 2011).
In this paper, we study economic assortative mating in France. Our contribution is
threefold. We first provide comparable evidence on assortative mating among French couples for various attributes (occupation, education, annual earnings), as usually investigated
in the literature. Second, in order to account for endogenous labor supply, we examine the
association within couples in individual potential earnings, measured by full-time equivalent earnings. Moreover, we account for potential biases in the estimation of assortative
mating arising from sample-selection into the labor force. Third, we assess the contribution
of assortative mating to inequality between couples in both observed annual earnings and
potential earnings.
Available evidence on the extent of economic assortative mating appears relatively
sparse. Most studies have focused on assortative mating by education (e.g. Goux and
Maurin 2003, Schwartz and Mare 2005) or social origin (e.g. Kalmijn 1991, Uunk, Ganzeboom, and Róbert 1996). Assortativeness along other economic dimensions such as individual earnings or preferences has been much less analyzed.2 This represents an important
limitation for at least two reasons. First, it does not allow one to fully capture the contribution
of marital choices to economic inequality. Second, in a period of rising returns to skills,
a constant degree of educational or occupational assortativeness might mask a rising polarization of the distribution of family resources. To partially address these issues, recent
research has examined the statistical association between male and female labor earnings
within couples. Available evidence points to a sizable correlation, of up to 20%, in individual earnings (e.g. Burtless 1999, Nakosteen, Westerlund, and Zimmer 2004, Schwartz 2010).
The analysis is however largely confined to the United States and much less is known of
the situation in European societies.3
1 See in particular Karoly and Burtless (1995), Cancian and Reed (1998), Burtless (1999), Schwartz (2010), Eika, Mogstad, and Zafar (2017), Greenwood, Guner, Kocharkov, and Santos (2014), Harmenberg (2014), Pestel (2017).
2 Arrondel and Fremeaux (2016), Dohmen, Falk, Huffman, and Sunde (2012) and Kimball, Sahm, and Shapiro (2009) are some of the few exceptions.
Existing studies suffer from several empirical limitations. First, estimates are generally
based on cross-sectional data in which earnings are observed in a single year only. However,
annual earnings might incorporate sizable measurement errors and transitory shocks that
can bias downward the estimates and lead to an underestimation of the association in
partners’ earnings.4 In this paper, we exploit panel data to compute average earnings over
multiple years in order to address this issue. Second, most papers have focused on the
statistical association in annual earnings. However annual earnings reflect both individual
productivity characteristics and endogenous joint labor supply decisions taken within the
couple. The confounding effect of labor supply decisions might jeopardize the assessment
of the degree of assortative mating. An important concern, in this respect, is that a sizable
share of women in couples report zero earnings as they do not participate in the labor force.
If labor force participation is positively associated with partner’s earnings, this will lead to
underestimating the degree of assortative mating in individual economic characteristics.
In this paper, this issue is addressed by analyzing the statistical association in potential
earnings within couples. Potential earnings are defined as individual full-time equivalent earnings. Potential earnings of a couple thus represent the earnings it would receive
if each partner worked full time, given the individual market wage rate of its members.
Compared to reported annual earnings, potential earnings provide a more extensive measure of the total economic resources commanded by the couple, which is more relevant for
assessing inequality in welfare between households. First, individuals out of the labor force
might have a positive contribution to the household’s consumption of goods and services
through domestic production, as emphasized in Gronau (1977). Available estimates indeed
suggest that domestic production represents a sizable fraction of household consumption.5
Measures of household production value usually combine individual market wage information with time-use surveys to value the domestic production of basic services (cleaning,
gardening, shopping...). This leaves aside the value of leisure enjoyed and, for households
with children, the value of human capital investment undertaken at home. Our measure
of potential earnings values total available time at the prevailing individual market wage.
This can be seen as an encompassing measure of the resources available that ultimately
determine household welfare.
3 Among the few exceptions are: Nakosteen, Westerlund, and Zimmer (2004) on Sweden, Pestel (2017) on Germany, Eika, Mogstad, and Zafar (2017) on Norway, Germany and Denmark. The present paper only uses the French version of the EU-SILC database. The analysis will be extended to other European countries in future research.
4 The incidence of measurement errors has been widely documented in the related field of intergenerational earnings mobility studies. See for instance Solon (1992) and the survey of Black and Devereux (2011).
Of course, one of the difficulties in assessing the intra-household correlation in potential
earnings is that the market wage rate is not observed for individuals out of the labor force.
This problem does not arise when using observed earnings (including zeroes). We explicitly
account for sample selection due to non-participation and provide estimates of the intra-couple correlation in (possibly latent) potential earnings by extending the usual regression
model with sample selection.
One of the main economic motivations for studying assortative mating lies in its potential contribution to economic inequality between couples. Empirical analyses of earnings
inequality have mainly stressed the influence of aggregate shocks (rise in the returns to
skills, skill-biased technological change, globalization, etc.), institutions and policies (labor
market deregulation, decrease in marginal income tax rates, etc.) as the main drivers of the
recent rise in inequality in most developed countries. The effects of demographic factors,
in particular assortative mating patterns, have only been studied recently and the effect of
assortative mating on inequality is generally found to be modest. Specifically, Greenwood,
Guner, Kocharkov, and Santos (2014) estimate that the Gini coefficient for the United
States would decrease from 0.43 to 0.42 when random matching is imposed while Eika,
Mogstad, and Zafar (2017) conclude that the contribution of assortative mating to inequality is around 5%. The main route taken in the literature is to compare the observed
earnings distribution to a counterfactual distribution built under alternative hypothetical
mating patterns. However, the construction of this counterfactual distribution requires
adequately dealing with the endogeneity of labor supply decisions and the self-selection of
individuals into couples, on the basis of their unobserved characteristics.
5 See for instance House, Laitner, and Stolyarov (2008), Frazis and Stewart (2011), Ahmad and Koh (2011), Roy (2012).
Two main approaches have been taken, in the recent literature, to build these counterfactual distributions. The accounting approach, also referred to as ‘addition randomization’
in the literature, treats observed annual earnings as a fixed individual characteristic and
simulates the distribution that would prevail if individuals kept their labor earnings unchanged and were randomly matched into couples (e.g. Karoly and Burtless 1995, Cancian
and Reed 1998, Burtless 1999, Schwartz 2010, Hryshko, Juhn, and McCue 2014). Hence,
this approach ignores the labor supply responses that would result from the random rematching of individuals. The so-called behavioral approach (or ‘imputation randomization’) characterizes individuals by some observable earnings determinants, in general education (e.g. Greenwood, Guner, Kocharkov, and Santos 2014, Harmenberg 2014, Pestel
2017, Eika, Mogstad, and Zafar 2017). Individuals are then randomly rematched into counterfactual couples. The joint earnings of the counterfactual couples are simulated on the
basis of the observed distribution among actual couples with similar observable earnings
determinants. Hence, this approach takes into account the endogeneity of labor supply decisions, but only to the extent that it is driven by observable characteristics of the mates.
Furthermore, it ignores the self-selection of individuals into couples on the basis of their
unobservable attributes.
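The accounting ('addition randomization') approach described above can be illustrated with a minimal sketch. It is not the estimation code used in this paper: the data are simulated, the function names `gini` and `random_rematch_gini` are hypothetical, and the Gini coefficient is chosen only as one possible inequality index. Individual earnings are held fixed and only the pairing of partners is randomized.

```python
import math
import random


def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def random_rematch_gini(male, female, n_draws=50, seed=0):
    """Average Gini of couple earnings over random re-pairings of partners."""
    rng = random.Random(seed)
    shuffled = list(female)
    draws = []
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # earnings unchanged, only the pairing changes
        draws.append(gini([m + f for m, f in zip(male, shuffled)]))
    return sum(draws) / len(draws)


# Toy data (hypothetical): a shared couple-level component induces positive sorting.
rng = random.Random(42)
shared = [rng.gauss(0.0, 0.5) for _ in range(2000)]
male = [math.exp(10.0 + s + rng.gauss(0.0, 0.5)) for s in shared]
female = [math.exp(9.8 + s + rng.gauss(0.0, 0.5)) for s in shared]

observed = gini([m + f for m, f in zip(male, female)])
counterfactual = random_rematch_gini(male, female)
# Share of observed inequality attributed to assortative mating
contribution = (observed - counterfactual) / observed
print(f"observed={observed:.3f} random={counterfactual:.3f} share={contribution:.1%}")
```

Because earnings are treated as fixed characteristics, this sketch shares the limitation noted above: it ignores any labor supply response to rematching.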
In this paper, we develop a third approach in which we characterize the effect of assortative mating on inequality in couples’ potential earnings. Compared to existing studies,
our approach offers three main advantages. First, as previously discussed, potential earnings provide a broader and more relevant measure of household resources. Second, since
potential earnings are defined as the earnings an individual would receive if he/she worked
full-time, this alternative measure of resources is largely independent of joint-labor supply
decisions in the couple, contrary to annual earnings.6 Our assessment of the impact of
assortative mating on inequality relies on a statistical model of the joint distribution of
the potential earnings of both partners that allows for sample selection in the observed
distribution and correlation across partners in their unobservable earnings determinants.
The third advantage of our approach, compared to other simulation methods discussed in
the previous paragraph, is thus the ability to account for self-selection of individuals into
couples on the basis of their unobservable attributes.
6 One limitation is the possibility that individual market wage is determined by the past labor supply decision, as discussed for instance in Eckstein and Lifshitz (2011). In this paper, we do not account for the dynamics of human capital and employment opportunities.
Our empirical analysis is based on the French waves of the EU-Statistics on Income
and Living Conditions (SILC), covering the period 2004-2011. Our results indicate a strong
degree of assortative mating in France. The correlation coefficient for education is above
0.6. The correlation in earnings is lower but sizable. Specifically, for dual-earner couples,
the correlation is around 0.3 for annual earnings and 0.35 for full-time equivalent earnings.
We then show that sample-selection leads to a moderate upward bias in the estimation
of the within-couple correlation. We also investigate the extent of non-linearities in the
statistical association of earnings and show that positive assortative mating is particularly
high at the top of the earnings distribution. Lastly, our estimates indicate that assortative
mating tends to increase inequality among couples. For annual earnings, the effect is non-negligible. The addition randomization approach indicates a contribution to inequality
between 9 and 18%. The imputation randomization approach points to a smaller effect of
3 to 9% of measured inequality. The effect of assortative mating on household potential
earnings is however much larger and amounts to 10 to 20% for observed inequality. The
effect of assortative mating is found to be larger for inequality indices more sensitive to the
tails of the distribution, which is consistent with the non-linearities of assortative mating.
These findings are robust to the model used for simulating the counterfactual distribution
and to sample selection.
The rest of this paper is structured as follows. Section 2 presents the data. Section
3 provides summary measures of the degree of assortative mating for various individual
attributes (education, socio-economic status, social origin, earnings). In section 4, we
focus on the issue of sample selection. Section 5 estimates the contribution of assortative
mating to earnings inequality among households.
2 Data
2.1 EU-SILC
Our analysis is based on the European Union - Statistics on Income and Living Conditions
(EU-SILC) surveys. We focus on the waves 2004 to 2011 of the French sample. The EU-SILC is a longitudinal household survey, coordinated by Eurostat, which gathers data from
all EU member states. The main goal of the survey is to study income, poverty, social
exclusion and living conditions in the European Union. The French waves were collected
by the French national statistics institute (INSEE).7
Data are collected annually for a rotating panel of households. In the French sample,
individuals are followed for a period of up to 8 years. The survey provides information
on the composition of the household, the link between its members, as well as unique
individual identifiers. The main sampling unit is the household. We define a couple as a
unique pair of individuals reporting to be respectively head and married or common law
partner of the head in a given household. Other pairs of individuals living in the same
household are not considered as a couple. Our sample includes all couples regardless of
their legal status (married or not).
We restrict the sample to couples in which both partners are between 25 and 60 years
old, in which neither partner is self-employed and in which neither partner is out of the
labor force because of retirement or studying. We only keep one observation per couple.
For each individual in a couple, we keep the observation with non-missing information of
the variables of interest which is closest to the age of 35. This choice is made in order
to minimize the incidence of life-cycle earnings dynamics on our measure of economic
assortative mating (Haider and Solon 2006). This results in a sample of 7,966 couples. In
the main analysis, we also exclude couples in which earnings are zero for both partners.8
In the end, our analysis is based on a sample of 7,864 couples. Appendix A provides general
descriptive statistics on our final sample.
2.2 Main variables
We examine two types of individual characteristics: earnings and measures of socioeconomic achievement. Appendix A provides detailed information about the construction
of variables.
Earnings Annual earnings are defined as the total wages and salaries earned in the previous year deflated by the consumer price index. For individuals out of salaried employment,
the value of annual earnings is equal to zero.9
7 National quality reports about the EU-SILC survey are available here: http://ec.europa.eu/eurostat/web/income-and-living-conditions/quality/national-quality-reports
8 This corresponds to 102 couples.
Full-time equivalent (FTE) earnings are defined as annual earnings/(number of months
worked full-time + 0.5 × number of months worked part-time) × 12. To compute FTE
earnings, we rely on the history of labor force participation reported in the survey. For
individuals out of salaried work, FTE earnings are missing, by construction.
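As an illustration of the FTE definition above, the following minimal sketch applies the formula to two hypothetical individuals. The function names and the numbers are assumptions made for the example and are not taken from the survey.

```python
def fte_earnings(annual_earnings, months_full_time, months_part_time):
    """Full-time equivalent earnings, with part-time months weighted by 0.5."""
    fte_months = months_full_time + 0.5 * months_part_time
    if fte_months <= 0:
        return None  # missing by construction for individuals out of salaried work
    return annual_earnings / fte_months * 12


def couple_potential_earnings(fte_a, fte_b):
    """Sum of both partners' FTE earnings; None if either one is unobserved."""
    if fte_a is None or fte_b is None:
        return None
    return fte_a + fte_b


# Hypothetical individuals, both with annual earnings of 12,000
half_year = fte_earnings(12000.0, 6, 0)  # 6 months full-time, then no salaried work
mixed = fte_earnings(12000.0, 6, 6)      # 6 months full-time and 6 months part-time
inactive = fte_earnings(0.0, 0, 0)       # never in salaried work during the year

print(half_year, mixed, inactive)
print(couple_potential_earnings(half_year, mixed))
```

The missing value returned for individuals out of salaried work is what makes the correlation in potential earnings subject to sample selection.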
For both earnings measures, we compute multi-year averages of individual earnings.
The average is computed over the full set of available yearly observations. The number of
years of observation in our sample varies between 1 and 8 years, with an average of 3.4
years.
Other socioeconomic variables Education- The first measure is the number of years of
education, equal to the school leaving age minus 6 years (i.e. minimum age for compulsory
education).10 Our second variable is based on the ordered classification of the highest
degree completed.
Occupation- Our measure of occupation is based on the standard 6-level French classification. In order to come close to an ordinal measure of occupation, we group farmers
and unskilled manual workers together. The SILC survey investigated individual socioeconomic
origin and gathered information on education and occupation of both parents of adult
respondents. Information is only available in 2005.
3 Descriptive measures of assortative mating
3.1 Education and occupation
We first analyze the extent of assortative mating in socio-economic achievement by estimating the correlation between partners in occupation and education. Information is available for both
partners of the couple, as well as for their parents. For ordinal variables (occupation and
highest degree completed), the association is measured using two indicators: the Spearman
correlation coefficient measures the statistical association in the distributional ranks of two
variables; the polychoric correlation assumes that the discrete variable that measures each
partner’s attainment (degree, occupation) is determined by a latent variable, following a
multinomial model. The polychoric correlation is defined as the linear Pearson correlation
coefficient for the latent variables of the two partners and is parametrically identified.11 For
the number of years of education, we report linear (Pearson) correlations and Spearman
rank correlations.
9 Given our use of panel data, the individuals with zero earnings should have never reported any salaried activity. Some of these individuals may however report unemployment periods and so potentially unemployment benefits. Taking into account these benefits (as a proxy for earnings) does not change our estimates but it increases the measurement errors. In the end, we decided not to include them.
10 For some individuals, the number of years of education appears noisy. Furthermore, although highest degree is reported for all individuals in the sample, number of years of education is missing for 9% of the sample. For this reason we estimate the correlation in predicted number of years of education, where the prediction is based on a regression of number of years of education on degree dummies interacted with gender and a fourth degree polynomial function of birth cohorts.
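The difference between the Pearson and Spearman coefficients can be made concrete with a short sketch. The degree levels below are invented for illustration, and the helper names are hypothetical. The Spearman coefficient is computed as the Pearson correlation of average ranks, which handles the many ties that arise with ordinal variables. The polychoric correlation requires maximum-likelihood estimation and is not covered here.

```python
def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordered degree levels (1 = lowest, 5 = highest) for ten couples
degree_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
degree_b = [1, 1, 3, 2, 3, 4, 4, 5, 4, 5]

print(round(pearson(degree_a, degree_b), 3), round(spearman(degree_a, degree_b), 3))
```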
Occupation Table 1 provides our estimates of assortative mating for occupation and
education. Occupational correlations are given in panel A. The correlation in partners’
own occupation ranges between 0.453 and 0.531 (column 1), which appears high, though
in line with estimates found for other countries.
This can be compared to estimates of the correlation in social origin, as captured
by parental occupation. Columns 2 and 3 compare the correlations in own occupation
with the correlation in father’s occupation, on the sub-sample where father’s occupation
is reported. Columns 4 and 5 report the same analysis for mother’s occupation. On these
sub-samples, the correlation among partners in own occupation (columns 3 and 5) is very
similar to the whole sample (column 1). The correlation among partners in fathers’ or
mothers’ occupation is positive and around 0.3, which indicates positive assortative mating
by social origin. Note though that the correlation in parental occupation is lower than the
correlation in partners’ own occupation, which indicates that assortativeness depends more
on individual occupational attainment than on social origin. The correlation is higher for
fathers’ occupation (0.29-0.377) than for mothers’ (0.249-0.308). It is important to keep in
mind that the absence of information for a significant share of respondents’ mothers (mainly
because of inactivity) makes the comparison difficult. The high level of assortative mating
and the difference between the partners and their parents are consistent with existing
evidence on French data (Bouchet-Valat 2014).
Education Panels B and C of Table 1 report statistical associations in education. Panel
B uses the highest degree completed. On the whole sample, we find positive correlations
between 0.559 and 0.593. The difference between the two measures of correlation (Spearman rank correlation vs. polychoric correlation) is small. These correlations appear higher
for education than for occupation. The correlation between partners is also higher for own
education than for social origin, as captured by parents’ education. However, compared
to panel A, the differences between own and parental characteristics appear smaller for
education than for social class.
11 The idea of polychoric correlation dates back to Pearson (1900) and Ritchie-Scott (1918).
Panel C provides correlation estimates for a continuous measure of education, the
number of years of education. The correlations are higher, around 0.62, but consistent
with those obtained for the correlation in highest degree completed.
Overall, our results indicate high levels of positive assortative mating in France. These
results are consistent with existing evidence on France (Goux and Maurin 2003, Bouchet-Valat 2014). They can be compared with the results presented in Fernandez, Guner, and
Knowles (2005) for the correlation in education in a large set of countries. Our estimates
for France appear higher than the correlation reported for most European countries, with
the exceptions of Spain, Belgium and Italy. They are similar to those reported for the US
and lower than those found in most Latin American countries (around 0.8).
3.2 Earnings
Annual and FTE earnings To assess the extent of economic assortative mating, we
now examine the correlation between partners in annual and full-time equivalent (FTE)
earnings.
Results are presented in table 2. Column 1 reports correlations in annual earnings
based on all observations, including zeroes. The correlation between partners in annual
earnings is around 0.175. Column 2 focuses on dual-earner couples, in which both partners
report positive earnings. The correlation in this sample is significantly higher (0.31 for the
Pearson correlation). The gap in the estimated correlation between the two samples is likely
to be explained by non-participation in the labor force. When earnings are zero for one of the
partners, it is predominantly female earnings. Assume first that labor force participation
of women is independent of male earnings. In this case one would expect the correlation
coefficient to fall when non-participants with zero earnings are taken into consideration.12
Whether the assumption of random participation constitutes a reasonable approximation
is of course open to discussion and we shall return to this issue below. Note, however,
that if female non-participation is more likely in couples with higher male earnings this
will further reinforce the fall in earnings correlation when including observations with zero
earnings.
12 In fact, under random participation, the presence of zeroes would mechanically lead to a decrease in the covariance of earnings among partners. Furthermore, the inclusion of zeroes would likely (although not surely) increase the variance of earnings in each marginal distribution. These two effects would then combine to decrease the correlation coefficient.
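The mechanical effect of zeroes under random participation can be checked on simulated data. The sketch below is purely illustrative: the earnings distributions, the 20% non-participation rate and all variable names are assumptions, not features of the EU-SILC sample. Female earnings are set to zero independently of male earnings, and the correlation among dual-earner couples is compared with the correlation on the full sample.

```python
import math
import random


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
n = 5000
# Hypothetical earnings: a shared component makes partners' earnings correlated.
shared = [rng.gauss(0.0, 0.4) for _ in range(n)]
male = [math.exp(10.0 + s + rng.gauss(0.0, 0.4)) for s in shared]
female = [math.exp(9.8 + s + rng.gauss(0.0, 0.4)) for s in shared]

# Random participation: 20% of women report zero earnings, independently of
# their partner's earnings (assumed rate, for illustration only).
participates = [rng.random() > 0.2 for _ in range(n)]
female_observed = [f if p else 0.0 for f, p in zip(female, participates)]

dual_male = [m for m, p in zip(male, participates) if p]
dual_female = [f for f, p in zip(female, participates) if p]

corr_dual_earners = pearson(dual_male, dual_female)
corr_with_zeroes = pearson(male, female_observed)
print(round(corr_dual_earners, 3), round(corr_with_zeroes, 3))
```

In this setup the correlation including zeroes is lower than the dual-earner correlation even though participation is unrelated to male earnings, which is the benchmark case discussed above.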
In the last column of table 2, we examine the correlation in FTE earnings. This
allows us to remove the correlation in labor supply decisions within the couple that affects
the correlation in annual earnings and focus on the correlation in potential earnings. As in
column 2, we focus on dual-earner couples. This results in a much higher correlation, up to
0.351 for the Pearson correlation. Compared to column 2, removing heterogeneity across
individuals in the number of months worked full and part-time increases the correlation
in earnings by about 13%. This increase indicates that the correlation within couples in
hours worked is lower than the correlation in hourly wage rate. It confirms, along the
intensive margin, our discussion, in the previous paragraph, of the incidence of labor
supply decisions. We address this issue more carefully in section 4.
One may suspect that part of the correlation in earnings arises from life-cycle effects,
through the correlation in birth cohort within couples. For all columns, we thus estimate
the correlation in earnings after netting out cohort effects.13 Results indicate a modest fall
in the estimated correlation. The Pearson coefficient falls by 3.5 to 4.5%. The effect on
the Spearman correlation is even smaller.
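The netting-out step (a regression of earnings on a quartic function of birth cohort, followed by a correlation of the residuals) can be sketched as follows. This is a toy illustration on simulated data, not the code used for table 2: the cohort profile, the variable names and the small least-squares solver are all assumptions. Cohorts are centred and rescaled before fitting, a numerical convenience that leaves the residuals unchanged.

```python
import random


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quartic_residuals(y, cohort):
    """OLS residuals of y on a quartic in cohort (needs >= 5 distinct cohorts)."""
    mean_c = sum(cohort) / len(cohort)
    span = max(cohort) - min(cohort)
    z = [(c - mean_c) / span for c in cohort]  # centred and rescaled cohort
    rows = [[zi ** p for p in range(5)] for zi in z]  # 1, z, z^2, z^3, z^4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(5)]
    beta = solve(xtx, xty)  # normal equations
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


# Toy data (hypothetical): partners have close birth cohorts, and earnings
# depend on cohort plus a component shared within the couple.
rng = random.Random(7)
n = 3000
cohort_f = [rng.randint(1950, 1985) for _ in range(n)]
cohort_m = [c - 2 + round(rng.gauss(0.0, 2.0)) for c in cohort_f]
shared = [rng.gauss(0.0, 0.3) for _ in range(n)]
earn_m = [10.0 - 0.03 * (c - 1967) + s + rng.gauss(0.0, 0.4)
          for c, s in zip(cohort_m, shared)]
earn_f = [9.8 - 0.03 * (c - 1967) + s + rng.gauss(0.0, 0.4)
          for c, s in zip(cohort_f, shared)]

raw = pearson(earn_m, earn_f)
net = pearson(quartic_residuals(earn_m, cohort_m), quartic_residuals(earn_f, cohort_f))
print(round(raw, 3), round(net, 3))
```

The simulated cohort effect is deliberately strong, so the fall in the correlation is larger here than the modest fall reported for the actual data.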
Two conclusions can be drawn from table 2. First, the results indicate that assortativeness
in earnings is high in France compared to other countries. On a similar sample from the US
population, Schwartz (2010) estimates a correlation of 0.12 for all couples (including couples
in which one of the partners is out of the labor force) and a correlation slightly higher than
0.2 for dual earner couples. Our estimates are 45% and 55% higher, respectively, in France.
Second, the table also indicates that labor supply decisions (along both the extensive and
the intensive margins) attenuate the correlations of potential earnings. In other words,
marital sorting according to potential labor earnings is high but the labor supply decisions
pertaining to labor force participation and part-time work tend to dampen the correlation
in partners’ earnings.
13 This is achieved by first regressing earnings on a quartic function of birth cohort and taking residuals.
Contribution of education and social origin As noted in the introduction, most
papers focus on assortativeness by education or social origin. Both variables capture dimensions along which marital sorting should obviously occur, given the interplay between
socialisation processes and mating decisions. However, it is also relevant, for understanding the socio-economic determinants of mating decisions, to investigate whether sorting also
occurs once individual social characteristics have been taken into account. Actually, one
may object to the analysis of assortativeness by earnings that it merely reflects the correlation in partners’ education and social origin. To address this issue, we examine whether
earnings remain correlated, once they have been purged of the effect of education and
social origin.
Table 3 presents estimates for correlations based on earnings residuals after controlling
for education, social origin or both variables.14 First, labor earnings remain positively
correlated, even after controlling for individual educational attainment and social origin.
Controlling for education alone (Panel B) decreases the correlation by about 35%. Controlling for social origin (Panel C) has a smaller impact on the correlation, which falls by 20
to 25%. Last, comparing panels B and D indicates that once education is accounted for,
further conditioning on social origin leaves the correlation in earnings almost unchanged.
As a conclusion, even if assortativeness in terms of social background and of education is
high, as discussed in section 3.1, there is still significant sorting along other dimensions not
captured by these variables.
Multi-year average earnings A potential challenge to the measurement of earnings
correlation is the incidence of measurement errors and transitory income components. Under measurement error, the correlation in annual measures of earnings might underestimate
the correlation among partners in permanent earnings. The degree of underestimation will
depend on the variance of measurement errors and the correlation among partners in transitory earnings components, compared to permanent components.
One way of moderating the incidence of these biases is to use average earnings, computed over multiple years of observations. This is undertaken in table 4. For each individual
and each measure of earnings (annual and full-time equivalent), we compute average earnings using all available time observations. Since the number of years over which
individuals are observed varies across individuals, these averages are computed over variable horizons. We consider two sub-samples. In panel A, we estimate earnings correlations
on the sample of couples observed during at least 3 years; in panel B, we focus on couples
who are observed during at least 5 years.
14 We restrict the sample to couples with valid information on education and social origin.
Using multiple-year averages has a limited effect on our measure of the correlation
in annual earnings. The linear correlation coefficient increases by 13% when averaging
annual earnings over at least three years. Using average earnings has a similar effect on
the correlation in full-time equivalent earnings, which increases by about 17% to reach a high
value of 0.466. Again, estimating correlations on earnings residuals net of cohort effects
barely changes the results. When averaging earnings over a period of at least five years,
the estimated correlations reach even higher values: 0.416 for annual earnings and 0.49
for FTE earnings.
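Why averaging raises the measured correlation can be seen in a small simulation. This is a sketch under stated assumptions, not a replication of table 4: the permanent-component correlation of 0.5, the size of the transitory shocks and their independence across partners and years are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_couples, n_years = 20_000, 5
rho_perm = 0.5   # assumed correlation in permanent (log) earnings

# Permanent components, correlated within couples.
cov = [[1.0, rho_perm], [rho_perm, 1.0]]
perm = rng.multivariate_normal([0.0, 0.0], cov, size=n_couples)

# Yearly transitory shocks, assumed independent across partners and years.
noise_m = rng.normal(scale=0.6, size=(n_couples, n_years))
noise_f = rng.normal(scale=0.6, size=(n_couples, n_years))
annual_m = perm[:, [0]] + noise_m      # shape (n_couples, n_years)
annual_f = perm[:, [1]] + noise_f

corr_one_year = np.corrcoef(annual_m[:, 0], annual_f[:, 0])[0, 1]
corr_average = np.corrcoef(annual_m.mean(axis=1), annual_f.mean(axis=1))[0, 1]
print(f"single year: {corr_one_year:.3f}, {n_years}-year average: {corr_average:.3f}")
```

Averaging shrinks the variance of the transitory component and moves the estimate toward the permanent correlation. If transitory shocks were themselves correlated within couples, the gain from averaging would be smaller than in this sketch.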
While averaging earnings affects our measure of assortativeness in the expected direction, the size of the effect is lower than expected a priori. In a related context, intergenerational elasticity estimates indicate that using current earnings in place of permanent
earnings leads to underestimating the intergenerational association in earnings by about one
third. This is consistent with available evidence indicating, first, that measurement errors
in annual earnings account for 10 to 15% of the variance in earnings (e.g. Duncan and
Hill 1989, Hagneré and Lefranc 2006) and, second, that transitory components account for
roughly one fourth of total earnings variation (Moffitt and Gottschalk 2011). However, in
our case, earnings data are derived from administrative data after 2007. Additionally, as
discussed in Appendix A, winsorizing the extreme one percent of the distribution should
also reduce the incidence of measurement error. Furthermore, contrary to what occurs for
intergenerational estimates, transitory earnings and not just permanent components are
likely to be correlated within couples, to the extent that they relate to factors such as local
labor market conditions or other household level shocks. Ostrovsky (2012) reports supportive evidence. In our case, given limited sample size and time-series depth, we cannot
directly investigate this issue.
In the end, using average earnings reinforces the view that earnings are highly correlated
within couples in France.
Non-linearities in assortative mating. We now examine the extent of non-linearities
in the association in earnings among couples. Figures 1 and 2 provide evidence that the statistical association in earnings varies along the earnings distribution. These figures present
the contour plot of the bivariate earnings distribution among couples. The first panel gives
the contour plot of earnings in level. For annual earnings, there seems to be little correlation
in the lower tail of the distribution and a stronger one at the top. This is confirmed by the
second panel, which represents the joint distribution of the ranks. Under the assumption
of joint normality (or joint log normality) of the earnings distribution, this contour plot
should be symmetric around the middle point of the box and should display two equal-sized peaks at the bottom and at the top of the distribution. In our case, the distribution
of ranks is bimodal, but displays a much higher peak at high quantiles, indicating that
earnings correlation is larger at the top of the earnings distribution.
4 Sample selection and assortative mating
4.1 Model
The results of the previous section indicate that the correlation in labor earnings is influenced by labor supply decisions, along both the intensive and extensive margins. Unfortunately, none of the above estimations provides a satisfactory measure of the extent of the
partners’ correlation in both economic resources and potential earnings. On the one hand,
using all observations, including those with zero earnings, amounts to ignoring that people out
of the labor force might produce economic resources domestically or enjoy higher welfare
due to increased leisure consumption. On the other hand, the simple correlation in full-time equivalent earnings computed from the sample of dual-earner couples ignores possible
sample selection into participation. Since participation decisions depend on the earnings
of both partners, selection is likely to be non-random. In this case, the correlation in full-time equivalent earnings would provide a biased estimate of the correlation in potential earnings,
although the direction of the bias is a priori unknown.
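A stylized simulation shows how non-random participation can distort the correlation measured on dual-earner couples. The participation rule used here (a woman works when her own potential log earnings exceed a fixed threshold) is a deliberately simple assumption, chosen only because its effect is easy to trace; it is not the participation process estimated in the paper. Rules that depend on both partners’ earnings could push the bias in either direction.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
rho = 0.4   # assumed correlation in potential log earnings

# Standardized potential log earnings of the male and female partner.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
ln_wm, ln_wf = z[:, 0], z[:, 1]

# Hypothetical participation rule: women work if their potential earnings
# exceed a threshold (set so that roughly three quarters participate).
works = ln_wf > -0.7

corr_all = np.corrcoef(ln_wm, ln_wf)[0, 1]                  # full population
corr_dual = np.corrcoef(ln_wm[works], ln_wf[works])[0, 1]   # dual earners only
print(f"participation rate: {works.mean():.3f}")
print(f"all couples: {corr_all:.3f}, dual earners: {corr_dual:.3f}")
```

Under this particular rule, truncating the female distribution attenuates the correlation observed among dual earners relative to the population value.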
Unbiased estimates of the correlation in potential earnings can be derived from a wage
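regression model that explicitly accounts for sample selection. Let $w_s$ denote the earnings
of partner $s$, with $s = m$ for the male partner and $s = f$ for the female partner. We assume
that $(w_m, w_f)$ follows a bivariate log-normal distribution:
\[
\begin{pmatrix} w_m \\ w_f \end{pmatrix} \sim \ln \mathcal{N}(\mu, \Sigma)
\quad \text{with} \quad
\mu = \begin{pmatrix} \mu_m \\ \mu_f \end{pmatrix}
\quad \text{and} \quad
\Sigma = \begin{pmatrix} \sigma_m^2 & \rho \sigma_m \sigma_f \\ \rho \sigma_m \sigma_f & \sigma_f^2 \end{pmatrix}
\]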
Under the assumption of a bivariate log-normal distribution, the relationship between
male and female earnings can be written as:
\[
\ln w_f = \beta_0 + \beta \ln w_m + \varepsilon \tag{1}
\]
where the regression slope satisfies $\beta = \rho \sigma_f / \sigma_m$ and is thus equal to the correlation coefficient of the variables in logarithm, rescaled by the ratio of the standard deviations of female and
male log earnings.
Assume first that $w_m$ is always observed but that $w_f$ is only observed for women in
the labor force.15 In the likely case where participation decisions depend on both partners’
potential earnings, the sample of dual earners is no longer representative of the entire
population. In this case, the partners’ correlation cannot be directly assessed, based on
observed earnings alone. Likewise, the distribution of $w_f$ will be censored by participation
decisions and the estimate of the standard deviation of female earnings from observed data
will be biased. However, equation (1) can be consistently estimated using Heckman’s sample
selection correction model. This yields consistent estimates of both $\beta$ and $\sigma_\varepsilon$.
The estimates obtained from the sample selection regression model can be combined
with estimates of $\sigma_m$ to obtain an estimate of the within-couple correlation in log-earnings,
$\rho$. It is given by:
\[
\rho = \beta \frac{\sigma_m}{\sqrt{\sigma_\varepsilon^2 + \beta^2 \sigma_m^2}}
\]
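The mapping from the regression parameters to the correlation can be checked numerically. The sketch below is an illustration, not the paper's estimation procedure: it uses simulated data without any selection, so plain OLS stands in for the Heckman estimates of the slope and of the residual standard deviation. The function name `implied_correlation` and all parameter values are hypothetical.

```python
import numpy as np

def implied_correlation(beta, sigma_eps, sigma_m):
    """Correlation in log earnings implied by the slope, the residual s.d.
    and the s.d. of male log earnings."""
    return beta * sigma_m / np.sqrt(sigma_eps**2 + beta**2 * sigma_m**2)

# Simulated log earnings with known parameters (illustrative values only).
rng = np.random.default_rng(2)
rho, sigma_m, sigma_f = 0.35, 0.5, 0.4
cov = [[sigma_m**2, rho * sigma_m * sigma_f],
       [rho * sigma_m * sigma_f, sigma_f**2]]
ln_w = rng.multivariate_normal([10.0, 9.8], cov, size=100_000)
ln_wm, ln_wf = ln_w[:, 0], ln_w[:, 1]

# OLS of ln w_f on ln w_m; np.polyfit returns the slope first.
beta_hat, beta0_hat = np.polyfit(ln_wm, ln_wf, deg=1)
resid = ln_wf - (beta0_hat + beta_hat * ln_wm)

rho_hat = implied_correlation(beta_hat, resid.std(ddof=2), ln_wm.std(ddof=1))
print(f"beta: {beta_hat:.3f}, implied rho: {rho_hat:.3f}")
```

The denominator is the standard deviation of female log earnings implied by equation (1), so the expression is simply $\beta \sigma_m / \sigma_f$ written in terms of quantities the selection model delivers.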
We use this approach to estimate the partners’ correlation in residual earnings, i.e.
net of age and time effects. The participation equation includes controls for the number
of children in the household, household capital income, a quadratic function of the annual
labor earnings of the husband, an indicator of whether the husband holds a long-term labor
contract and a quadratic form in the age of both partners.
In principle, estimates of this model could also be biased if there is non-random selection
in the observability of male earnings, although this is much more rarely the case in our
sample. We investigate this issue in appendix C where we estimate a double selection
model. Results indicate that selection based on the observability of male earnings can be
ignored in the analysis of assortativeness within couples.
15 Table A.1 shows that the share of men reporting positive earnings equals 94% while this share equals
77% for women.
4.2 Results
Estimation results are given in Tables 5 and 6. Table 5 provides estimates of the regression coefficient and the correlation coefficient, both for earnings in logarithms, and of earnings standard deviations. Given the pattern of female labor participation and the incidence of part-time
work among women, the assumption of a joint log-normal distribution, discussed in the previous section, does not appear relevant for annual earnings. Hence, we concentrate here on
FTE earnings.
Estimates in panel A ignore sample selection issues. The results found here are very
similar to those reported earlier: the estimate of the correlation in log-FTE earnings is
0.326, compared to 0.337 for the correlation in levels, once cohort effects have been removed
(Table 2, panel B). Estimates in panel B are obtained using Heckman’s sample selection
model. Ignoring sample selection issues leads to a slight overestimate of the extent of the
earnings correlation. Specifically, once selection is accounted for, the correlation falls from 0.326 to 0.31, and from 0.361 to
0.353 in the case of multi-year average FTE earnings. This fall in the estimated correlation
arises from two effects: first, a fall in the partners’ earnings elasticity (β), once selection
is taken into account; second, a rise in the dispersion of female earnings, once we account
for the fact that the distribution of female earnings is truncated owing to the participation
decision. This suggests that sample selection into employment has only a moderate impact
on the estimated earnings correlation.
Table 6 gives the estimates of the Heckman sample selection model. Analyzing the
results of the selection equation allows a better understanding of the process that determines
whether female partners work for pay. $\rho_{res}$ indicates the correlation coefficient of the
error terms of the selection and wage equations. For all specifications, this coefficient is
negative. This indicates that women with a positive earnings residual, conditional on their
partner’s earnings, have a lower probability of working for pay. In other words, for female
partners, “undermarriage” (i.e. women with high potential earnings conditional on their
partner’s earnings) is associated with lower participation and “overmarriage” is associated
with higher participation. This result illustrates that the idiosyncratic disutility of work,
captured by the unobserved determinants of labor supply, is not independent of the idiosyncratic
potential earnings of the mate.
Table 6 also allows assessing the relationship between male earnings and female
labor market participation. The coefficients of $w_m$ and $w_m^2$ indicate a hump-shaped relationship. Table 7 provides additional evidence on female labor market characteristics
conditional on the male FTE earnings. The female employment rate rises along the male
earnings distribution. After a sharp increase between the first and second deciles (D1
vs. D2), the employment rate increases steadily up to the sixth decile and plateaus at
about 80% until the ninth decile, but falls significantly in the top decile. The lower female
employment rates at the tails of the distribution of male earnings mostly reflect a low participation rate, rather than a higher risk of unemployment (columns 2 and 3). As previously
discussed, under random participation in the labor market, we would expect that excluding
individuals with zero earnings would increase the observed correlation in earnings. This
is partly reinforced by the hump-shaped pattern in labor-force participation observed in
column (1). Second, the number of months worked (conditional on being in employment)
follows a similar hump-shaped pattern, although the variation across male earnings deciles
is rather limited. In sum, there seems to be more variation, across male deciles, in female
labor supply along the extensive margin than along the intensive margin. Third, although,
overall, female earnings increase with male earnings, the relationship is relatively flat in
the bottom half of the distribution (D1 to D4). This seems particularly true for FTE
earnings. However, the gradient in female earnings conditional on male earnings, at the
top of the distribution seems steeper for FTE earnings than for annual earnings. Hence,
the increase in the observed correlation in earnings when using FTE earnings rather than
annual earnings seems largely driven by a rise in the statistical association between male
and female earnings at the top of the earnings distribution.
5 The contribution of assortative mating to earnings inequality among households
5.1 Methods
Assessing the contribution of assortative mating to earnings inequality among households
requires comparing the observed distribution of earnings to a counterfactual distribution
that would prevail under alternative mating patterns. In line with several recent papers,
the counterfactual mating pattern we consider corresponds to the hypothesis of random
matching.16
As discussed in Harmenberg (2014), two main methods have been used in the literature
to build a counterfactual earnings distribution, under the assumption of random mating.
The first approach is followed by Hryshko, Juhn, and McCue (2014) and to some extent
Burtless (1999). It amounts to taking the observed labor earnings of men and women as a
fixed individual characteristic and to randomly matching individuals into simulated couples.
Household earnings are computed as the sum of the labor earnings of both partners in the
simulated couples. In this case, the counterfactual distribution is simply a convolution of
the marginal earnings distributions of female and male partners observed in the population.
Following Harmenberg (2014), we refer to this method as addition randomization. The
major limitation of this approach is to assume that individual labor supply decisions are
exogenous with respect to match characteristics.
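Addition randomization can be sketched in a few lines. This is a minimal illustration on simulated data, not the algorithm of Appendix B: the log-normal earnings, the parameter values and the function names are assumptions, and the re-pairing ignores the conditioning on age discussed later in this section.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative array (sorted-rank formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

def addition_randomization(earn_m, earn_f, rng):
    """Household earnings after randomly re-pairing women with men,
    holding each individual's earnings fixed."""
    shuffled_f = rng.permutation(np.asarray(earn_f, dtype=float))
    return np.asarray(earn_m, dtype=float) + shuffled_f

# Hypothetical couples with positively correlated log earnings.
rng = np.random.default_rng(3)
n = 100_000
cov = [[0.25, 0.07], [0.07, 0.16]]   # implies a log-earnings correlation of 0.35
ln_w = rng.multivariate_normal([10.0, 9.8], cov, size=n)
earn_m, earn_f = np.exp(ln_w[:, 0]), np.exp(ln_w[:, 1])

actual = earn_m + earn_f
counterfactual = addition_randomization(earn_m, earn_f, rng)
print(f"Gini actual: {gini(actual):.3f}, random mating: {gini(counterfactual):.3f}")
```

Because individual earnings are only reshuffled, total earnings and both marginal distributions are unchanged; only the within-couple association is removed.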
An alternative approach is implemented in Greenwood, Guner, Kocharkov, and Santos
(2014) and Eika, Mogstad, and Zafar (2017). In this approach individuals are characterized
by some observable characteristics Z, such as education. The total earnings of a household
are determined by the characteristics of both partners, Zm and Zf . For each combination of
partners’ characteristics, a (conditional) household earnings distribution can be computed.
Randomization amounts to creating pseudo-couples in which the characteristics Z of both
partners are randomly drawn from the observed distributions of Z characteristics (among
male and female partners) in the population. Once the characteristics of both partners of
the pseudo-couple are defined, household earnings are randomly drawn from the observed
distribution of household earnings, conditional on partners’ characteristics. Hence, the
counterfactual distribution is a mixture of the observed conditional earnings distributions, where
the mixing weights are defined by the random mating hypothesis. We refer to this approach
as imputation randomization.
16 Several papers focusing on the effect of changes in assortative mating on the income distribution (e.g.
Karoly and Burtless 1995, Burtless 1999) rely on a different counterfactual, usually the mating pattern
observed in a reference year.
To illustrate the imputation approach, assume that the population of individuals is
split equally into two groups, regardless of gender: high education individuals, denoted
by H, and low education individuals, denoted by L. Based on education, we distinguish four types
of couples: HH, HL, LH, and LL. For each type, we observe the cumulative earnings
distribution function among couples of this type: $F_{HH}(y)$, $F_{HL}(y)$, ... Let $p_{HH}$, $p_{HL}$,
$p_{LH}$, $p_{LL}$ denote the weight of each type in the population of couples. The actual CDF of
the distribution of earnings among couples is equal to:
\[
F(y) = p_{HH} F_{HH}(y) + p_{HL} F_{HL}(y) + p_{LH} F_{LH}(y) + p_{LL} F_{LL}(y)
\]
If the characteristics of partners were drawn randomly in the
population, the share of each type among couples would be equal to $\frac{1}{4}$ (again assuming
equal shares of H and L individuals among males and females). Hence the counterfactual
distribution under imputation randomization is, in this case, given by:
\[
\tilde{F}(y) = \frac{1}{4} \left\{ F_{HH}(y) + F_{HL}(y) + F_{LH}(y) + F_{LL}(y) \right\}
\]
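The two-group example can be reproduced numerically. The sketch below uses simulated couples; the matching rule, the log-normal household earnings and the names `mixture_cdf`, `p` and `q` are assumptions introduced for illustration. It evaluates the actual and counterfactual CDFs at one earnings level by reweighting the observed type-specific distributions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40_000
# Hypothetical data: 1 = high education (H), 0 = low education (L).
edu_m = rng.integers(0, 2, size=n)
edu_f = np.where(rng.random(n) < 0.7, edu_m, rng.integers(0, 2, size=n))
# Household earnings drawn conditional on the couple type (assumed log-normal).
hh_earn = np.exp(0.5 * (edu_m + edu_f) + rng.normal(scale=0.5, size=n))

def mixture_cdf(y, weights, edu_m, edu_f, hh_earn):
    """CDF at y of a mixture of the type-specific earnings distributions."""
    total = 0.0
    for (a, b), w in weights.items():
        in_type = (edu_m == a) & (edu_f == b)
        total += w * np.mean(hh_earn[in_type] <= y)   # w * F_ab(y)
    return total

types = [(a, b) for a in (0, 1) for b in (0, 1)]
# Observed type shares p_ab, and shares under random mating q_ab,
# i.e. the product of the male and female marginal shares.
p = {(a, b): np.mean((edu_m == a) & (edu_f == b)) for a, b in types}
q = {(a, b): np.mean(edu_m == a) * np.mean(edu_f == b) for a, b in types}

y0 = np.exp(-0.5)   # an arbitrary low earnings level
F_actual = mixture_cdf(y0, p, edu_m, edu_f, hh_earn)
F_counterfactual = mixture_cdf(y0, q, edu_m, edu_f, hh_earn)
print(f"F(y0) actual: {F_actual:.4f}, random mating: {F_counterfactual:.4f}")
```

With equal shares of H and L among men and women, the weights `q` are all close to one fourth, as in the text. Fewer LL couples under random mating means less mass at low household earnings in this simulated example.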
The advantage of the imputation randomization, compared to the addition randomization approach, is to allow for endogenous labor supply responses, but only as long as
they depend on the conditioning variables Z.17 In other words, this amounts to ruling out
the possibility that household labor supply decisions and earnings are also determined by
partners’ unobserved characteristics whose distribution may differ across observed couples
with different combinations of Z. The results in section 4 suggest that this assumption may
fail to hold, as labor supply unobserved determinants seem to depend on the productivity
characteristics of the match. It is also worth stressing that, according to the results in
table 3, the correlation in earnings cannot be fully accounted for by the correlation in the
conditioning variables (education).
Both approaches above attempt to quantify the effect of assortative mating on inequality of realized household annual earnings. We also implement a third approach that
allows assessing the effect of assortativeness on inequality of household potential earnings,
defined as the earnings the couple would earn if both partners worked full-time. Contrary
to realized earnings, which are partly determined by joint labor supply decisions within
the household, potential earnings can largely be considered as an exogenous individual
characteristic, with respect to couple composition.18
17 The procedure developed by Pestel (2017) may be linked to the imputation approach. It amounts to
randomizing individuals with different wage rates into counterfactual couples and to simulating labor supply
decisions based on a household labor supply model. Wage rates are, however, predicted on the basis of socio-demographic characteristics such as education. The model thus fails to account for assortative mating along
unobserved earnings determinants.
The contribution of assortative mating to inequality across couples in household potential earnings can be assessed using three approaches. We can first apply the addition
and imputation randomization approaches to the distribution of FTE earnings, on the sample where both partners work. This raises the same concerns as previously discussed. The
third approach is to use the model of section 4 in order to parametrically identify the joint
distribution of partners’ potential earnings among observed couples. Under the assumption
of joint log-normality, this distribution is characterized by three parameters: the variances
of the marginal earnings distributions of women and men, and the covariance
of earnings within the couple. The estimated parameters can be used to compute the degree of inequality in the distribution of household potential earnings, although potential
earnings are a latent, unobserved variable for some couples where one of the partners is
out of employment. Furthermore, once the parameters of this joint distribution have been
estimated, it is easy to simulate the distribution of household potential earnings under the
assumption that the correlation of partners’ potential earnings is zero, holding constant
the characteristics of the marginal distributions.
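The parametric counterfactual can be sketched as follows. The parameter values below are purely illustrative and are not the paper's estimates; the function names are hypothetical. The sketch simulates household potential earnings from a bivariate log-normal distribution, once with a positive within-couple correlation and once with the correlation set to zero, keeping the marginal distributions fixed.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative array (sorted-rank formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

def simulate_household_potential(mu, sigma_m, sigma_f, rho, n, rng):
    """Draw partners' potential earnings from a bivariate log-normal
    distribution and return their sum for each simulated couple."""
    cov = [[sigma_m**2, rho * sigma_m * sigma_f],
           [rho * sigma_m * sigma_f, sigma_f**2]]
    ln_w = rng.multivariate_normal(mu, cov, size=n)
    return np.exp(ln_w).sum(axis=1)

# Illustrative parameters only (assumed, not estimated).
params = dict(mu=[10.0, 9.8], sigma_m=0.5, sigma_f=0.45)
rng = np.random.default_rng(5)
g_sorted = gini(simulate_household_potential(rho=0.35, n=200_000, rng=rng, **params))
g_random = gini(simulate_household_potential(rho=0.0, n=200_000, rng=rng, **params))
print(f"Gini with rho=0.35: {g_sorted:.3f}, with rho=0: {g_random:.3f}")
```

Only the off-diagonal term of the covariance matrix changes between the two simulations, which is what holding the marginal distributions constant means in this setting.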
Regardless of the specific method used to construct the counterfactual earnings distribution, an additional issue arises regarding whether the randomization process should
operate on the overall population or within age groups. As previously discussed, part of
the correlation of economic outcomes within couples is driven by the fact that partners
are homogeneous in terms of birth cohort. This cohort-wise homogamy would likely survive
even if partner choice were independent of individual social and economic characteristics.
For this reason, one may suggest that the randomization process used to build the counterfactual should occur conditional on the age of partners. In the rest of the analysis, we
follow this assumption and only allow rematching to occur conditional on the age of both
partners.
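Rematching conditional on age can be implemented as a permutation within cells. The following sketch is one possible implementation under stated assumptions: the age groups, the simulated earnings and the function name `permute_within_groups` are hypothetical, and a cell is defined by the pair of partners' age groups.

```python
import numpy as np

def permute_within_groups(values, groups, rng):
    """Shuffle `values` among observations that share the same group label."""
    values = np.asarray(values)
    groups = np.asarray(groups)
    shuffled = values.copy()
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        shuffled[idx] = values[rng.permutation(idx)]
    return shuffled

rng = np.random.default_rng(6)
n = 10_000
age_m = rng.integers(0, 4, size=n)    # male age group (hypothetical coding)
age_f = rng.integers(0, 4, size=n)    # female age group
cell = age_m * 4 + age_f              # one label per (male age, female age) cell
earn_m = rng.lognormal(10.0, 0.5, n)
earn_f = rng.lognormal(9.8, 0.4, n)

# Women are reassigned to men only within the same age cell.
rematched_f = permute_within_groups(earn_f, cell, rng)
counterfactual = earn_m + rematched_f
```

Each cell keeps exactly the same set of female earnings, so the age structure of couples is preserved while sorting on other characteristics is removed within cells.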
Last, one should also mention that none of the three above approaches takes into
consideration the changes in the distribution of earnings and wage rates. Such changes
could indeed result from general equilibrium effects driven by changes in the composition
of households and in their labor supply decisions. They are however rarely taken into
consideration in such counterfactual decompositions of inequality.
The three randomization algorithms are described in Appendix B.
18 This is true, at least, in the short term. In the long run, due to the accumulation of experience and
seniority, potential earnings also depend on past labor supply decisions. We do not account for this source
of endogeneity here.
5.2 Results
Our estimates of the effect of assortative mating on earnings inequality are given in table
8. For the observed and simulated earnings distributions we compute standard inequality
indices (Gini, Theil, Atkinson and P90/P10). We also report the variation of the inequality
indices between the actual distribution and the counterfactual distribution, which indicates
the inequality reduction obtained by randomizing mating patterns among couples.
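For reference, the inequality indices and the relative variation can be computed as in the sketch below. These are standard textbook definitions written for strictly positive earnings; the exact conventions used for table 8 (for instance the treatment of zero earnings or the Atkinson parameter values) are not restated here, so the function names and defaults are assumptions.

```python
import numpy as np

def gini(x):
    """Gini coefficient (sorted-rank formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

def theil(x):
    """Theil T index; requires strictly positive values."""
    s = np.asarray(x, dtype=float) / np.mean(x)
    return np.mean(s * np.log(s))

def atkinson(x, eps):
    """Atkinson index with inequality aversion eps; requires positive values."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    if eps == 1:
        return 1.0 - np.exp(np.mean(np.log(x))) / mu
    return 1.0 - np.mean((x / mu) ** (1.0 - eps)) ** (1.0 / (1.0 - eps))

def p90_p10(x):
    """Ratio of the 90th to the 10th percentile."""
    p10, p90 = np.percentile(x, [10, 90])
    return p90 / p10

def relative_change(actual, counterfactual):
    """Percentage variation of an index between actual and counterfactual."""
    return 100.0 * (counterfactual - actual) / actual
```

A negative value of `relative_change` corresponds to an inequality reduction under the counterfactual mating pattern.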
Annual earnings. Panel A reports the results for addition randomization. Inequality
in the actual distribution, for instance the Gini coefficient of 0.27, is slightly lower than
the degree of inequality in the overall distribution of earnings in France. This reflects the
greater homogeneity of our sample, compared to the overall population, induced by our
sample selection rules.19 The equalizing effect of randomizing individual annual earnings
across couples, conditional on age, appears relatively modest. The Gini index falls by 8.5%.
The effect on the other inequality measures is larger: the Theil and Atkinson indices fall by
about 17-18%. Of course, one of the difficulties of this approach is that it fails to take into
account the labor supply responses that would occur if individuals were randomized into
less homogeneous couples. These labor supply responses would be likely to occur, especially
for women. However, the consequence of these labor supply adjustments for
overall earnings inequality is a priori unclear.
Panel B provides actual and counterfactual inequality measures for the imputation
randomization procedure. The effect of randomizing educational attainment across couples
(conditional on age) is smaller than in Panel A. The Gini falls by 2.8%, which is in line with the
results reported in Eika, Mogstad, and Zafar (2017), Greenwood, Guner, Kocharkov, and
Santos (2014) and Harmenberg (2014), who also report a modest contribution of assortative
mating to inequality between couples. However, the effect on the other inequality indices is
significantly larger, especially for the Atkinson(2) index, which falls by about 8.6%. Though one
of the advantages of the imputation randomization approach is to allow for labor supply
responses, one obvious limitation of this approach is that it rules out selection on unobservable
characteristics and assumes that heterogamous couples are a good counterfactual for the
behavior of individuals observed in homogamous couples if these individuals were rematched
with more heterogeneous partners. Unfortunately, it is hard to guess how selection on
unobservable characteristics would bias the counterfactual experiment.
19 Excluding single-headed households will, in particular, drive down inequality measures.
FTE earnings. Panels A, B and C also provide evaluations of the effect of assortative
mating on inequality in FTE earnings. First, one should stress that using FTE earnings
as the variable of interest reduces inequality in the distribution, by reducing heterogeneity
across individuals arising from differences in labor supply. This explains the relatively low
observed value of the inequality measures.
Overall the results indicate a larger contribution of assortative mating to potential
earnings inequality than for annual earnings. The simulations conducted under addition
randomization (Panel A), predict a sizable fall in inequality as a result of random rematching. The Gini coefficient would fall by 10.1% and the Theil index by 21.3%. Unlike the
results obtained for annual earnings, imputation randomization also indicates a sizable
effect of assortative mating on FTE earnings inequality. For instance, imputation randomization predicts a fall in the Gini of 8.3% (against only 2.8% for annual earnings) and a fall
in the Theil index of 17.4%. Controlling for sample selection in Panel C provides consistent
results on the disequalizing effect of mating patterns. Under random matching, the Gini
coefficient would fall by 8.7% and the Theil index would fall by about 16.6%.
In summary, the three approaches to randomization produce similar and consistent
results in the case of FTE earnings. They all point to a sizable contribution of assortative
mating to earnings inequality. The effect is also much higher than the one observed for
annual earnings. Two conclusions can be drawn from these results. First, the effect of assortative mating on annual earnings inequality seems to be partially mitigated by endogenous
labor supply decisions. Second, the small contribution of assortative mating to annual earnings inequality may mask a greater contribution to overall inequality across households.
In this respect, FTE earnings provide a broader measure of the resources available to the
household and might be more relevant to assess the consequences of mating decisions on
inequality.
6 Concluding comments
In this paper, we evaluated the extent of assortative mating in France and its contribution
to inequality between couples. Our estimates reveal a large statistical association in socio-economic characteristics among partners. The correlation coefficient for years of education
is high, around 0.6. Similar results are found for occupation. For annual earnings, the
correlation appears much weaker, around 0.17, when computed on all individuals, including those with zero earnings. Although this value seems low, especially when compared
to the correlation in other socio-economic characteristics, one should emphasize that it is
markedly higher than the one found for other developed countries, in particular the US.
The correlation of full-time equivalent earnings, computed on the sample of couples in
which both partners are salaried, is also markedly higher than for annual earnings: this
correlation is around 0.35 for yearly measures of FTE earnings and rises to 0.49 when
using multi-year averages. All in all, this points to a fairly large degree of assortative
mating among French couples.
This high degree of homogamy is consistent with the picture of a highly stratified
French society. For instance, Lefranc and Trannoy (2005) and Lefranc (2011) report that
the degree of intergenerational earnings persistence in France is relatively high compared to
other developed economies. Lecavelier and Lefranc (2015) estimate the statistical association
in education and earnings among siblings. Their findings indicate a high correlation in
socio-economic outcomes among siblings. Interestingly, they report values of the intra-sibling correlation in education and earnings that are very similar to the value of the
within-couple correlations found here. This implies that the degree of homogeneity within
couples is similar to the degree of homogeneity within families among siblings. In other
words, from the perspective of inequality among couples, patterns of assortative mating
are equivalent to a process in which individuals would randomly select their mates... from
their family of origin. Chadwick and Solon (2002) and Ermisch, Francesconi, and Siedler
(2006) report consistent evidence.
Economic assortative mating might not simply result from the effect of social stratification but may also arise from economic determinants. Of course, economic assortative mating
is expected to occur as a result of marital sorting along non-economic dimensions such
as social origin or educational choice. However, our results indicate that partners’ earnings remain significantly correlated, even after controlling for educational choice or family
background. This is consistent with the view that economic considerations might be an
important factor in determining partner choice. Fremeaux (2014) provides similar evidence.
Our results also allow assessing the contribution of assortative mating to earnings inequality among couples. Several papers have recently addressed this issue but no clear
picture has emerged regarding the disequalizing effect of homogamy. This lack of consensus partly reflects the use of different methodologies for assessing the counterfactual
distribution of earnings that would prevail under random mating. As a matter of fact,
current approaches fail to fully account for the endogeneity of labor supply decisions and
for assortative mating along unobserved individual characteristics. Our results indicate
that assortative mating makes a sizable contribution to earnings inequality. Specifically, the
Gini coefficient in earnings would fall by 2 points under random mating. This fall is of
the same order of magnitude as the reduction in inequality that arises from income tax
redistribution in France.20 These results are based on an alternative approach that focuses
on couples’ potential earnings. The potential earnings are defined as the earnings a couple
would receive if both partners worked full-time, given their idiosyncratic market wage rate,
and are measured by the sum of the full-time equivalent earnings of both partners. Our
approach also accounts for assortativeness in unobservable earnings determinants. We show
that assortative mating tends to increase inequality among couples, compared to random
mating. For annual earnings, the effect is moderate and accounts for 4 to 10% of measured
inequality. The effect of assortative mating is however much larger when focusing on couples’ potential earnings and amounts to 10 to 20% of observed inequality. The effect of
assortative mating is found to be larger for inequality indices more sensitive to the tails of
the distribution.
The discrepancy between the two estimates suggests that labor supply decisions tend
to dampen the effect of marital sorting on inequality in labor earnings across couples and
partly masks wider inequality in household resources and welfare. Labor supply decisions
and their relationship with marital sorting should be investigated further. The extent of
marital sorting along preferences for work and employability should be evaluated. Future
research should also examine the interplay between assortative mating and fiscal policy.
This issue is seldom addressed, with the exception of Pestel (2017). More specifically, the
design of couples’ income taxation strongly influences the partners’ labor supply decisions.
While individual taxation encourages labor market participation, joint taxation encourages
specialisation within the household since the marginal tax rate of the secondary earner
depends on that of the primary earner (Crossley and Jeon 2007). A majority of rich
countries have implemented an individual income tax scheme (Care 2014). However, in
France, taxation occurs at the household level. Given the observed hump-shaped female
labor market participation, one could expect that the effect of individual taxation on female
labor supply should increase the contribution of assortative mating to inequality. Future
research should address this issue.
20 See Immervoll, Levy, Lietz, Mantovani, O’Donoghue, Sutherland, and Verbist (2005).
References
Ahmad, N., and S.-H. Koh (2011): “Incorporating Estimates of Household Production
of Non-Market Services into International Comparisons of Material Well-Being,” OECD
Statistics Working Papers 2011/7, OECD Publishing.
Arrondel, L., and N. Fremeaux (2016): “For richer, for poorer: assortative mating
and savings preferences,” Economica.
Becker, G., and N. Tomes (1979): “An Equilibrium Theory of the Distribution of
Income and Intergenerational Mobility,” Journal of Political Economy, 87(6), 1153–1189.
Becker, G. S. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy,
81(4), 813–46.
Black, S., and P. J. Devereux (2011): Recent Developments in Intergenerational Mobility. Elsevier.
Blossfeld, H.-P., and A. Timm (2003): Who Marries Whom?: Educational Systems as
Marriage Markets in Modern Societies. Springer.
Bollinger, C. R., and A. Chandra (2005): “Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data,” Journal of Labor Economics, 23(2), 235–258.
Bouchet-Valat, M. (2014): “Les évolutions de l’homogamie de diplome, de classe et
d’origine sociales en France (1969-2011) : ouverture d’ensemble, repli des élites,” Revue
française de sociologie, 55(3), 459–505.
Burtless, G. (1999): “Effects of growing wage disparities and changing family composition
on the U.S. income distribution,” European Economic Review, 43, 853–865.
Cancian, M., and D. Reed (1998): “Assessing the Effects of Wives’ Earnings on Family
Income Inequality,” The Review of Economics and Statistics, 80(1), 73–79.
Care (2014): “The taxation of families - International comparisons 2012,” Care research
paper.
Chadwick, L., and G. Solon (2002): “Intergenerational income mobility among daughters,” American Economic Review, 92(1), 335–344.
Crossley, T. F., and S.-H. Jeon (2007): “Joint Taxation and the Labour Supply of
Married Women: Evidence from the Canadian Tax Reform of 1988,” Fiscal Studies,
(28), 343–365.
Dohmen, T., A. Falk, D. Huffman, and U. Sunde (2012): “The Intergenerational
Transmission of Risk and Trust Attitudes,” Review of Economic Studies, 79(2), 645–677.
Duncan, G. J., and D. H. Hill (1989): “Assessing the Quality of Household Panel Data:
The Case of the Panel Study of Income Dynamics,” Journal of Business and Economic
Statistics, 7(4), 441–52.
Eckstein, Z., and O. Lifshitz (2011): “Dynamic Female Labor Supply,” Econometrica,
79(6), 1675–1726.
Eika, L., M. Mogstad, and B. Zafar (2017): “Educational Assortative Mating and
Household Income Inequality,” Federal Reserve Bank of New-York Staff Reports 692,
Federal Reserve Bank of New-York.
Ermisch, J., M. Francesconi, and T. Siedler (2006): “Intergenerational Mobility
and Marital Sorting,” Economic Journal, 116(513), 659–679.
Fernandez, R., N. Guner, and J. Knowles (2005): “Love and Money: A Theoretical and Empirical Analysis of Household Sorting and Inequality,” Quarterly Journal of
Economics, 120(1), 273–344.
Frazis, H., and J. Stewart (2011): “How does household production affect measured
income inequality?,” Journal of Population Economics, 24(1), 3–22.
Fremeaux, N. (2014): “The Role of Inheritance and Labour Income in Marital Choices,”
Population-E, 69(4), 495–530.
Goux, D., and E. Maurin (2003): “Who Marries Whom in France. An Analysis of the
Cohorts born between 1934 and 1978,” in Who Marries Whom?, ed. by H. Blossfeld, and
Y. Shavit, chap. 4. Oxford University Press, Oxford.
Greenwood, J., N. Guner, G. Kocharkov, and C. Santos (2014): “Corrigendum
to Marry Your Like: Assortative Mating and Income Inequality,” American Economic
Review - Papers and Proceedings, 104(5), 348–353.
Gronau, R. (1977): “Leisure, Home Production, and Work-The Theory of the Allocation
of Time Revisited,” Journal of Political Economy, 85(6), 1099–1123.
Hagneré, C., and A. Lefranc (2006): “Etendue et conséquences des erreurs de mesure
dans les données individuelles d’enquête: une évaluation à partir des données appariées
des enquêtes Emploi et Revenus Fiscaux,” Economie et Prévision, 174(3), 131–154.
Haider, S., and G. Solon (2006): “Life-Cycle Variation in the Association between
Current and Lifetime Earnings,” American Economic Review, 96(4), 1308–1320.
Ham, J. C. (1982): “Estimation of a Labour Supply Model with Censoring Due to Unemployment and Underemployment,” The Review of Economic Studies, 49(3), 335–354.
Harmenberg, K. (2014): “A note: the effect of assortative mating on income inequality,”
Mimeo.
Heckman, J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica,
47(1), 153–61.
House, C., J. Laitner, and D. Stolyarov (2008): “Valuing Lost Home Production Of
Dual Earner Couples,” International Economic Review, 49(2), 701–736.
Hryshko, D., C. Juhn, and K. McCue (2014): “Trends in Earnings Inequality and
Earnings Instability among U.S. Couples: How Important Is Assortative Matching?,”
IZA Discussion Papers 8729, Institute for the Study of Labor (IZA).
Immervoll, H., H. Levy, C. Lietz, D. Mantovani, C. O’Donoghue, H. Sutherland, and G. Verbist (2005): “Household Incomes and Redistribution in the European
Union: Quantifying the Equalising Properties of Taxes and Benefits,” IZA Discussion
Papers 1824, Institute for the Study of Labor (IZA).
Kalmijn, M. (1991): “Status Homogamy in the United States,” American Journal of
Sociology, 97, 496–523.
Karoly, L. A., and G. Burtless (1995): “Demographic Change, Rising Earnings Inequality, and the Distribution of Personal Well-Being, 1959-1989,” Demography, 32(3),
379–415.
Kimball, M., C. R. Sahm, and M. D. Shapiro (2009): “Risk Preferences in the PSID:
Individual Imputations and Family Covariation,” American Economic Review - Papers
and Proceedings, 99(2), 363–368.
Lecavelier, C., and A. Lefranc (2015): “Siblings correlations in socio-economic outcomes in France,” mimeo thema.
Lefranc, A. (2011): “Educational expansion, earnings compression and changes in intergenerational economic mobility : Evidence from French cohorts, 1931-1976,” THEMA
Working Papers 2011-11, THEMA (THéorie Economique, Modélisation et Applications),
Université de Cergy-Pontoise.
Lefranc, A., N. Pistolesi, and A. Trannoy (2009): “Equality of opportunity and
luck: Definitions and testable conditions, with an application to income in France (1979-2000),”
Journal of Public Economics, 93(11-12), 1189–1207.
Lefranc, A., and A. Trannoy (2005): “Intergenerational earnings mobility in France: Is
France more mobile than the U.S.?,” Annales d’Economie et de Statistique, (78), 57–77.
Lise, J., and S. Seitz (2011): “Consumption Inequality and Intra-Household Allocations,”
Review of Economic Studies, 78(1), 328–355.
Mare, R. D. (1991): “Five Decades of Educational Assortative Mating,” American Sociological Review, 56(1), 15–32.
Moffitt, R., and P. Gottschalk (2011): “Trends in the covariance structure of earnings in the U.S.: 1969-1987,” Journal of Economic Inequality, 9(3), 439–459.
Nakosteen, R. A., O. Westerlund, and M. A. Zimmer (2004): “Marital Matching
and Earnings: Evidence from the Unmarried Population in Sweden,” Journal of Human
Resources, 39(4), 1033–1044.
Ostrovsky, Y. (2012): “The correlation of spouses’ permanent and transitory earnings
and family earnings inequality in Canada,” Labour Economics, 19(5), 756–768.
Pearson, K. (1900): “Mathematical Contributions to the Theory of Evolution. VII. On the
Correlation of Characters not Quantitatively Measurable,” Philosophical Transactions of
the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical
Character, 195, pp. 1–47+405.
Pestel, N. (2017): “Marital sorting, inequality and the role of female labor supply: Evidence from East and West Germany,” Economica, 84(333), 104–127.
Ritchie-Scott, A. (1918): “The correlation coefficient of a polychoric table,” Biometrika,
12(1/2), 93–133.
Roy, D. (2012): “Le travail domestique : 60 milliards d’heures en 2010,” INSEE Premiere,
1423.
Schwartz, C. R. (2010): “Earnings Inequality and the Changing Association between
Spouses’ Earnings,” American Journal of Sociology, 115(5), 1524–1557.
Schwartz, C. R., and R. D. Mare (2005): “Trends in Educational Assortative Marriage
from 1940 to 2003,” Demography, 42(4), 621–646.
Solon, G. (1992): “Intergenerational Income Mobility in the United States,” American
Economic Review, 82(3), 393–408.
Uunk, W. J. G., H. B. G. Ganzeboom, and P. Róbert (1996): “Bivariate and Multivariate Scaled Association Models. An Application to Homogamy of Social Origin and
Education in Hungary between 1930 and 1979,” Quality & Quantity, 30, 323–343.
Zhang, J., and P.-W. Liu (2003): “Testing Becker’s Prediction on Assortative Mating
on Spouses’ Wages,” The Journal of Human Resources, 38(1), pp. 99–110.
Zimmerman, D. (1992): “Regression Toward Mediocrity in Economic Stature,” American
Economic Review, 82(1), 409–429.
Table 1: Correlations - occupation and education

                    (1)            (2)            (3)            (4)            (5)

A - Occupation
                    own            father’s       own            mother’s       own
                    occ.           occ.           occ.           occ.           occ.
Spearman            .453           .29            .456           .249           .468
                    [.434,.471]    [.254,.325]    [.424,.486]    [.203,.294]    [.43,.505]
Polychoric          .531           .377           .526           .308           .542
                    (.01)          (.021)         (.017)         (.026)         (.021)
Obs.                6928           2559           2559           1635           1635

B - Highest degree
                    own            father’s       own            mother’s       own
                    degree         degree         degree         degree         degree
Spearman            .559           .437           .58            .401           .571
                    [.543,.574]    [.403,.47]     [.551,.607]    [.368,.433]    [.545,.597]
Polychoric          .593           .506           .613           .476           .606
                    (.01)          (.021)         (.017)         (.026)         (.021)
Obs.                7864           2202           2202           2571           2571

C - Years of education
                    years
Pearson             .62
                    [.606,.633]
Spearman            .624
                    [.611,.638]
Obs.                7864

Note: 95% confidence interval in square brackets; standard errors in parentheses. Estimates in panels A
and B, columns 2 to 5, are restricted to the sample of couples for which information on own and parental
occupation (resp. degree) is available.
Table 2: Correlations - labor earnings

                             (1)            (2)            (3)
                             w0             w              w^FTE
A - Gross correlations
Pearson                      .175           .31            .351
                             [.153,.196]    [.286,.332]    [.328,.373]
Spearman                     .179           .269           .316
                             [.157,.2]      [.245,.292]    [.293,.338]
B - Net of cohort effects
Pearson                      .169           .296           .337
                             [.147,.19]     [.272,.319]    [.314,.359]
Spearman                     .175           .261           .318
                             [.153,.196]    [.238,.285]    [.295,.34]
Obs.                         7864           5983           5983

Note: w0: annual labor earnings, including zeroes; w: annual labor earnings, excluding zeroes; w^FTE:
full-time equivalent annual labor earnings, excluding zeroes. 95% confidence interval in square brackets.
Table 3: Correlations - labor earnings residuals

                             (1)            (2)
                             w              w^FTE
A - Gross correlations
Pearson                      .295           .341
                             [.241,.347]    [.289,.391]
Spearman                     .26            .296
                             [.205,.313]    [.242,.348]
B - Conditional on education
Pearson                      .178           .214
                             [.122,.234]    [.158,.269]
Spearman                     .163           .185
                             [.106,.219]    [.128,.24]
C - Conditional on social origin
Pearson                      .203           .246
                             [.147,.258]    [.191,.3]
Spearman                     .159           .204
                             [.102,.215]    [.148,.259]
D - Conditional on education and social origin
Pearson                      .172           .206
                             [.115,.227]    [.15,.261]
Spearman                     .161           .194
                             [.104,.217]    [.138,.249]
Obs.                         1764           1764

Note: w: annual labor earnings, excluding zeroes; w^FTE: full-time equivalent annual labor earnings,
excluding zeroes. 95% confidence interval in square brackets.
Table 4: Correlations - multi-year average of labor earnings

Columns: (1) w0; (2) mean w0; (3) w; (4) mean w; (5) w^FTE; (6) mean w^FTE.

                             (3)            (4)            (5)            (6)
                             w              mean w         w^FTE          mean w^FTE
A - Couples observed at least 3 years
Pearson                      .336           .39            .389           .466
                             [.304,.367]    [.36,.419]     [.359,.418]    [.438,.493]
Spearman                     .292           .34            .342           .421
                             [.26,.324]     [.309,.371]    [.311,.373]    [.392,.45]
Pearson (residuals)          .325           .377           .382           .454
                             [.293,.356]    [.347,.407]    [.351,.411]    [.425,.481]
Spearman (residuals)         .284           .337           .35            .423
                             [.252,.316]    [.305,.368]    [.319,.381]    [.394,.451]
Obs.                         3106           3106           3106           3106
B - Couples observed at least 5 years
Pearson                      .367           .416           .385           .49
                             [.317,.415]    [.368,.462]    [.335,.432]    [.446,.532]
Spearman                     .327           .374           .35            .435
                             [.275,.377]    [.324,.422]    [.299,.399]    [.388,.48]
Pearson (residuals)          .361           .407           .384           .483
                             [.311,.41]     [.358,.453]    [.335,.432]    [.438,.525]
Spearman (residuals)         .324           .374           .365           .441
                             [.272,.374]    [.324,.422]    [.315,.413]    [.394,.485]
Obs.                         1185           1185           1185           1185

Columns (1) w0 and (2) mean w0: the row and column position of each estimate is not recoverable from
the source layout; the estimates and their confidence intervals are listed by panel.
A - Couples observed at least 3 years (Obs. 4258 in both columns):
.217 [.188,.245]; .19 [.161,.219]; .215 [.186,.244]; .186 [.157,.215];
.218 [.189,.246]; .197 [.168,.226]; .222 [.193,.25]; .192 [.163,.221]
B - Couples observed at least 5 years (Obs. 1677 in both columns):
.257 [.212,.301]; .235 [.189,.28]; .246 [.201,.29]; .213 [.166,.258];
.251 [.205,.295]; .23 [.184,.275]; .248 [.202,.292]; .221 [.175,.266]

Note: w0: annual labor earnings, including zeroes; w: annual labor earnings, excluding zeroes; w^FTE:
full-time equivalent annual labor earnings, excluding zeroes. ’mean’ indicates the multi-year averages,
computed over all years for which the information is available. 95% confidence interval in square brackets.
Figure 1: Bivariate density - Annual earnings
[Figure not reproduced. Panel A - earnings levels: bivariate density plot (Gaussian kernel) with axes
w_h and w_f, each running from 0 to 40,000. Panel B - earnings ranks: bivariate density plot (Gaussian
kernel) with axes rank of (w_h) and rank of (w_f), each running from 0 to 6,000.]
Figure 2: Bivariate density - Full-time equivalent earnings
[Figure not reproduced. Panel A - earnings levels: bivariate density plot (Gaussian kernel) with axes
w_fte_h and w_fte_f, each running up to 40,000. Panel B - earnings ranks: bivariate density plot
(Gaussian kernel) with axes rank of (w_fte_h) and rank of (w_fte_f), each running from 0 to 6,000.]
Table 5: Correlations and sample selection - labor earnings

                             (1)            (2)
                             ln w^FTE       ln(mean w^FTE)
Panel A - Ignoring sample selection
β_OLS                        .329           .359
ρ                            .326           .361
σ_m                          .407           .396
σ_f                          .411           .395
N                            5983           6383
Panel B - Accounting for sample selection
β_Heckman                    .321           .357
ρ                            .31            .353
σ_m                          .421           .409
σ_f                          .436           .414
ρ_res                        -.619          -.606
N                            7526           7526

Note: β: regression coefficient; σ: standard deviation (for the male partner m and the female partner f);
ρ: correlation coefficient; ρ_res: correlation coefficient of the error terms of the selection and wage equations.
w^FTE: full-time equivalent annual labor earnings, excluding zeroes. ’mean’ indicates the multi-year averages,
computed over all years for which the information is available.
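
In an OLS regression of ln w_f on ln w_m, the slope and the correlation coefficient are linked by the identity ρ = β σ_m / σ_f. The short Python check below applies this identity to the rounded entries of column (1). The function name `implied_correlation` is ours, and treating the selection-corrected ρ of Panel B as following the same identity is an inference from the reported numbers, not a statement made by the authors.

```python
def implied_correlation(beta, sigma_m, sigma_f):
    """Correlation implied by the slope of a regression of ln w_f on ln w_m."""
    return beta * sigma_m / sigma_f


# Column (1) of Table 5, using the rounded values printed in the table.
panel_a = implied_correlation(0.329, 0.407, 0.411)  # ignoring sample selection
panel_b = implied_correlation(0.321, 0.421, 0.436)  # accounting for sample selection
print(round(panel_a, 3), round(panel_b, 3))
```

Both values round to the ρ reported in column (1) of the table (.326 and .31), which suggests that the three statistics are internally consistent.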
Table 6: Sample selection model - labor earnings

                             (1)            (2)
Female wage                  ln w_f^FTE     ln(mean w_f^FTE)
Main equation
ln w_m^FTE                   .321           .357
                             (.0127)        (.012)
cons                         .0802          .0606
                             (.0063)        (.0055)
Selection equation
w_m                          7.0e-06        6.1e-06
                             (4.0e-06)      (4.2e-06)
w_m^2                        -.0219         -.0234
                             (.0045)        (.0047)
age_m                        .0027          -.0037
                             (.0047)        (.005)
age_m^2                      -6.0e-04       -6.1e-04
                             (3.1e-04)      (3.3e-04)
age_f                        .0209          .0215
                             (.0043)        (.0046)
age_f^2                      -.0019         -.0022
                             (3.2e-04)      (3.3e-04)
years of education_f         .188           .167
                             (.0494)        (.052)
years of education_f^2       2.6e-04        .0013
                             (.0019)        (.002)
number of children           -.253          -.225
                             (.017)         (.0177)
long-term contract_m         .157           .169
                             (.0499)        (.0527)
capital income               4.3e-06        4.8e-06
                             (2.3e-06)      (2.4e-06)
ρ_res                        -.619          -.606
                             (.038)         (.038)
σ_ε                          .414           .387
                             (.00467)       (.00409)

Note: Standard errors in parentheses. w^FTE: full-time equivalent annual labor earnings, excluding zeroes.
Indices m for the male partner and f for the female partner.
Table 7: Female labor market characteristics conditional on male earnings deciles

Male FTE      (1)       (2)       (3)           (3)              (4)        (5)
earnings:     Work      Unemp.    Inactivity    Months worked    w          w^FTE
D1            0.65      0.093     0.26          9.3              14,740     20,118
D2            0.74      0.07      0.19          9.6              15,405     20,008
D3            0.75      0.069     0.18          9.6              15,694     20,142
D4            0.76      0.074     0.16          9.8              16,062     20,425
D5            0.78      0.051     0.17          9.6              16,813     21,718
D6            0.81      0.046     0.14          9.6              17,968     23,442
D7            0.81      0.048     0.14          9.8              19,178     24,460
D8            0.8       0.049     0.15          9.7              19,975     25,674
D9            0.8       0.058     0.14          9.6              21,296     27,782
D10           0.72      0.069     0.21          9.3              24,868     33,458

Note: D1 (resp. D10) refers to the bottom (resp. top) decile of the male FTE earnings distribution. w and
w^FTE are expressed in 2011 Euros.
Table 8: Earnings inequality - Observed and simulated matching

                      (1)        (2)        (3)        (4)        (5)
                      Gini       Theil      A(1)       A(2)       p90/p10
A - Addition randomization
Annual earnings
  Observed            0.270      0.121      0.124      0.268      3.722
  Simulated           0.247      0.099      0.103      0.220      3.332
  ∆ inequality        -8.5%      -17.8%     -17.5%     -17.7%     -10.5%
FTE earnings
  Observed            0.207      0.072      0.065      0.117      2.453
  Simulated           0.186      0.057      0.053      0.098      2.298
  ∆ inequality        -10.1%     -21.3%     -18.7%     -16.2%     -6.3%
B - Imputation randomization
Annual earnings
  Observed            0.270      0.121      0.124      0.268      3.722
  Simulated           0.263      0.114      0.116      0.245      3.515
  ∆ inequality        -2.8%      -5.7%      -7.0%      -8.6%      -5.6%
FTE earnings
  Observed            0.207      0.072      0.065      0.117      2.453
  Simulated           0.190      0.060      0.056      0.106      2.325
  ∆ inequality        -8.3%      -17.4%     -13.6%     -9.3%      -5.2%
C - Addition randomization with sample selection correction
FTE earnings
  Observed            0.196      0.062      0.060      0.116      2.474
  Simulated           0.179      0.051      0.050      0.097      2.283
  ∆ inequality        -8.7%      -16.6%     -16.6%     -16.5%     -7.7%

Note: A(1) and A(2) denote the Atkinson inequality indices with coefficient 1 and 2 respectively; p90/p10
denotes the ratio of the 90th percentile over the 10th percentile.
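
The inequality measures in this table can be illustrated with a minimal Python sketch. The paper does not state the exact estimators used, so the code assumes the standard textbook definitions, unweighted data and a linear-interpolation percentile rule; all function names are ours. The Theil and Atkinson indices require strictly positive earnings.

```python
import math


def gini(x):
    """Gini coefficient from the rank-weighted formula (non-negative values)."""
    xs = sorted(x)
    n = len(xs)
    weighted = sum(rank * v for rank, v in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def theil(x):
    """Theil T index; requires strictly positive values."""
    mu = sum(x) / len(x)
    return sum((v / mu) * math.log(v / mu) for v in x) / len(x)


def atkinson(x, eps):
    """Atkinson index with inequality-aversion coefficient eps (values > 0)."""
    n = len(x)
    mu = sum(x) / n
    if eps == 1:
        ede = math.exp(sum(math.log(v) for v in x) / n)  # geometric mean
    else:
        ede = (sum(v ** (1.0 - eps) for v in x) / n) ** (1.0 / (1.0 - eps))
    return 1.0 - ede / mu


def percentile(x, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(x)
    pos = q * (len(xs) - 1)
    low = int(math.floor(pos))
    high = min(low + 1, len(xs) - 1)
    return xs[low] + (pos - low) * (xs[high] - xs[low])


def p90_p10(x):
    """Ratio of the 90th percentile over the 10th percentile."""
    return percentile(x, 0.9) / percentile(x, 0.1)


def relative_change(observed, simulated):
    """The 'Δ inequality' row: relative change from observed to simulated."""
    return (simulated - observed) / observed
```

For instance, `relative_change(0.270, 0.247)` gives about -0.085, in line with the -8.5% reported for the Gini index of annual earnings in panel A.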
Appendix A
Main variables and descriptive statistics
The individual characteristics examined in this paper are the following.
Earnings. Annual earnings are defined as the total wages and salaries earned in the previous
year, deflated by the consumer price index. For individuals out of salaried employment, the value
of annual earnings is equal to zero (see footnote 21). Earnings are self-reported from 2004 to 2007
and matched with fiscal and administrative data afterwards. Preliminary analysis suggests that
self-reported earnings incorporate significant measurement error, with important consequences for
the estimation of earnings correlations. Without corrections, the correlation of partners’ earnings
is about 25% lower for self-reported earnings than for administrative earnings. To minimize the
incidence of measurement errors in earnings on our estimates of assortativeness, we winsorize the
data by recoding the bottom and the top 1% of the earnings distribution when earnings are positive.
When reported earnings are equal to zero, this value is kept unchanged. Bollinger and Chandra
(2005) show that winsorizing performs better than trimming in the presence of response errors.
After winsorizing, estimates based on self-reported earnings appear similar to those derived from
administrative data.
Full-time equivalent (FTE) earnings are defined as
12 × annual earnings / (number of months worked full-time + 0.5 × number of months worked part-time).
To compute FTE earnings, we rely on the history of labor force participation reported in the survey.
For each month in the preceding year, individuals are asked to report their labor force status, which
distinguishes between full-time and part-time salaried work. Unfortunately, for individuals working
part-time we do not observe the share of working time. We thus assume that part-time work
corresponds to 50% of full working time (see footnote 22). For individuals out of salaried work,
FTE earnings are missing by construction. We apply the winsorizing procedure described above
to FTE earnings as well.
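
The two earnings transformations described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors’ code: the function names are ours, and the nearest-rank rule used to locate the 1st and 99th percentiles is an assumption, since the text does not specify how the cut-offs are computed.

```python
def fte_earnings(annual_earnings, months_full_time, months_part_time):
    """Full-time equivalent annual earnings.

    Part-time months count for half a month (the 50% assumption of the text).
    Returns None when no month of salaried work is observed, since FTE
    earnings are then missing by construction.
    """
    fte_months = months_full_time + 0.5 * months_part_time
    if fte_months == 0:
        return None
    return annual_earnings / fte_months * 12


def winsorize_positive(values, lower=0.01, upper=0.99):
    """Recode positive values below/above the lower/upper quantile to that
    quantile. Zeros and missing values (None) are left unchanged."""
    positive = sorted(v for v in values if v is not None and v > 0)
    if not positive:
        return list(values)
    last = len(positive) - 1
    # Nearest-rank cut-offs (an assumption, see the lead-in).
    low_cut = positive[round(lower * last)]
    high_cut = positive[round(upper * last)]
    return [v if v is None or v <= 0 else min(max(v, low_cut), high_cut)
            for v in values]
```

For example, an individual earning 12,000 over six full-time and six part-time months has nine full-time equivalent months, so `fte_earnings(12000, 6, 6)` is 12,000 / 9 × 12, that is 16,000.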
For both earnings measures, we compute multi-year averages of individual earnings. This
average is computed over the full set of available yearly observations. The number of years of
observation in our sample varies between 1 and 8 years, with an average of 3.4 years.
Educational attainment. We use two measures of educational attainment. The first one is the
number of years of education, equal to the reported school leaving age minus 6 years (i.e. the minimum
age for compulsory education; see footnote 23). Our second variable is based on the highest degree
completed. We consider a classification with 8 ordered levels: 1) no degree; 2) general lower
secondary degree; 3) vocational lower degree; 4) vocational upper secondary degree; 5) general upper
secondary degree; 6) college (bachelor or technical degree); 7) master’s degree; 8) PhD or elite
schools degree (Grandes Ecoles).
Occupation. Our measure of occupation is based on the standard 6-level French classification.
In order to come close to an ordinal measure of occupation, we group farmers together with unskilled
manual workers. This leads to the following classification: 1) Higher-grade professionals; 2) Lower-grade
professionals; 3) Artisans and small proprietors; 4) Non-manual employees; 5) Farmers and manual
workers. Respondents report their current occupation or, if unemployed, their last occupation. The
information is missing for individuals out of the labor force.
21
Given our use of panel data, individuals with zero earnings should never have reported any salaried
activity. Some of these individuals may, however, report unemployment periods and thus potentially
unemployment benefits. Taking these benefits into account (as a proxy for earnings) does not change
our estimates but increases measurement error. In the end, we decided not to include them.
22
Information on hours of work is only available at the time of the interview. In our sample, 65% of
part-time salaried individuals report working between 15 and 30 hours per week.
23
For some individuals, the number of years of education appears noisy. Furthermore, although highest
degree is reported for all individuals in the sample, number of years of education is missing for 9% of the
sample. For this reason we estimate the correlation in predicted number of years of education, where the
prediction is based on a regression of number of years of education on degree dummies interacted with
gender and a fourth degree polynomial function of birth cohorts.
Socioeconomic origin. The SILC survey investigated individual socioeconomic origin and
gathered information on education and occupation of both parents of adult respondents. Information is only available for a sub-sample of our data, since the questionnaire only investigated
this topic in the 2005 wave. Our measure of parental occupation uses the same classification as
individual occupation (see above). Occupation is missing when the parent was continuously out of
the labor force during the respondent’s youth. Our measure of education is based on the highest
degree completed by the parents. The classification is the same as described above.
Table A.1: General descriptive statistics

                                      Men        Women
Age                                   42         40
Education
  No degree                           0.12       0.12
  General lower secondary degree      0.094      0.12
  Vocational lower degree             0.34       0.26
  Vocational upper degree             0.056      0.054
  General upper degree                0.093      0.12
  College - bachelor degree           0.11       0.13
  Master’s degree                     0.082      0.12
  PhD or elite schools degree         0.099      0.079
Labor market status
  Employment                          0.94       0.77
  Unemployment                        0.047      0.062
  Inactivity                          0.015      0.17
  Number of months worked             11         7.7
Earnings
  Share of individuals with w>0       0.96       0.8
  w0 (mean)                           25,079     14,581
  w0 (std error)                      14,815     11,199
  w (mean)                            26,206     18,141
  w (std error)                       14,135     9,563
  w^FTE (mean)                        27,400     23,648
  w^FTE (std error)                   14,333     11,625
N                                     7,864      7,864
Appendix B
Simulation algorithms
B.1 Addition randomization
The addition randomization algorithm randomizes individual earnings within couples. Randomization
is performed conditional on the age of both partners in the couple. It relies on a parametric model
of labor force participation and a semi-parametric earnings regression model. For all couples
observed in the sample, the main steps of the earnings addition randomization are the following:
1. Estimate a probit model of male labor market status (0 for no earnings in the previous year;
1 for strictly positive earnings) where the probability of positive earnings is a function of a
second order polynomial function of male age, female age and their interaction.
2. Estimate a linear regression model for log earnings, on the sample of males with positive
earnings, where log-earnings are regressed on a fourth order polynomial of male age. Store
the distribution of predicted residuals.
3. Keep observations of female and male age and female labor earnings, including zeroes.
4. Randomize male labor market status by drawing from a Bernoulli distribution where the
probability of positive earnings is predicted on the basis of the probit model of step 1.
5. When labor market status is 1, randomize earnings using the earnings model of step 2 :
compute predicted log earnings conditional on age; randomly draw a value of the residual
on the basis of the empirical distribution of predicted residuals; take the exponential of the
sum of the previous two components.
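
Steps 3 to 5 can be sketched in Python as follows, taking the fitted values of the participation and earnings models as given. This is an illustrative sketch, not the authors’ implementation: the dictionary keys and the function name are hypothetical, and the sketch assumes that simulated joint earnings are the sum of observed female earnings and randomized male earnings.

```python
import math
import random


def addition_randomization(couples, residual_pool, rng):
    """Simulate couples' joint earnings under random matching (steps 3 to 5).

    Each couple is a dict with hypothetical keys:
      'female_earnings'  observed female earnings, zeroes included (step 3)
      'p_male_work'      predicted probability of positive male earnings
      'pred_log_male_w'  predicted male log earnings conditional on age
    residual_pool holds the stored residuals of the earnings regression.
    """
    simulated = []
    for c in couples:
        # Step 4: Bernoulli draw of male labor market status.
        works = rng.random() < c["p_male_work"]
        if works:
            # Step 5: prediction plus a residual drawn from the empirical
            # distribution, then exponentiated.
            male_w = math.exp(c["pred_log_male_w"] + rng.choice(residual_pool))
        else:
            male_w = 0.0
        simulated.append(c["female_earnings"] + male_w)
    return simulated
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a simulation run reproducible; inequality indices can then be computed on the returned list and compared with those of observed joint earnings.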
B.2 Imputation randomization
The imputation randomization algorithm first randomizes education (number of years) among
couples, conditional on the age of both partners. Second, it randomizes the couple’s joint earnings,
by randomly drawing from the observed earnings distribution of couples with similar age and
education characteristics. Randomization is only allowed to occur given the age of both partners
in the couple. Greenwood, Guner, Kocharkov, and Santos (2014) and Eika, Mogstad, and Zafar
(2017) implement a non-parametric version of this randomization procedure. In our case, given
limited sample size, randomization relies on a semi-parametric regression model of education and
earnings. The steps of the imputation randomization are the following :
1. Estimate a linear regression model for years of education, on the sample of males, where
years of education (in log) is regressed on a function of a second order polynomial function
of male age, female age and their interaction.
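2. Estimate a linear regression model for joint earnings of the couple, on the sample of couples,
where log-earnings are regressed on the number of years of education of male and female
(second order polynomial), an interaction term in male and female education, a fourth order
polynomial of male and female age and a second order polynomial interaction of male and
female age. Store the distribution of predicted residuals.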
3. Keep observations of female and male age and female years of education.
4. Randomize male number of years of education, conditional on the age of both partners, on
the basis of the regression of step 1. The average number of years is predicted based on
model’s estimated coefficients; the residual is randomized by drawing from the distribution
of predicted residuals.
5. Randomize couple’s joint earnings using the earnings model of step 2: compute predicted log
earnings conditional on age and education of both partners; randomly draw a value of the
residual on the basis of the empirical distribution of predicted residuals; take the exponential
of the sum of the previous two components.
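
Steps 3 to 5 can be sketched in Python along the following lines, again taking the fitted models as given. The dictionary keys, the function name and the `predict_log_joint_w` callable are hypothetical; since the education regression is described as being in logs, the sketch exponentiates the randomized prediction to recover years of education.

```python
import math
import random


def imputation_randomization(couples, educ_residuals, earn_residuals,
                             predict_log_joint_w, rng):
    """Simulate couples' joint earnings (steps 3 to 5).

    Each couple is a dict with hypothetical keys:
      'male_age', 'female_age', 'female_educ'  observed values kept in step 3
      'pred_log_male_educ'  predicted log years of male education given ages
    predict_log_joint_w(male_age, female_age, male_educ, female_educ) returns
    the predicted log joint earnings from the earnings regression.
    """
    simulated = []
    for c in couples:
        # Step 4: predicted (log) education plus a randomly drawn residual.
        male_educ = math.exp(c["pred_log_male_educ"] + rng.choice(educ_residuals))
        # Step 5: predicted log joint earnings plus a randomly drawn residual.
        log_w = predict_log_joint_w(c["male_age"], c["female_age"],
                                    male_educ, c["female_educ"])
        simulated.append(math.exp(log_w + rng.choice(earn_residuals)))
    return simulated
```

Because male education is redrawn conditional on age only, any association between partners’ education beyond what age explains is removed before joint earnings are imputed.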
B.3 Addition randomization with sample selection correction
Addition randomization with correction for sample selection is based on the model of section 4.
Instead of estimating the model of section 4 on observed individual earnings, the model is estimated
on earnings residuals computed from a preliminary regression in which earnings of both male and
female are regressed on a fourth order polynomial in age. Conditional on the age of both partners,
the algorithm randomizes the earnings residual based on the parametric joint log-normal model with
sample selection. The steps of the addition randomization algorithm with correction for sample
selection are the following :
1. Estimate a linear regression model for log FTE earnings of both male and female (separately),
on the sample of individuals with positive earnings, where log-earnings are regressed on a
fourth order polynomial of individual age. Store the distribution of predicted residuals and
predicted values.
2. Estimate a sample selection model of female earnings residual following the model of section
4 to recover the correlation in residual earnings and the variance of female earnings without
selection.
3. Keep observations of female and male age.
4. Compute predicted FTE earnings conditional on age for both male and female, using step 1.
5. Randomize male and female FTE earnings residuals by drawing residuals from a joint normal
distribution with parameters estimated in step 2. This first simulation makes it possible to
derive the uncensored distribution of (latent) potential earnings in the population that
corresponds to the observed degree of assortative mating.
6. Randomize male and female FTE earnings residuals by drawing residuals from a joint normal
distribution with variances estimated in step 2 and covariance in residuals set equal to zero.
This second simulation makes it possible to derive the uncensored distribution of (latent)
potential earnings in the population under the assumption of random mating.
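
The two simulations of steps 5 and 6 differ only in the correlation of the residual draws. The Python sketch below draws residual pairs from a bivariate normal distribution with a given correlation; the function names are ours and the parameter values are purely illustrative, not estimates from the paper.

```python
import math
import random


def draw_residual_pairs(n, sigma_m, sigma_f, rho, rng):
    """Draw n (male, female) residual pairs from a bivariate normal with
    standard deviations sigma_m, sigma_f and correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e_m = sigma_m * z1
        # Cholesky construction: e_f loads on z1 with weight rho.
        e_f = sigma_f * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((e_m, e_f))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of the two components of a list of pairs."""
    n = len(pairs)
    mean_m = sum(p[0] for p in pairs) / n
    mean_f = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - mean_m) * (p[1] - mean_f) for p in pairs)
    var_m = sum((p[0] - mean_m) ** 2 for p in pairs)
    var_f = sum((p[1] - mean_f) ** 2 for p in pairs)
    return cov / math.sqrt(var_m * var_f)


# Illustrative parameter values only, not estimates from the paper.
rng = random.Random(12345)
observed_mating = draw_residual_pairs(20000, 0.40, 0.42, 0.30, rng)  # step 5
random_mating = draw_residual_pairs(20000, 0.40, 0.42, 0.0, rng)     # step 6
```

Latent potential earnings would then be obtained by adding each residual to the predicted log earnings of step 4 and taking the exponential, separately for the two sets of draws.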
Appendix C
Double selection
The selection model of section 4 assumes that endogenous sample selection only arises from the
female labor participation and employment processes. Consistent with this assumption, the model
is estimated on the subsample where male earnings are not missing. In our case, censoring
is about four times more prevalent for women than for men: in our sample of 7,966 couples, earnings
are zero for 1,645 female partners against 440 male partners (see footnote 24). Selection issues may,
however, also arise from the process that determines whether male earnings are observed, and may
affect the estimation of the intra-household earnings correlation. We address this issue in this appendix.
We estimate a double-selection process where sample selection is allowed to be non-random
due to both the observability of female earnings and that of male earnings. As in the model of
section 4, the main equation of the model is given by:
\[ w_f = \beta_0 + \beta w_m + \varepsilon \tag{2} \]
In the estimation of this equation, we account for the fact that both wf and wm may be zero.
Define Of (respectively Om) a dummy variable indicating that wf (resp. wm) is non-zero. We
assume that the process that determines the pair (Of, Om) is given by:
\[
\begin{cases}
O_f = \mathbb{1}\left(Z_f \gamma_f + \nu_f > 0\right) \\
O_m = \mathbb{1}\left(Z_m \gamma_m + \nu_m > 0\right)
\end{cases}
\tag{3}
\]
where Zf and Zm are observable determinants of sample selection for, respectively, female and
male wages and (νf , νm ) is assumed to be a bivariate random normal vector.
Following Ham (1982), the model in equations 2 and 3 can be estimated by extending the
two-stage procedure of Heckman (1979) to the two selection rule problem. This amounts to including
two inverse Mills ratios in the estimation of equation 2, corresponding to the two selection processes
of equation 3. As in the original Heckman two-stage procedure, the predicted inverse Mills ratios,
λ̂f and λ̂m, are derived from first-stage estimates of the selection rule, which in the present case
takes the form of a bivariate probit process.
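
For a single selection rule O = 1(Zγ + ν > 0) with standard normal ν, the correction term of the Heckman procedure is the inverse Mills ratio λ(Zγ) = φ(Zγ)/Φ(Zγ). The Python sketch below computes this univariate term, which is what each of the two correction terms reduces to when νf and νm are uncorrelated. With correlated selection errors, as in the bivariate probit estimated here, the terms of Ham (1982) also involve the bivariate normal distribution and are not reproduced. Function names are ours.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(index):
    """lambda(index) = phi(index) / Phi(index), for the rule index + nu > 0."""
    return normal_pdf(index) / normal_cdf(index)
```

The ratio decreases with the selection index: observations that are very likely to be selected receive a correction close to zero, while observations that are unlikely to be selected receive a large one.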
The model is estimated on the full sample of couples, which consists of 7,966 observations.
Among them, 5,983 couples have non-zero earnings information for both partners. Variables included
in the selection rule for female earnings are quadratic functions in female age and years of education,
female self-assessed health, employment characteristics of the husband (indicators for non-zero
earnings, unemployment and permanent job contract) and household characteristics (number of
children, capital income, indicators for married couples, for living in rural areas, for the presence
of a disabled household member). Variables included in the selection rule for male earnings are
quadratic functions in male age and years of education, male self-assessed health, female
characteristics (age and years of education) and household characteristics (number of children,
capital income, indicators for married couples, for living in rural areas, for the presence of a
disabled household member) (see footnote 25).
Estimation results are reported in table C.1. Most variables in the bivariate selection probit
model are highly significant. The correlation in the bivariate probit residuals is positive, around .3
and significant, indicating positive assortative mating in the unobserved determinants of reporting
non-zero earnings. The wage regression model, accounting for sample selection, indicates a negative
selection due to female earnings observability. However, the selection term for male earnings
observability is very close to zero and not statistically significant.
Altogether, these results indicate that censoring due to female zero-earnings is not random.
By contrast, they support the assumption that censoring due to male zero-earnings can be
ignored, as assumed in the model of section 4.
24
102 couples have zero earnings for both partners and have been excluded from the estimations of
sections 3 to 5.
25
Variables that were not significant in the selection equations were omitted from the set of regressors.
Table C.1: Double selection model

                                      (1)          (2)
                                      Coef.        Std. Err.
wage equation
dependent variable: w_f^FTE
w_h^FTE                               0.2925       0.0123
λ̂_f                                   -0.4248      0.0253
λ̂_m                                   0.0463       0.0604

bivariate probit selection model
dependent variable: O_f
age_f                                 0.0157       0.0025
age_f^2                               -0.0028      0.0003
years of education_f                  0.2530       0.0514
years of education_f^2                -0.0049      0.0019
number of children                    -0.3023      0.0169
long-term contract_m                  0.1097       0.0508
unemployed_m                          -0.5622      0.0979
O_m                                   -1.2761      0.2143
capital income                        -2.26e-06    -1.90e-06
married                               -0.1896      0.0434
rural                                 0.1018       0.0382
health_f
  very good                           REF
  good                                0.0173       0.0414
  fair                                -0.1742      0.0532
  bad                                 -0.4903      0.0845
  very bad                            -0.9913      0.2023
disabled                              -0.2920      0.0656

dependent variable: O_m
age_m                                 -0.0170      0.0068
age_m^2                               -0.0010      0.0004
years of education_m                  0.1273       0.0565
years of education_m^2                -0.0036      0.0021
age_f                                 0.0166       0.0062
age_f^2                               -0.0006      0.0004
years of education_f                  0.1895       0.0745
years of education_f^2                -0.0071      0.0028
married                               0.1782       0.0596
rural                                 0.0995       0.0586
health_m
  very good                           REF
  good                                -0.0788      0.0655
  fair                                -0.1891      0.0817
  bad                                 -1.1294      0.1001
  very bad                            -1.3293      0.2027
disabled                              -0.5824      0.0708

ρ                                     0.3100       0.1106

Observations
  total: 7966
  O_f = 1: 6321
  O_m = 1: 7526

Note: the f (resp. m) index denotes the female (resp. male) partner. ρ denotes the correlation of the error
terms of the two probit processes.