“The Parental Co-Immunization Hypothesis”
Miguel Portela
Paul Schweinzer
NIPE WP 18/ 2013
URL:
http://www.eeg.uminho.pt/economia/nipe
The Parental Co-Immunization Hypothesis∗
Miguel Portela♮
Paul Schweinzer♯
12-Nov-2013
Abstract
We attempt to answer a simple empirical question: does having children make a parent live
longer? The hypothesis we offer is that a parent’s immune system is refreshed by a child’s
infections at a time when their own protection starts wearing thin. With the boosted
immune system, the parent has a better chance to fend off whatever infections might
strike when old and weak. Thus, parenthood is rewarded in individual terms. Using the
Office for National Statistics Longitudinal Study (ONS-LS) data set following one percent
of the population of England and Wales along four census waves 1971, 1981, 1991, and
2001, we are unable to reject this hypothesis. By contrast, we find in our key result that
women with children have a roughly 8% higher survival probability than women without
children. (JEL: I1, J1, R2. Keywords: Longevity, Infectious diseases, Family.)
1 Introduction
Children are born without significant defences against a large number of infections or diseases.1
They acquire this immunity through exposure and consequently are often and repeatedly ill in
their first years of life. Communal nursery, kindergarten and school attendance ensure that few
major infections are missed. This protects the children in later life. The involved viruses and
bacteria adapt and mutate over time, though, and therefore this early-life immunization does
not last forever.
∗ Thanks for comments and helpful discussions to Thomas Flatt, Ronen Segev, Pete Smith, and Steve Stearns.
The permission of the Office for National Statistics to use the Longitudinal Study is gratefully acknowledged
(clearance #30130), as is the help provided by staff of the Centre for Longitudinal Study Information & User
Support (CeLSIUS). CeLSIUS is supported by the ESRC Census of Population Programme (Award Ref: RES-348-25-0004).
The authors alone are responsible for the interpretation of the data. Census output is Crown
copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland.
Financial support from the University of York Research and Impact Support Fund is gratefully acknowledged.
Miguel Portela acknowledges the financial support provided by the European Regional Development
Fund (ERDF) through the Operational Program Factors of Competitiveness (COMPETE); and by national
funds received through FCT – Portuguese Foundation for Science and Technology [grant number PTDC/EGE–ECO/122126/2010].
♮ Department of Economics & NIPE, Universidade do Minho, Campus de Gualtar, 4710-057 Braga, Portugal, [email protected].
♯ Department of Economics, University of York, Heslington, York YO10 5DD, United Kingdom, [email protected].
1 Tollånes et al. (2008) show that babies born by Caesarean section have a 50% increased risk of developing
asthma compared to babies born naturally. Emergency Caesarean sections increase the risk even further.
This is probably because a Caesarean section changes or postpones the bacterial colonization of a baby’s stomach
which is necessary for development of an immune system response. During the course of a vaginal birth,
babies obtain their mother’s vaginal, intestinal and perianal bacteria.
Our parental co-immunization hypothesis is that a fresh immunization at the adult stage is
obtained through having or being around children. Thus, given that parents survive the initial
exposure, they are better equipped for older age and live longer. More precisely, exposure to
the pathogens which cause a child to acquire its own immunization has a secondary effect
on the parents, boosting the adults’ own immune systems. We test this hypothesis on females
only and subsequently verify that the male population does not exhibit significantly differing
characteristics. As a sanity check, we show that a similar effect is significant for childless
individuals in child caring and teaching professions which bring them into frequent and direct
contact with young children.
Evolutionarily speaking, it should be beneficial for the gene-pool to duplicate more than
once with, perhaps, more than one partner, i.e., for any person to produce more than one child.
Using the ONS-LS data set, we cannot confirm an effect of the number of children on the
probability of dying from infectious diseases: having one or multiple children does not make
a statistically significant difference. The only effect is between living together with children
or not. But since a longer life span, in turn, also increases reproductive opportunities,
re-immunization may benefit parents genetically and help illuminate the threshold to Medawar’s
‘Selection Shadow.’2 This effect may be pronounced through the prolonged life expectancy of
recent generations.
As discussed below, happiness, stress levels, etc. may all have an impact on longevity. Using a
statistical approach, we cannot fully disentangle our hypothesis from alternative and competing
causal (behavioral) explanations: Specifically, we cannot rule out that the effects we find are
in part due to different life styles individuals may adopt when having children. As we can
control, however, for a parent’s marital state, we hope that some behavioral effects which may
affect a parent’s risk of acquiring infections are already addressed by this variable. Moreover,
the correlation we document for infectious diseases varies strongly across other diseases and
alternative causes of death. Finally, as we can confirm the positive effect on life expectancy in
individuals working with younger children (e.g., teachers) who do not have children themselves,
we are confident that the regularity we document is not entirely spurious.
Literature
It is well documented that happier people live longer. Among the theories competing to explain
this fact, prominent places are taken by individual wealth, marriage (e.g., Kaplan et al. (1994),
Ikeda et al. (2007), or Wood et al. (2009)), or sports (e.g., Lahdenperä et al. (2004)). The
purpose of our paper is to study the contribution of children.
2 This evolutionarily beneficial immunological explanation is in line with the ‘grandmother effect’ as proposed
by Lahdenperä et al. (2004). Kuningas et al. (2011) is a recent study into which genes control the presumed
trade-off between fertility and lifespan.
More generally, our parental co-immunization hypothesis adds to (and contrasts in terms of
prediction with) the three classic theories of the evolution of aging (and its relation to fertility):
the Mutation Accumulation Theory due to Medawar (1946), the Antagonistic Pleiotropy Theory
due to Williams (1957), and the Disposable Soma Theory due to Kirkwood (1977).3 Each of
these theories suggests a particular mechanism trading off longevity against reproduction but
they all predict that, over a genotype’s life span, there is a genetic trade-off between early
reproduction and late fitness. Therefore, these theories usually associate an increased number
of children with decreased lifespan because of a postulated balancing between the resources
available for reproduction on the one hand and longevity on the other.4
A number of recent studies offer empirical evidence to substantiate this basic prediction of
the classic theories. A recent analysis of the association between the number of children and
the mortality of mothers is Dior et al. (2013). They concentrate on the effect the number of
children has on individual causes of death while we study whether or not having children at
all has an effect on the probability of dying of infections. In this setup, they observe higher
mortality rates for mothers than for women without children: qualitatively, the opposite of our
result. Similar positive associations between fertility and mortality are reported by Tabatabaie
et al. (2011) on a cohort of Ashkenazi Jews. Doblhammer (2000) reports increased mortality
risk for Austrian and Welsh early mothers while Helle et al. (2005) document no significant
effects in a population of Sami women.
There are, however, also a number of recent studies that report effects which are at least
partially compatible with our results.5 For instance, Wang et al. (2013) investigate the genetic
associations between post-reproductive lifespan and children ever born in the Framingham
Heart Study data set. In this sample, they find a U-shaped impact of number of children on
mortality. Having one or two children reduces maternal mortality while having more than a few
increases maternal mortality. McArdle et al. (2009) employ genealogical data from an Amish
community in Pennsylvania to document that high parity among men and later menopause
among women may be markers for increased life span. Müller et al. (2002) study the relation
between fertility and post-menopausal longevity in a historical French-Canadian sample of
1635 women and find that increased fertility is linked to increased rather than decreased
post-reproductive survival. Specifically, they relate mortality to the age of the youngest child.
There is a group of recent studies which are a bit further removed from the central question
behind our work but still impinge on some of its aspects. Helle et al. (2002), Hayward et al.
(2013), and Helle and Lummaa (2013) report key results on the influence of early life circumstances
on mortality. Berg et al. (2012) analyze both the impact of the economic conditions at birth
and the years leading to puberty on the individual fertility rate and, subsequently, examine
the protective effect of fertility on mortality. They find that while women’s health suffers from
fertility during their reproductive period, fertility has a large, protective causal effect on female
mortality in post-reproductive years. Gagnon et al. (2008) test the trade-off between fertility
and longevity in frontier populations. They test whether increased reproduction
reduces the chances for survival in old age and find a negative influence of parity and a positive
influence of age at last child on post-reproductive survival.6
3 For a beautiful introduction to the theory of aging see Fabian and Flatt (2011). A more detailed overview
is http://en.wikipedia.org/wiki/Senescence#Evolutionary_theories. We should also point out that
Evolution of Lifespan views such as Stearns (1992) are usually seen to (partially) counterbalance the lifespan
shortening effects of the classical theories and are thus more in line with our results.
4 See, for instance, Partridge and Barton (1993), Kirkwood and Austad (2000), or Flatt and Promislow (2007).
5 Hurt et al. (2006) analyze the role that statistical and methodological errors can play in explaining some of
this apparent empirical inconsistency.
This paper demonstrates the existence of an empirical regularity in the sample followed by
the England and Wales census over four decades. We offer a hypothesis which explains these
observations but we can only superficially comment on the medical plausibility of the proposed
transmission mechanism itself. What we are certain of is that our view is not uncontroversial.
For instance, Graham et al. (2007, p713) state that “the incidence of viral respiratory illness
peaks in infancy and early childhood and steadily decreases with age because of changes in
patterns of exposure and age-related acquisition of specific immunity to an increasing number
of virus types encountered over time.”7
2 Survival analysis
2.1 Single risk models
Survival analysis is the study of duration data. In the following discussion we assume that time
is running continuously, and we therefore describe duration by a continuous random variable,
denoted by T . In our setup, the duration data has information on the time from a well–defined
starting point until the event of interest occurs, or until the end of the data collection process.
Death is most adequately modeled as the probability of dying given that the person survived
until that time, so that time until failure (duration or survival) models are most appropriate.
Furthermore, we ignore the role of possible unobserved heterogeneity.
There are three functions critical to the analysis of time: (i) the density function, f (t);
(ii) the cumulative distribution function, F (t); and (iii) the survival function, S(t) = 1 − F (t).8
Knowing one of these functions means, at least in principle, that we can derive the other two.
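The link between the three functions can be illustrated numerically. The following is a minimal sketch, assuming an exponential duration with a hypothetical rate parameter (the names `density`, `cdf`, `survival` and the rate values are ours, chosen for illustration only): F(t) is recovered by integrating f, and f is recovered as the negative slope of S.

```python
import math

def density(t, rate):
    """f(t) for an exponential duration (illustrative choice of distribution)."""
    return rate * math.exp(-rate * t)

def cdf(t, rate):
    """F(t) = Pr(T <= t)."""
    return 1.0 - math.exp(-rate * t)

def survival(t, rate):
    """S(t) = 1 - F(t)."""
    return 1.0 - cdf(t, rate)

def cdf_from_density(t, rate, steps=10000):
    """Recover F(t) by integrating f over [0, t] with the trapezoid rule."""
    h = t / steps
    total = 0.5 * (density(0.0, rate) + density(t, rate))
    for i in range(1, steps):
        total += density(i * h, rate)
    return total * h

def density_from_survival(t, rate, eps=1e-6):
    """Recover f(t) as minus the (central difference) derivative of S(t)."""
    return -(survival(t + eps, rate) - survival(t - eps, rate)) / (2.0 * eps)
```

With a rate of 2, `density(0.0, 2.0)` equals 2, which illustrates that f(t) is not itself a probability.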
Each analyzed subject is characterized by (i) survival time, or spell, (ii) status at the end
of the survival time (event occurrence or censored), and, in some cases, (iii) the study group
(s)he is in. In our case the groups are alive, death by infection and death from other reasons
(later, other reasons are split into different causes of death).9
6 A recent landmark paper on general ageing research is López-Otín et al. (2013).
7 A recent survey of the epidemiology of viral respiratory infections is, for instance, Monto (2002).
8 One should not see the function f (t) as a probability since it can take values bigger than 1. We can see it
as a function that describes how probability is distributed over the domain of T .
9 Censoring means that the total survival time for that subject cannot be accurately determined. This could
happen because the subject drops out, is lost to follow-up, or because the study ends before the subject
experiences the event of interest. In this case the individual survived at least until the end of the study,
which means that there is no knowledge of what happened thereafter. As such we face right censoring as
the individual is removed from the study before the event occurs.
The hazard function, λ, is a key concept in survival analysis and is defined as the rate of
failure at a point in time t, given survival until that time
\[ \lambda(t, x) = \lim_{dt \to 0} \frac{\Pr(t \le T < t + dt \mid T \ge t, x)}{dt}, \tag{1} \]
where T denotes the random variable length of stay, measured in continuous time, and x is
a vector of explanatory variables consisting of individual characteristics. Among the possible
alternative interactions between T and x proposed, the most popular in the length of stay
literature is the proportional hazards (PH) specification.
The most commonly used semiparametric duration model is the Cox PH model. Cox suggests
a likelihood procedure (partial likelihood) to estimate the relationship between the hazard rate
and explanatory variables in the following general proportional hazards model
\[ \text{Cox:} \quad \lambda(t, x, \beta) = \lambda_0(t) \exp(x'\beta). \tag{2} \]
In Cox’s model we do not need to make assumptions about the functional form of the
baseline hazard function, λ0 (t). As a result, and as Lee and Wang (2003) put it, “the ratio of
the risk of dying of two individuals is the same no matter how long they have survived.” The
model, defined as
\[ \log \lambda_i(t) = \beta_0(t) + \beta_1 x_{1i} + \dots + \beta_r x_{ri}, \tag{3} \]
leaves the baseline hazard function β0 (t) = log λ0 (t) unspecified. This way, the model is
semiparametric. This results from the fact that the baseline hazard can take any form, and
that the covariates enter the model linearly. The baseline hazard does not depend on covariates,
but only on time, and the covariates are time-constant. As a result we have the proportional
hazard assumption.
The fact that the Cox model does not estimate the baseline hazard, λ0 (t), is both an
advantage and a disadvantage. For two observations i and j, the hazard ratio
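\[ \frac{\lambda_i(t)}{\lambda_j(t)} = \frac{\lambda_0(t) \exp(\beta_1 x_{1i} + \dots + \beta_r x_{ri})}{\lambda_0(t) \exp(\beta_1 x_{1j} + \dots + \beta_r x_{rj})} = \exp\left[ \sum_{l=1}^{r} \beta_l (x_{li} - x_{lj}) \right] \tag{4} \]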
is independent of time t. This implies that the Cox model is a proportional hazards model.
However, this property comes at a cost. The efficiency of the estimates is reduced as this
approach discards information regarding actual failure times and uses only their rank order.
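The time-invariance of the hazard ratio in equation (4) can be checked numerically. The sketch below is only an illustration under assumed values: the baseline hazard `baseline`, the coefficient vector `beta`, and the covariate vectors `x_i` and `x_j` are hypothetical and are not estimates from our data.

```python
import math

def hazard(t, x, beta, baseline):
    """Proportional hazards form: baseline(t) * exp(x'beta)."""
    linear_index = sum(b * v for b, v in zip(beta, x))
    return baseline(t) * math.exp(linear_index)

def hazard_ratio(x_i, x_j, beta):
    """Closed form of the ratio: exp(sum_l beta_l * (x_li - x_lj))."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))

def baseline(t):
    """Arbitrary, hypothetical baseline hazard; any positive function works."""
    return 0.001 * (1.0 + t) ** 1.5

beta = [-0.2, 0.1]   # hypothetical coefficients
x_i = [1.0, 0.0]     # hypothetical covariates of observation i
x_j = [0.0, 1.0]     # hypothetical covariates of observation j

# The baseline cancels, so the ratio is the same at every point in time.
ratios = [hazard(t, x_i, beta, baseline) / hazard(t, x_j, beta, baseline)
          for t in (1.0, 30.0, 80.0)]
```

Each element of `ratios` coincides with `hazard_ratio(x_i, x_j, beta)`, whatever positive function is supplied as the baseline.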
Alternatively, the hazard function can be restricted to a multiplicative form and defined as
\[ \lambda(t, x, \beta, \theta) = \lambda_0(t, \theta)\, \phi(x, \beta), \tag{5} \]
where λ0 is the baseline hazard function and depends both on time, and, compared to Cox’s
model, on an additional parameter, θ. This parameter is a vector of auxiliary parameters
characterizing the distribution of T . β is a vector of unknown coefficients associated with x
and φ (x, β) is a proportionality factor which does not depend on duration. φ (x, β) is a nonnegative function of the covariates. With proportional hazards the effects of the regressors on
the conditional probability of failure do not depend on duration. The baseline hazard function
summarizes the pattern of duration dependence, and alternative specifications of the baseline
function lead to different hazard functions.
The impact of the covariates on the hazard function can also be estimated using parametric
techniques, which require potentially restrictive assumptions regarding the functional form
of the baseline hazard function. Parametric models are useful because of their predictive and
extrapolative capabilities, as well as the possibility for quantification of the effects of covariates.
Within parametric models, the Exponential and Weibull models are common solutions. These
models are defined as
\[ \text{Exponential:} \quad \lambda(t, x, \beta) = \lambda_0 \exp(x'\beta), \tag{6} \]
and
\[ \text{Weibull:} \quad \lambda(t, x, \beta, \alpha) = \alpha t^{\alpha - 1} \exp(x'\beta), \tag{7} \]
respectively. The exponential model assumes a constant baseline hazard for each patient, while
the baseline hazard for the Weibull model is strictly increasing or decreasing depending on
the value of α.10 “The exponential distribution is the only one that has the lack of memory
property that the distribution of the residual lifetime, after truncation, is the same as the
original distribution” (Hougaard, 2000).
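The role of α in the Weibull hazard can be made concrete with a short sketch. The function names and all numerical values below are assumptions made for illustration; with α = 1 the Weibull hazard of equation (7) reduces to the exponential hazard of equation (6) with λ0 = 1.

```python
import math

def linear_index(x, beta):
    """x'beta for plain Python lists."""
    return sum(b * v for b, v in zip(beta, x))

def exponential_hazard(x, beta, lambda0=1.0):
    """Equation (6): constant in t."""
    return lambda0 * math.exp(linear_index(x, beta))

def weibull_hazard(t, x, beta, alpha):
    """Equation (7): alpha * t**(alpha - 1) * exp(x'beta), for t > 0, alpha > 0."""
    if t <= 0 or alpha <= 0:
        raise ValueError("t and alpha must be positive")
    return alpha * t ** (alpha - 1.0) * math.exp(linear_index(x, beta))

x = [1.0, 0.0]        # hypothetical covariates
beta = [-0.2, 0.1]    # hypothetical coefficients

increasing = [weibull_hazard(t, x, beta, alpha=1.5) for t in (1.0, 2.0, 4.0)]
decreasing = [weibull_hazard(t, x, beta, alpha=0.5) for t in (1.0, 2.0, 4.0)]
constant = [weibull_hazard(t, x, beta, alpha=1.0) for t in (1.0, 2.0, 4.0)]
```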
In a regression analysis environment, a parametric model based on the exponential distribution may be written as
\[ \log \lambda_i(t) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_r x_{ri}. \tag{8} \]
The constant β0 can be interpreted as some form of log–baseline hazard. For the Weibull model
this regression becomes
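\[ \begin{aligned} \log \lambda_i(t) &= \ln(\alpha) + (\alpha - 1) \ln(t) + \beta_1 x_{1i} + \dots + \beta_r x_{ri} \\ &= \beta_0 + (\alpha - 1) \ln(t) + \beta_1 x_{1i} + \dots + \beta_r x_{ri}. \end{aligned} \tag{9} \]
2.2 Competing risks models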
Within our setup, we can clearly distinguish, for example, three possible causes of death:
infections, cancer, and heart disease (the censored observation is alive). It follows, naturally,
that as more individuals die from infections, there are fewer individuals at risk to die from
cancer. It is the case that individuals face multiple causes of death, and as such the number of
deaths for a particular cause will influence the estimate of the probability of dying due to the
cause under scrutiny.
10 The parameter α assumes only positive values. If α > 1 then the hazard function increases monotonically;
if α < 1 then it decreases monotonically; and if α = 1 the model collapses to the exponential case.
In this setting one needs to deal, simultaneously, with different competing events. The
models discussed above must be adapted in order to deal with the fact that the number of
failures from any competing risk (of failure) will condition the number of failures from the
main failure, which, in turn, implies changes in the estimate of the probability of failure.
Failures from any competing risk reduce the number of individuals at risk of failure from the
cause under analysis (Gooley et al. (1999)). Competing risks are events that occur instead of
the failure event of interest, which implies that we cannot treat them as censored. It follows
that a competing risks framework becomes a natural solution for our estimation strategy. Two
advantages follow from the use of a competing risks model. The theory behind the model allows
both the computation of hazard functions where individuals can die due to multiple causes,
and the computation of probabilities of death according to different values of the covariates.
Under the existence of competing risks we want to focus our attention on cause–specific
hazards, as compared to standard hazards. A cause–specific hazard is the instantaneous risk of
failure from a specific cause given that failure (from any cause) has not yet happened. We can
see our problem as one where we have two cause–specific hazards: one for death by an infection
and one for death by other causes. For the sake of simplicity, we focus below on this particular
setting of the problem at hand. The analysis can easily be extended to a situation where one
has three or more causes of death.
When we have competing events, we need to focus on the cumulative incidence function
(CIF) rather than the survival function. A CIF is just the probability that a specific type of
event is observed before a given time. In our analysis, we have two CIFs; one for death by
infection and one for death due to other causes. For example, the CIF for death by infection
at 60 years is just the probability of death by infection before an age of 60. CIFs begin at time
zero and increase to an upper limit equal to the eventual probability that the event will take
place, but this is not equal to one because of competing events. Mathematically, the CIF for
death by infection is a function of all cause–specific hazards.
So, in a competing risks setting, a Kaplan-Meier curve is inadequate for three reasons.11
First, it fails to acknowledge that death by infection may never occur. Second, the Kaplan–
Meier solution does not take into account dependence between competing events. Third, facing
competing risks, it is better to reverse the temporal ordering of the question. This implies that
using Kaplan–Meier demands too much of the data; it requires (i) independent risks and (ii) a
setup where the competing event does not occur. Berry et al. (2010) summarize the argument:
“Kaplan-Meier survival analysis and Cox proportional hazards regression [...] can overestimate
risk of disease by failing to account for the competing risk of death.”
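A stylized example may help to see why the Kaplan–Meier complement overstates risk. The sketch below assumes two constant cause-specific hazards (hypothetical values, not estimates from our data). In that case the CIF for cause k is λk/(λk + λother) · (1 − exp(−(λk + λother)t)), whereas treating competing deaths as censored targets 1 − exp(−λk t).

```python
import math

def cif(t, hazard_k, hazard_other):
    """Cumulative incidence of cause k under two constant cause-specific hazards."""
    total = hazard_k + hazard_other
    return hazard_k / total * (1.0 - math.exp(-total * t))

def one_minus_km(t, hazard_k):
    """Quantity targeted by 1 - Kaplan-Meier when competing deaths are censored."""
    return 1.0 - math.exp(-hazard_k * t)

infection, other = 0.01, 0.05   # hypothetical hazards per year

times = (5.0, 20.0, 60.0)
gap = [one_minus_km(t, infection) - cif(t, infection, other) for t in times]

# The CIF levels off at the eventual probability of dying from the cause,
# here infection / (infection + other), which is below one.
upper_limit = infection / (infection + other)
```

Every entry of `gap` is positive, and the gap widens with t in this example.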
11 The Kaplan-Meier estimator is often used for estimating the survival function from lifetime data in medical
research. For discussions see, for instance, Clark et al. (2003).
The CIF gives the proportion of individuals at time t who have died from cause k accounting
for the fact that patients can die from other causes. For example, the CIF for death due to
infections depends not only on the hazard for death by infection but also on the remaining
hazards associated with other causes of death. This implies that it is no longer possible to define
a direct relation between cause–specific hazard rate and the probability of death. Although
nonparametric estimation of CIFs is flexible, it cannot be adjusted for relevant regressors, as
they are associated with the cause–specific hazard. The efficient (and correct) way to run CIF
covariate analysis is to implement a competing risks regression, according to the model of Fine
and Gray (1999).
Fine and Gray (1999) propose an alternative to cause–specific hazards: a model for the
hazard of the subdistribution for the failure event of interest, known as the sub–hazard. Unlike
cause-specific hazards, discussed above, there is a one-to-one correspondence between sub–
hazards and CIFs for respective event types; that is, in our setting, the CIF for death by infection
is a function of only the sub–hazard for death by infection. Covariates affect the sub–hazard
proportionally, similar to the Cox regression. The authors propose a transformation of the Cox
model associated with a direct transformation of the CIF. From the relation between the hazard
and survival functions, Fine and Gray (1999) define a subdistribution function.
Although it is not at the core of our analysis, it is important to stress that the difference
between cause-specific and subdistribution hazards is the risk set. For the cause-specific hazards, the risk set decreases each time there is a death from another cause (censoring), while
under subdistribution hazards those who die due to another cause remain in the risk set and
are given a censoring time, larger than all event times. Cause–specific hazard ratios give us a
relative measure, where we can use standard survival analysis methods. However, covariates
may not be associated with the cause–hazard. With the subdistribution hazards we account for
competing events by altering the risk set. As there is a direct link between the subdistribution
hazard and the CIF, one can compute the regressors’ effects.12
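The difference in risk sets can be sketched on a toy data set. The records below are hypothetical (age at exit, status), with status 0 for censored, 1 for death by infection and 2 for death by other causes. The sketch ignores the censoring weights used in the actual Fine and Gray (1999) estimator, so it only illustrates who is counted as being at risk.

```python
def cause_specific_risk_set(records, t):
    """Indices of individuals who have neither failed nor been censored before t."""
    return [i for i, (time, status) in enumerate(records) if time >= t]

def subdistribution_risk_set(records, t, cause):
    """As above, plus those who already failed from a competing cause before t."""
    at_risk = []
    for i, (time, status) in enumerate(records):
        still_under_observation = time >= t
        failed_from_competing_cause = time < t and status not in (0, cause)
        if still_under_observation or failed_from_competing_cause:
            at_risk.append(i)
    return at_risk

# Hypothetical records: (age at exit, status).
records = [(62, 1), (65, 2), (70, 0), (74, 1), (80, 2)]

cs = cause_specific_risk_set(records, t=74)             # [3, 4]
sd = subdistribution_risk_set(records, t=74, cause=1)   # [1, 3, 4]
```

The individual who died from another cause at age 65 has left the cause-specific risk set by age 74 but is retained in the subdistribution risk set.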
Setup of competing risks models
In a general setting, for each individual in a competing risks model, the type of failure is
specified by J, with values ranging from 1 to k. The random duration variable is defined by T .
We assume, within our analysis, that there exists only one period of duration. The spell ends
when individuals leave for one of the k possible states (states = failure type). The states are
mutually exclusive and exhaustive, and are identified by the index j, where j = 1, ..., k.13
There are k random variables, T (1) ,T (2) , . . . , T (k) , corresponding to the existing states. These
variables can be interpreted as latent durations. These are abstract time periods used in the
construction of the models, where T (j) is the time to failure to state j after the elimination of
all other possible states. For each point in time, entry into a certain state is dictated by the
smallest latent time period (the smallest T (j)). The time to failure can be specified as
\[ T = \min\left[ T^{(1)}, \dots, T^{(k)} \right], \tag{10} \]
where J = j if T = T (j). For each individual, only one T (j) is observed and the others are
considered censored. We will have a competing risks model with independent risks under the
assumption that the random variables T (1) , . . . , T (k) are independent.
12 See Fine and Gray (1999) for a further discussion of semiparametric proportional hazards model for the
subdistribution.
13 In the presentation of the setup of competing risks models we follow closely the discussion in Sá et al. (2007).
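The latent-duration construction in equation (10) can be sketched by simulation. The following assumes independent exponential latent durations with hypothetical rates; the function name `draw_competing_risks` and the rate values are ours and serve illustration only.

```python
import random

def draw_competing_risks(rates, rng):
    """Draw one latent duration per state; observe T = min and J = its index."""
    latent = [rng.expovariate(rate) for rate in rates]
    duration = min(latent)
    failure_type = latent.index(duration) + 1   # states are labelled 1, ..., k
    return duration, failure_type, latent

rng = random.Random(0)     # fixed seed so that the sketch is reproducible
rates = [0.01, 0.05]       # hypothetical rates: state 1 = infection, state 2 = other

draws = [draw_competing_risks(rates, rng) for _ in range(20000)]
share_state_1 = sum(1 for _, j, _ in draws if j == 1) / len(draws)
```

Under independent exponential risks, the share of exits into state 1 should be close to 0.01 / (0.01 + 0.05), i.e. about one in six.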
It is possible to estimate conditional and unconditional probability functions that characterize the variables T and J. The expression
\[ \lambda_j(t, x) = \lim_{dt \to 0} \frac{P(t \le T < t + dt, J = j \mid T \ge t, x)}{dt} \tag{11} \]
is the transition intensity into state j. These functions are designated as cause–specific hazard
functions; they can be empirically interpreted as the fraction of survivors at time t that subsequently leave for state j. If one assumes a proportional hazards specification, the cause–specific
hazard functions can be defined as
\[ \lambda_j(t, x, \beta_j, \alpha_j) = \lambda_{0j}(t, \alpha_j) \exp(x'\beta_j), \quad j = 1, \dots, k, \tag{12} \]
where the risk-specific baseline hazard function is λ0j (t, αj ). The parameters βj and αj are
allowed to freely vary across the k failure types. Alternative distributions for the cause-specific
baseline hazard lead to different cause-specific hazard functions. For example, if a Weibull
baseline hazard is assumed, then the hazard function becomes
\[ \lambda_j(t, x, \beta_j, \alpha_j) = \alpha_j t^{\alpha_j - 1} \exp(x'\beta_j). \tag{13} \]
It follows that we can estimate a set of coefficients for each of the competing risks.
Finally, the log-likelihood function within a competing risks framework can be expressed as
\[ \ln L = \sum_{j=1}^{k} \left[ \sum_{i=1}^{n} d_i \ln f(t_i, \beta_j, x_i, \theta_j) + \sum_{i=1}^{n} (1 - d_i) \ln S(t_i, \beta_j, x_i, \theta_j) \right]. \tag{14} \]
For a more detailed discussion of competing risks models, see, for instance, Cox (1959),
David and Moeschberger (1978), Prentice et al. (1978), Lancaster (1990), and Kalbfleisch and
Prentice (2002).
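To fix ideas, the log-likelihood can be written out for the simplest case. The sketch below is an illustration under strong assumptions: no covariates, constant (exponential) cause-specific hazards, and d_i read as an indicator that individual i failed from the cause j currently being summed over. The toy data are hypothetical.

```python
import math

def log_likelihood(durations, causes, rates):
    """Sum over causes j and individuals i of d*ln f + (1 - d)*ln S."""
    total = 0.0
    for j, rate in enumerate(rates, start=1):
        for t, cause in zip(durations, causes):
            d = 1 if cause == j else 0          # failed from cause j?
            log_f = math.log(rate) - rate * t   # exponential log density
            log_s = -rate * t                   # exponential log survival
            total += d * log_f + (1 - d) * log_s
    return total

def closed_form_rates(durations, causes, k):
    """Maximizer in this special case: failures from cause j over total exposure."""
    exposure = sum(durations)
    return [sum(1 for c in causes if c == j) / exposure for j in range(1, k + 1)]

# Hypothetical data: status 0 = censored, 1 = infection, 2 = other causes.
durations = [62.0, 65.0, 70.0, 74.0, 80.0]
causes = [1, 2, 0, 1, 2]

best = closed_form_rates(durations, causes, k=2)
ll_best = log_likelihood(durations, causes, best)
```

Because the likelihood separates across causes, each cause-specific parameter can be estimated by treating failures from the other causes as censored, which is why a separate set of coefficients is obtained for each competing risk.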
3 Data and model specification
3.1 Data selection and description
We select a sample of 155,062 women who were born before, and were alive in, 1971, and whose
age throughout the sample is bounded between 16 and 85 years. The variable Young equals 1
if the woman had a child of her own.14
From Table 1 we observe that average sample age is about 57 years. Detailed descriptives
on age show us that 10% of the individuals are aged 79 years or above; 10% of the sample
is at most 35 years old. Age is either most recent age for alive individuals, or age at death.
14 For the robustness checks discussed in Section 4.3.3 we use a sample of 182,895 men. In this case Young
equals 1 if they lived with a child in a common household at some point through their life span.

Table 1: Descriptive statistics
Variable      Mean    St.Dev.   Min.   Max.
Age           57.25   15.87     16     85
Died           0.30             0      1
Young          0.80             0      1
WChildren      0.04             0      1
Married        0.83             0      1
Occupation     0.61             0      1
House          0.80             0      1
The total number of observations is 155,062.
Source: ONS-LS.

Table 2: Health status
Status                    Frequency   Share (%)
Alive                     108,717     70.11
Heart Diseases             13,563      8.75
Infections                  5,318      3.43
Cancers                    13,410      8.65
Other Diseases              5,321      3.43
Accidents, Hom., Suic.      1,111      0.72
Errors, Open, Others        7,622      4.92
Status indicating alive, or cause of death.
Source: ONS-LS.

In our sample, 30% of the individuals have died (Died). Details on the causes of death
are provided in Table 2. The most common causes of death are Heart Diseases and Cancers.
Turning to the event of interest for the current paper, 3.4% of the individuals in the sample
died due to infections, which accounts for 11.5% of the deaths.15 About 80% of our sample had
young children, Young. The share of individuals that have at some point in their lives worked
with young children, WChildren, is 4%. The shares of individuals who were married, Married,
were in white collar professions, Occupation, or owned a house, House, are 83%, 61%, and 80%,
respectively.
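The shares quoted above follow directly from the frequencies in Table 2. The short sketch below reproduces the arithmetic; the dictionary keys are shorthand labels of ours, the counts are those reported in the table.

```python
# Frequencies from Table 2 (ONS-LS sample of women).
status_counts = {
    "alive": 108717,
    "heart": 13563,
    "infections": 5318,
    "cancers": 13410,
    "other_diseases": 5321,
    "accidents_hom_suic": 1111,
    "errors_open_others": 7622,
}

total = sum(status_counts.values())
deaths = total - status_counts["alive"]

share_died = 100.0 * deaths / total
infection_share_of_sample = 100.0 * status_counts["infections"] / total
infection_share_of_deaths = 100.0 * status_counts["infections"] / deaths
```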
3.2 Empirical model
Age at death is our duration variable, T . Cause of death, J, is equal to 0 if the observation is
censored (the individual is alive), 1 if the individual dies from an infection, and 2 if (s)he dies
from other causes.16 Individual characteristics include having young children in 1971 (Young),
worked with children (WChildren=1 if ever worked with children), married (Married=1 if the
individual was ever married), occupation (Occupation=1 if the individual was at some point in
time a white collar worker), and house ownership (House=1 if the individual owns a house). The
last two variables proxy for education and income, while the first two are our variables of interest.
The regressors used are the same in all specifications. Thus,
\[ x'\beta = \beta_1 \text{Young} + \beta_2 \text{WChildren} + \beta_3 \text{Married} + \beta_4 \text{Occupation} + \beta_5 \text{House} \tag{15} \]
where, depending on the specification, the constant may be explicitly added to the model or
defined implicitly by a set of dummy variables. Coefficient estimates are then interpreted as
the impact of each variable on the (conditional) probability of death and, consequently, on the
age at death. For example, a negative estimate for β1 indicates that, everything else constant,
individuals with children show lower death probabilities and hence are more likely to stay alive.
15 The ONS-LS data set uses International Classification of Diseases (ICD) codes to categorize the main and,
if applicable, contributory reasons of death. These codes come in several revisions of which 8, 9, and 10
are relevant for the census waves we study; for details see World Health Organization (2010). The exact
definition of infectious disease we use is the following combination of ICD-9 codes (and their earlier and
later equivalents): Intestinal Infectious Diseases 001–139, Chronic Obstructive Pulmonary Disease 490–496,
Occupational or Environmental Lung Disease 500–508, Other Diseases of Respiratory System 510–519. The
other reasons for death listed in our tables (i.e., Heart Disease, Cancer, Other, Accident & Homicide &
Suicide, and Error) are defined similarly according to the ICD system.
16 Later we will expand the set of alternative causes of death according to Table 2.
4 Empirical analysis
4.1 Approach from a nonparametric perspective
In order to define a relatively homogenous group of individuals, we only consider the sample of
women for most of our analysis. Later we run robustness checks with a sample of males.
[Figure 1 plots omitted. Left panel: Kaplan-Meier survival estimates; right panel: Nelson-Aalen cumulative hazard estimates. Horizontal axis: analysis time (60 to 85); separate curves for young = 0 and young = 1.]
Figure 1: Left: Nonparametric: Kaplan-Meier survivor function. Right: Nonparametric: cumulative
hazard. Sample where we define deaths by other causes as censored. Source: ONS-LS.
We start with a nonparametric estimation. At this stage, we consider three states
for an individual’s life condition: (i) alive; (ii) death by infection; (iii) death by other causes.
Assume, for now, that the event of interest is death by infection, and recode deaths by other
causes as alive, i.e., as censored (left panel of Figure 1). The differences between the
Kaplan-Meier survivor functions are small, with the survivor function for women with children
slightly higher. The right panel of Figure 1 shows the corresponding cumulative hazard. (See the
left and right panels of Figure 2 for smoothed versions of the cumulative hazard, based on
Epanechnikov and Gaussian smoothing functions, respectively.) A log-rank test for equality of
survivor functions yields a χ2 statistic with 1 degree of freedom of 2.42, with a corresponding
p-value of 0.12; i.e., we narrowly fail to reject, at the 10% level, the null hypothesis that the
survivor functions are equal across the Young status. Taken together, the evidence is not
conclusive, but it points, to some extent, to longer survival for women with children, or those
who lived with children.
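The two sample constructions used in this subsection (other-cause deaths recoded as alive versus dropped) can be illustrated with a minimal sketch of the Kaplan-Meier and Nelson-Aalen estimators. The records below are hypothetical toy values, not ONS-LS data, and the function name km_and_na is our own; the sketch only shows how the risk set changes under the two conventions.

```python
from collections import Counter


def km_and_na(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard.

    times  : age at exit (death or end of follow-up) for each individual
    events : 1 if the exit is the event of interest, 0 if censored
    Returns a list of (age, survival, cumulative_hazard) at each event age.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)
    at_risk = len(times)
    surv, cumhaz, out = 1.0, 0.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # Kaplan-Meier product-limit step
            cumhaz += d / at_risk       # Nelson-Aalen increment
            out.append((t, surv, cumhaz))
        at_risk -= exits[t]             # all exits at age t leave the risk set
    return out


# Hypothetical toy records (age at exit, status); not ONS-LS data.
records = [(62, "alive"), (66, "infection"), (68, "other"), (71, "infection"),
           (74, "other"), (77, "alive"), (80, "infection"), (83, "other")]

# Variant 1: deaths by other causes are recoded as alive (censored).
censored_variant = km_and_na(
    [age for age, _ in records],
    [1 if status == "infection" else 0 for _, status in records],
)

# Variant 2: deaths by other causes are dropped from the sample.
kept = [(age, status) for age, status in records if status != "other"]
dropped_variant = km_and_na(
    [age for age, _ in kept],
    [1 if status == "infection" else 0 for _, status in kept],
)

for label, result in (("censored", censored_variant), ("dropped", dropped_variant)):
    for age, surv, cumhaz in result:
        print(f"{label:9s} age={age}  S(t)={surv:.3f}  H(t)={cumhaz:.3f}")
```

In this toy sample the estimated survival at age 71 is about 0.69 when other-cause deaths are censored and 0.50 when they are dropped, because dropping observations shrinks the risk set against which each infection death is counted.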
[Figure 2 panels: smoothed hazard estimates (left and right); horizontal axis: analysis time, 60 to 85; groups: young = 0 and young = 1.]
Figure 2: Left: Nonparametric: cumulative hazard, Epanechnikov smooth. Right: Nonparametric:
cumulative hazard, Gaussian smooth. Sample where we define deaths by other causes as censored.
Source: ONS-LS.
We next drop observations corresponding to death by other reasons. The resulting Kaplan-Meier
survivor function and Nelson-Aalen cumulative hazard are shown in Figure 3 which, as in Figure 1,
separates females according to the Young status. See Figure 4 for the smoothed versions.
Once we drop the observations corresponding to death by other reasons, we observe a stronger
separation of survival and cumulative hazard according to the children status. The log-rank test
for equality of survivor functions shows a χ2 statistic of 173.41, with a corresponding p-value
of approximately 0; i.e., we reject the null hypothesis that the survivor functions are equal
across the Young status. These results align with our hypothesis that life length can vary
between those who had children and those who had not. Women who had children (or lived with
children) show both a higher survival rate and a lower hazard rate at each age, indicating that,
on average, they live longer.
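The p-values attached to the two log-rank statistics can be checked with a short calculation. The sketch below assumes that both statistics are referred to a χ2 distribution with 1 degree of freedom (the text states this explicitly only for the first test); the function name is ours.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for a chi-square variable with 1 df."""
    # If Z is standard normal, Z**2 is chi-square(1), so
    # P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


p_censored = chi2_1df_pvalue(2.42)     # other-cause deaths treated as censored
p_dropped = chi2_1df_pvalue(173.41)    # other-cause deaths dropped (1 df assumed)

print(f"chi2 = 2.42   -> p = {p_censored:.4f}")
print(f"chi2 = 173.41 -> p = {p_dropped:.3e}")
```

The first value rounds to the reported 0.12, and the second is numerically indistinguishable from zero, in line with the text.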
[Figure 3 panels: Kaplan-Meier survival estimates (left) and Nelson-Aalen cumulative hazard estimates (right); horizontal axis: analysis time, 60 to 85; groups: young = 0 and young = 1.]
Figure 3: Left: Nonparametric: Kaplan-Meier survivor function. Right: Nonparametric: cumulative
hazard. Sample where we eliminate deaths by other causes besides infection. Source: ONS-LS.
[Figure 4 panels: smoothed hazard estimates (left and right); horizontal axis: analysis time, 60 to 85; groups: young = 0 and young = 1.]
Figure 4: Left: Nonparametric: cumulative hazard, Epanechnikov smooth. Right: Nonparametric:
cumulative hazard, Gaussian smooth. Sample where we eliminate deaths by other causes besides
infection. Source: ONS-LS.
We run a counterfactual analysis in which the event of interest is death by other causes.
First, replicating the strategy above, we set those who died by infections as alive (although
this is an incorrect procedure, it may give us a hint of what to expect when we move to the
correct procedure). The Kaplan-Meier survivor function is shown in the left panel of Figure 5.
If, as before, we drop the alternative death cause, which in the current case is death by
infections, we obtain the right panel of Figure 5. Both panels seem to corroborate the key
message of our paper; i.e., that the children status matters for death by infections but not,
or at least not to the same degree, for deaths by other reasons: in both panels, survival by
children status is visually indistinguishable. Performing the log-rank test for equality of
survivor functions, we do, however, reject the null that both survivor functions are equal.17
We discuss later why for some other causes of death, besides infections, we might still find a
statistical difference across Young status. Essentially, we argue that this might be due to
behavioral differences.
[Figure 5 panels: Kaplan-Meier survival estimates (left and right); horizontal axis: analysis time, 60 to 85; groups: young = 0 and young = 1.]
Figure 5: Left: Nonparametric: Kaplan-Meier survivor function. Sample where we define death by
an infection as censored. Right: Nonparametric: Kaplan-Meier survivor function. Sample where we
eliminate deaths by infection. Source: ONS-LS.
4.2 Semiparametric results
We now move to a semiparametric analysis and present results for the Cox proportional hazard
model. Failure occurs when an infection is the cause of death and, in a first stage, we treat
death by other causes as alive (censored). The left panel of Figure 6 shows our first results.
We observe that those who neither have children nor work with children face a higher hazard rate
than those with children or working with children. If one drops the observations related to
death by other causes instead of treating these individuals as alive, we get the result in the
right panel of Figure 6. The results point in the same direction as in the nonparametric
analysis; i.e., women with children, or working with children, live longer.
[Figure 6 panels: smoothed hazard functions from Cox proportional hazards regressions (left and right); horizontal axis: analysis time, 30 to 80; groups: young = 0/1 by working_yngkids = 0/1.]
Figure 6: Left: Cox proportional hazard regression. Other Causes of death are treated as censored.
Right: Cox proportional hazard regression. Observations for other causes of death are dropped.
Source: ONS-LS.
17 The χ2 statistics are 11.23 (p-value = 0.0008) and 36.20 (the p-value is approximately 0), respectively.
In Table 4 (in the appendix) we present the estimation results for the Cox, Exponential and
Weibull models. Models (1) and (2) define the failure event as death by an infection, while
Models (3) and (4) define failure as death by a cause other than infection. Models (1) and (3)
treat the other cause of death as censored (alive), while Models (2) and (4) drop the
observations associated with the other cause of death. Under Models (3) and (4), the other cause
of death comprises only death by an infection.
All models are statistically significant as shown by the likelihood ratio tests. Being at some
point in time a white collar worker or owning a house is associated with higher life expectancy.
Married status is either associated with higher life expectancy, or statistically insignificant for
the determination of the hazard rate.
The key variables of interest for the analysis are Young and WYoung. Models (1) and (2),
which define failure as death by infection, are the main semiparametric estimations, while
Models (3) and (4) serve as a counterfactual analysis. Under Model (1), the mistake we make is
that death by other causes is treated as a censored observation. Under this restriction, and
estimating a Cox model, we observe that the hazard is 22% lower than the baseline hazard for
those who work with children. Although the hazard rate is slightly lower for women with
children, this difference is not statistically different from zero (the hazard ratio is not
statistically different from 1). Excluding the observations corresponding to death by other
causes from the analysis (Model (2), column Cox), we observe that the hazard for women with
children is about 21% lower than the baseline hazard, while for those who worked with children
it is about 23% lower. In both cases the hazard ratios are statistically different from 1, and
very close to each other. Turning to the counterfactual analysis, Models (3) and (4), and still
focusing on the Cox estimations, we observe that the hazard ratios for the regressors of
interest move substantially closer to 1; i.e., the impact of having a child, or working with
children, is substantially lower than under Models (1) and (2). The reduction in the hazard for
women with children is under 4%, and for those working with children it is at most about 8%.
Keep in mind that, following the discussion above, these procedures are incorrect. Still, the
two sets of results do not reject our hypothesis.
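The percentages quoted in this paragraph follow directly from the Cox hazard ratios reported in Table 4: a hazard ratio HR corresponds to a change in the hazard of 100·(HR − 1) percent. A minimal sketch of this arithmetic, using the Cox columns of Table 4 for Young and WYoung (the dictionary layout and function name are ours):

```python
def pct_change_in_hazard(hazard_ratio):
    """Percentage change in the hazard implied by a hazard ratio (negative = lower)."""
    return 100.0 * (hazard_ratio - 1.0)


# Cox hazard ratios for the variables of interest, as reported in Table 4 (females).
cox_hazard_ratios = {
    "Model (1)": {"Young": 0.971, "WYoung": 0.778},
    "Model (2)": {"Young": 0.789, "WYoung": 0.768},
    "Model (3)": {"Young": 0.974, "WYoung": 0.940},
    "Model (4)": {"Young": 0.962, "WYoung": 0.924},
}

for model, ratios in cox_hazard_ratios.items():
    for variable, hr in ratios.items():
        print(f"{model} {variable:7s} HR={hr:.3f}  change={pct_change_in_hazard(hr):+.1f}%")
```

Rounded to whole percentages, this reproduces the reductions of 22% (Model (1), WYoung), 21% and 23% (Model (2)) discussed above, and reductions below 4% (Young) and below 8% (WYoung) in the counterfactual Models (3) and (4).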
4.3 Parametric results
4.3.1 Base estimations
[Figure 7 panel: cumulative hazard from the exponential regression; horizontal axis: analysis time, 20 to 100; groups: young = 0/1 by working_yngkids = 0/1.]
Figure 7: Parametric estimation. Source: ONS-LS.
Implementing a parametric estimation with a single risk (Figure 7), we are now clearly able
to observe a distinction between individuals without kids (or contact with kids) and those who
either had children and/or worked with children. Being in the presence of children significantly
decreases the hazard of death.
Estimation results are provided in Table 4 (in the appendix), columns Exponential and
Weibull. Looking at the log likelihood, the Weibull model always presents the lower absolute
value. Given the topic of our analysis, the probability of dying, the Weibull model seems more
appropriate a priori; i.e., the failure rate is expected to increase with time if there is an
“aging” process. In all models the hypothesis that ln(p) equals zero is rejected. As such, we
focus this section of the discussion on the Weibull parametric estimation. Across all models the
Weibull estimations confirm what we already observed with the Cox semiparametric estimations;
i.e., women who have children or work with children live longer. Additionally, the
counterfactual analysis strengthens our hypothesis, as the hypothesized effect is minimal or
non-existent for other diseases.
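The role of ln(p) can be made concrete. The exponential model is the special case of the Weibull model with shape p = 1 (ln(p) = 0), so rejecting ln(p) = 0 amounts to rejecting a hazard that is constant in age. The sketch below assumes the standard Weibull hazard, which is proportional to t^(p−1), and assumes that the four ln(p) values listed in Table 4 belong to the Weibull columns of Models (1) to (4) in that order; both points are our reading rather than something stated in this section.

```python
import math


def weibull_age_ratio(ln_p, age_from, age_to):
    """Ratio h(age_to) / h(age_from) for a Weibull hazard with shape p = exp(ln_p).

    The Weibull hazard is proportional to t**(p - 1), so covariate effects and
    the scale parameter cancel when two ages are compared for one individual.
    """
    p = math.exp(ln_p)
    return (age_to / age_from) ** (p - 1.0)


# ln(p) as listed in Table 4; the assignment to Models (1)-(4) is an assumption.
ln_p_reported = {"Model (1)": 2.471, "Model (2)": 2.549,
                 "Model (3)": 2.154, "Model (4)": 2.170}
shape = {model: math.exp(value) for model, value in ln_p_reported.items()}

for model, p in shape.items():
    ratio = weibull_age_ratio(ln_p_reported[model], 60, 80)
    print(f"{model}: p = {p:.2f}, hazard at 80 relative to 60 = {ratio:.1f}")

# With ln(p) = 0 the model collapses to the exponential case: constant hazard.
print(weibull_age_ratio(0.0, 60, 80))
```

All implied shape parameters are well above 1, i.e., the fitted hazard rises steeply with age, which is the “aging” pattern that motivates preferring the Weibull over the exponential specification.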
4.3.2 Competing risks
Figure 8 introduces the results for the competing risks estimations. In all figures we have
represented the cumulative incidence of four situations: (i) women without children who never
worked with children; (ii) women without children who worked with children; (iii) women with
children but who never worked with children; and, finally, (iv) women with children who worked
with children.
Figure 8 Left is obtained after an estimation of a competing risks model where failure is
defined by death due to an infection, while the competing event is death by other causes, in
which case all other causes are aggregated in a single category. The right figure is the result
of an estimation where we reverse the roles of the previous two sets of causes of death: failure
is death by other causes, while the competing event is death by an infection. There are clear
differences between the two figures. While in the right hand figure, the counterfactual situation,
there seems to be no distinction between the four cumulative incidences, on the left it seems
clear that there is a separation of the different cumulative incidences. Women with children and
those working with children live longer; women without children and those who never worked
with children die at younger ages.
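To illustrate what a cumulative incidence curve measures when two causes of death compete, the following sketch implements the usual nonparametric (product-limit type) cumulative incidence estimator and contrasts it with the naive alternative of treating competing deaths as censored. It is not the regression model behind Table 5 and Figure 8; the data are hypothetical toy values and the function names are ours.

```python
def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence of `cause` with competing causes.

    times  : age at exit for each individual
    causes : 0 if censored (alive), otherwise an integer code for the cause of death
    Returns a list of (age, cumulative incidence) at each death from `cause`.
    """
    at_risk = len(times)
    event_free = 1.0   # probability of no death from any cause before this age
    cif, out = 0.0, []
    for t in sorted(set(times)):
        at_t = [c for x, c in zip(times, causes) if x == t]
        d_cause = sum(1 for c in at_t if c == cause)
        d_any = sum(1 for c in at_t if c != 0)
        if d_cause:
            cif += event_free * d_cause / at_risk
            out.append((t, cif))
        event_free *= 1.0 - d_any / at_risk
        at_risk -= len(at_t)
    return out


def one_minus_km(times, causes, cause):
    """Naive alternative: 1 - Kaplan-Meier with competing deaths treated as censored."""
    at_risk, surv, out = len(times), 1.0, []
    for t in sorted(set(times)):
        at_t = [c for x, c in zip(times, causes) if x == t]
        d_cause = sum(1 for c in at_t if c == cause)
        if d_cause:
            surv *= 1.0 - d_cause / at_risk
            out.append((t, 1.0 - surv))
        at_risk -= len(at_t)
    return out


# Hypothetical toy data: 1 = death by infection, 2 = death by other causes.
ages = [66, 68, 71, 74]
cause_codes = [1, 2, 1, 2]

cif_infection = cumulative_incidence(ages, cause_codes, 1)
cif_other = cumulative_incidence(ages, cause_codes, 2)
naive_infection = one_minus_km(ages, cause_codes, 1)

print(cif_infection, cif_other, naive_infection)
```

In the toy sample, half of the individuals die from each cause, and the two cumulative incidences end at 0.5 each and sum to one. The naive estimate for infections ends at 0.625, which overstates the share actually dying from infection; this is the reason for modelling the competing event explicitly rather than censoring it.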
Table 5 (in the appendix) shows the estimation results for the different competing risks
models that we discuss below. Definition A stands for the situation where causes of death
are aggregated in two groups: infections and other diseases. Definition B applies when we
disaggregate other causes of death according to the categories defined in Table 2. All models
estimated are statistically significant. Reinforcing the results discussed above, being married,
being white collar or owning a house are all factors associated with longer lives. Focusing on
the variables of interest, and looking at the key model in which failure is defined by death
due to infection, Model (1), we conclude that women who either worked with children or had
children live longer. Having children decreases the hazard rate by almost 8%, while working with
children is associated with a reduction of the hazard by about 20%. Combining both conditions
is associated with a reduction in the hazard of about 29%. The counterfactual, column (2),
indicates that the effects of the key variables are much smaller. We face a combined effect
below 8% against 29%; i.e., 21 percentage points smaller. At the same time, a reduction in
the statistical significance occurs. This parametric interpretation is, naturally, aligned with the
observations of Figure 8.
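The combined effects quoted here can be reproduced from the hazard ratios in columns (1) and (2) of Table 5. Two conventions are possible and the sketch shows both: adding the two individual percentage reductions, and multiplying the two hazard ratios, which is what a proportional hazards specification without an interaction term (as in equation (15)) implies for a woman to whom both conditions apply. Which convention underlies the figures in the text is our inference, not something the paper states.

```python
def reduction_pct(hazard_ratio):
    """Percentage reduction in the hazard implied by a hazard ratio below 1."""
    return 100.0 * (1.0 - hazard_ratio)


# Hazard ratios for Young and WYoung from Table 5 (females).
columns = {
    "(1) infections": {"Young": 0.923, "WYoung": 0.792},
    "(2) other causes": {"Young": 0.966, "WYoung": 0.953},
}

combined = {}
for name, hr in columns.items():
    additive = reduction_pct(hr["Young"]) + reduction_pct(hr["WYoung"])
    multiplicative = reduction_pct(hr["Young"] * hr["WYoung"])
    combined[name] = (additive, multiplicative)
    print(f"{name}: sum of reductions = {additive:.1f}%, "
          f"product of hazard ratios = {multiplicative:.1f}%")
```

For column (1) the sum of the two reductions is 28.5% (about 29%), while multiplying the hazard ratios gives roughly 27%; for column (2) the multiplicative combination is just under 8%. Under either convention the combined effect is about 20 percentage points larger for infections than for other causes.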
In a second stage we show the results for Definition B in Table 5, where we further
disaggregate the causes of death. Under columns (3) to (7) we define the failure event as death
by a cause other than infection, namely Heart Diseases; Cancers; Other Diseases; Accidents,
Homicides, Suicides; and Errors, Open or Other causes, respectively. In each case, the competing
events are either infections and all other causes, columns (a), or just other causes as we drop
the observations associated with death by an infection, columns (b). Columns (b) can be viewed
as robustness checks for counterfactuals that are in themselves robustness checks.
Under Definition B, working with children is not associated with the length of life. The one
exception is Other Diseases, column (5), condition (b), where we drop the observations
associated with death by infection and find a result that is marginally significant at the 10%
level. Combining this result with the one under column (1), our key result, we conclude in favor
of our hypothesis: contact with children matters for death by an infection, but not for death by
other causes, as predicted by our hypothesis.
[Figure 8 panels: cumulative incidence for Infectious Diseases (left) and Other Diseases (right); horizontal axis: analysis time, 50 to 90; groups: young = 0/1 by working_yngkids = 0/1.]
Figure 8: Left: Competing risks. Main risk: Infectious Diseases. Right: Competing risks. Main risk:
Other Diseases. Source: ONS-LS.
Finally, we focus on the effect of having children on the hazard of dying by other causes.
From Table 5 we can observe that this covariate does not matter for death by Cancer, column
(4), nor for death by Errors, Open, or Other causes, column (7). This is the expected result.
According to our hypothesis, having children should be uncorrelated with the timing of death by
causes other than an infection. The critical results, in the sense that they seem to be in
dissonance with our hypothesis, are those under columns (3), (5) and (6): death by Heart
Diseases, Other Diseases, and Accidents, Homicides and Suicides, respectively. We argue that the
results under these columns do not reject our hypothesis as they can be mainly attributed to
behavioral changes.
Having children may be positively associated with improved health awareness (for instance,
anecdotal evidence suggests that many adults stop smoking when they become parents), which
would imply a lower hazard for dying due to Heart Diseases.18 The same behavioral explanation
may, to some extent, also be true for infectious diseases. Fewer parents may choose lifestyles
which lower their defences against infections and, therefore, have a reduced mortality due to
infections because of behavioral adjustments. This is unlikely, however, to explain the whole
effect we observe for at least two reasons: (i) those who work with young children without
having children of their own show similarly improved survival probabilities although they have
no similarly systematic reason to adjust their lifestyle; and (ii) the inclusion of an
individual’s marital status should already capture some of this behavioral effect.
Similarly, women with children may be less prone to commit suicide or to be involved in
car accidents or homicides. All these outcomes may be, at least partially, the result of
behavioral decisions that can be influenced by the fact that the woman has a child.
The last result that needs clarification is the one under column (5), death by Other Diseases.
This classification is not defined in a precise way and, we argue, it might include factors
related to infections not reported under column (1), or again causes that can be attributed to
behavioral adaptations correlated with having children.
18 Willyard (2013), for instance, lists “smoking, high cholesterol, high blood pressure, diabetes, obesity and
lack of exercise” as some of the major risk factors leading to heart disease. Many of these risks are to some
extent behaviorally determined. While cessation of smoking benefits life expectancy, there are also negative
influences. Fahrenwald and Walker (2003) report that “cross-sectional studies indicate that women with
children are more sedentary than women without children.” Since they argue that “physical activity reduces
the risk not only of premature mortality, but also of coronary heart disease, hypertension, colon cancer, and
type 2 diabetes,” a sedentary lifestyle may increase overall mortality of mothers.
4.3.3 Additional robustness check: the male sample
Table 6 replicates Table 5 (both in the appendix) for the sample of males. Column (1) shows a
much stronger effect of children on men’s hazard rate: 23% against 8%. The effect of working
with children is also much stronger when compared with the effect on women. Disaggregating
the different causes of death, columns (3) to (7), we conclude that the results under column (3)
corroborate comparable results for females. Under column (4) we find that men with children
have a lower hazard of dying by Heart Diseases. For Other Causes of death, column (5), having
children apparently does not determine the hazard of dying. For males, having children doubles
the hazard of dying from an Accident, Homicide or Suicide. We have no ready explanation for
this result and, again, tend to a behavioral interpretation. Finally, in column (7), ‘dying from
Errors, Open and Others,’ men face a higher, and statistically significant, hazard rate. However,
given the open definition of this category, we do not consider this critical for our main result.
When looking at the effect of working with children, one must take into account the fact that
there are relatively few men performing this job. Nevertheless, the results we have are similar
to those we found for women.
Concluding remarks
The key hypothesis of our paper, that having children makes a parent live longer, is not rejected
by the data. Since the percentage of deaths due to infectious diseases is relatively small in
developed countries (see Table 2), it would be most interesting to compare our results to those
of a complementary study for a developing country where the population percentage dying from
infections is higher. Unfortunately, it seems that such data is not readily available.19 Further
testing of our hypothesis could be done, for instance, in the wake of major immunization
programs (which should be less effective for parents than for adults without children) or on the
victims of major epidemics. Did, for instance, fewer WWI fathers die from the Spanish flu than
soldiers without children? Does it make a difference if, instead of having children versus
no children, the (marginal) effects of the second, third, etc. child on mortality are
analyzed? Should we expect similar effects in grandparents if they are looking after their
grandchildren? In all cases that we considered, the behavioral implications of parenthood are
difficult to fully disentangle from the hypothesized pathological transmission mechanism. Hence,
we are confident that our findings will spur vigorous debate.
19 We found that data sets comparable to that of ONS-LS are collected in the Scandinavian countries. These
seem to be, however, not accessible to outside researchers.
References
Berg, G., S. Gupta, and F. Portrait (2012): “Do Children Affect Life Expectancy?
A Joint Study of Early life Conditions, Fertility and Mortality,” Population Association of
America, 2010 Annual Meeting.
Berry, S. D., L. Ngo, E. J. Samelson, and D. P. Kiel (2010): “Competing Risk of
Death: An Important Consideration in Studies of Older Adults,” Journal of the American
Geriatrics Society, 58, 783–7.
Clark, T., M. Bradburn, S. Love, and D. G. Altman (2003): “Survival Analysis Part
I: Basic concepts and first analyses,” British Journal of Cancer, 89, 232–38.
Cox, D. R. (1959): “The analysis of exponentially distributed lifetimes with two types of
failure,” Journal of the Royal Statistical Society, Series B, 21, 411–21.
David, H. A. and M. L. Moeschberger (1978): The theory of competing risks, London,
UK: Griffin.
Dior, U. P., H. Hochner, Y. Friedlander, R. Calderon-Margalit, D. Jaffe,
A. Burger, M. Avgil, O. Manor, and U. Elchalal (2013): “Association between
number of children and mortality of mothers: results of a 37-year follow-up study,” Annals
of Epidemiology, 23.
Doblhammer, G. (2000): “Reproductive history and mortality later in life: A comparative
study of England and Wales and Austria,” Population Studies, 54, 169–76.
Fabian, D. and T. Flatt (2011): “The Evolution of Aging,” Nature Education Knowledge,
3, 9.
Fahrenwald, N. L. and S. N. Walker (2003): “Application of the Transtheoretical Model
of Behavior Change to the Physical Activity Behavior of WIC Mothers,” Public Health Nursing, 20, 307–17.
Fine, J. P. and R. J. Gray (1999): “A Proportional Hazards Model for the Subdistribution
of a Competing Risk,” Journal of the American Statistical Association, 94, 496–509.
Flatt, T. and D. Promislow (2007): “Physiology: Still pondering an age-old question,”
Science, 318, 1255–6.
Gagnon, A., K. R. Smith, M. Tremblay, H. Vézina, P.-P. Paré, and B. Desjardins
(2008): “Is There a Trade-off between Fertility and Longevity?” University of Western
Ontario, PSC Discussion Paper #08-05.
Gooley, T. A., W. Leisenring, J. Crowley, and B. E. Storer (1999): “Estimation
of Failure Probabilities in the Presence of Competing Risks: New Representations of old
Estimators,” Statistics in Medicine, 18, 695–706.
Graham, N. M. H., K. E. Nelson, and M. C. Steinhoff (2007): “The Epidemiology of
Acute Respiratory Infections,” in Infectious Disease Epidemiology, ed. by K. E. Nelson and
C. M. Williams, Sudbury, Mass: Jones & Bartlett, second ed.
Hayward, A. D., I. J. Rickard, and V. Lummaa (2013): “Influence of early-life nutrition on mortality and reproductive success during a subsequent famine in a preindustrial
population,” Proceedings of the National Academy of Sciences, forthcoming.
Helle, S. and V. Lummaa (2013): “A trade-off between having many sons and shorter
maternal post-reproductive survival in pre-industrial Finland,” Biology Letters, 9.
Helle, S., V. Lummaa, and J. Jokela (2002): “Sons reduced maternal longevity in preindustrial humans,” Science, 296, 1085.
——— (2005): “Are reproductive and somatic senescence coupled in humans? Late, but not
early, reproduction correlated with longevity in historical Sami women,” Proceedings of the
Royal Society, 272, 29–37.
Hougaard, P. (2000): Analysis of multivariate survival data, New York: Springer.
Hurt, L., C. Ronsmans, and S. Thomas (2006): “The effect of number of births on
women’s mortality: systematic review of the evidence for women who have completed their
childbearing,” Population Studies, 60, 55–71.
Ikeda, A., H. Iso, H. Toyoshima, Y. Fujino, T. Mizoue, T. Yoshimura, Y. Inaba,
A. Tamakoshi, and J. Group (2007): “Marital status and mortality among Japanese
men and women: the Japan Collaborative Cohort Study,” BMC Public Health, 7.
Kalbfleisch, J. D. and R. L. Prentice (2002): The Statistical Analysis of Failure Time
Data, New York: John Wiley & Sons Ltd.
Kaplan, G., T. Wilson, R. Cohen, J. Kauhanen, M. Wu, and J. Salonen (1994):
“Social functioning and overall mortality: prospective evidence from the Kuopio Ischemic
Heart Disease Risk Factor Study,” Epidemiology, 5, 495–500.
Kirkwood, T. (1977): “Evolution of aging,” Nature, 270, 301–4.
Kirkwood, T. and S. Austad (2000): “Why do we age?” Nature, 408, 233–38.
Kuningas, M., S. Altmäe, A. Uitterlinden, A. Hofman, C. van Duijn, and
H. Tiemeier (2011): “The relationship between fertility and lifespan in humans,” Age,
33, 615–22.
Lahdenperä, M., V. Lummaa, S. Helle, M. Tremblay, and A. F. Russell (2004):
“Fitness benefits of prolonged post-reproductive lifespan in women,” Nature, 428, 178–81.
Lancaster, T. (1990): The econometric analysis of transition data, Cambridge, UK: Cambridge University Press.
Lee, E. T. and J. W. Wang (2003): Statistical Methods for Survival Data Analysis, New
York: John Wiley & Sons Ltd.
López-Otı́n, C., M. A. Blasco, L. Partridge, M. Serrano, and G. Kroemer (2013):
“The Hallmarks of Aging,” Cell, 153, 1194–217.
McArdle, P. F., T. I. Pollin, J. R. O’Connell, J. D. Sorkin, R. Agarwala, A. A.
Schäffer, E. A. Streeten, T. M. King, A. R. Shuldiner, and B. D. Mitchell
(2009): “Does Having Children Extend Life Span? A Genealogical Study of Parity and
Longevity in the Amish,” Journal of Gerontology, 61A, 190–95.
Medawar, P. B. (1946): “Old age and natural death,” Modern Quarterly, 1, 30–56.
Monto, A. S. (2002): “Epidemiology of Viral Respiratory Infections,” The American Journal
of Medicine, 112, 4S–12S.
Müller, H., J. Chiou, J. Carey, and J. Wang (2002): “Fertility and Lifespan: Later
Children Enhance Female Longevity,” J Gerontol A Biol Sci, 57, 202–6.
Partridge, L. and N. Barton (1993): “Optimality, mutation and the nature of ageing,”
Nature, 362, 305–11.
Prentice, R. L., J. D. Kalbfleisch, A. V. J. Peterson, N. Flournoy, V. T.
Farewell, and N. E. Breslow (1978): “The analysis of failure times in the presence of
competing risks,” Biometrics, 34, 541–54.
Sá, C., C. E. Dismuke, and P. Guimarães (2007): “Survival analysis and competing
risk models of hospital length of stay and discharge destination: the effect of distributional
assumptions,” Health Services and Outcomes Research Methodology, 7, 109–24.
Stearns, S. (1992): The Evolution of Life Histories, Oxford, UK: Oxford University Press.
Tabatabaie, V., G. Atzmon, S. N. Rajpathak, R. Freeman, N. Barzilai, and
J. Crandall (2011): “Exceptional longevity is associated with decreased reproduction,”
AGING, 3, 1,202–5.
Tollånes, M., D. Moster, A. Daltveit, and L. Irgens (2008): “Cesarean Section
and Risk of Severe Childhood Asthma: A Population-Based Cohort Study,” The Journal of
Pediatrics, 153, 112–6.
Wang, X., S. G. Byars, and S. C. Stearns (2013): “Genetic links between postreproductive lifespan and family size in Framingham,” Evolution, Medicine, and Public
Health, forthcoming.
Williams, G. (1957): “Pleiotropy, natural selection, and the evolution of senescence,” Evolution, 11, 398–411.
Willyard, C. (2013): “Pathology: At the heart of the problem,” Nature, 493,
doi:10.1038/493S10a.
Wood, R. G., S. Avellar, and B. Goesling (2009): Effects of Marriage on Health: A
Synthesis of Recent Research Evidence, New York: Nova Science Publishers Inc.
World Health Organization (2010): ICD-10: International statistical classification of
diseases and related health problems (10th Rev. ed.), Geneva: World Health Organization.
Appendix
Table 3: Health status by working with children and having children
                                      Females                    Males
Health status                 No–Children     Children  No–Children     Children
No–Work with Children
Alive                              15,931       87,758        9,582       84,633
Heart Diseases                      4,185        9,083       22,379        9,698
  Remaining diseases              [8,720]     [23,236]     [33,954]     [19,247]
Infections                          1,578        3,646        8,065        3,048
  Remaining diseases             [11,327]     [28,673]     [48,268]     [25,897]
Cancers                             3,209        9,828       13,908        7,348
  Remaining diseases              [9,696]     [22,491]     [42,425]     [21,597]
Other Diseases                      1,535        3,665        4,427        2,456
  Remaining diseases             [11,370]     [28,654]     [51,906]     [26,489]
Accidents, Hom., Suic.                375          709        1,118        1,345
  Remaining diseases             [12,530]     [31,610]     [55,215]     [27,600]
Errors, Open, Others                2,023        5,388        6,436        5,049
  Remaining diseases             [10,882]     [26,931]     [49,897]     [23,896]
Work with Children
Alive                                 963        4,065          317        1,933
Heart Diseases                        181          114          314          153
  Remaining diseases                [362]        [464]        [379]        [306]
Infections                             57           37           66           27
  Remaining diseases                [486]        [541]        [627]        [432]
Cancers                               158          215          150          125
  Remaining diseases                [385]        [363]        [543]        [334]
Other Diseases                         50           71           52           49
  Remaining diseases                [493]        [507]        [641]        [410]
Accidents, Hom., Suic.                 11           16           14           17
  Remaining diseases                [532]        [562]        [679]        [442]
Errors, Open, Others                   86          125           97           88
  Remaining diseases                [457]        [453]        [596]        [371]
Health status indicates if the individual is alive, or the cause of death. Source: ONS-LS.
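The female counts in Table 3 can be cross-checked against the sample sizes reported with the regression tables. The sketch below sums the female columns of Table 3 (in the order: no work with children and no children; no work with children and children; work with children and no children; work with children and children); the variable names are ours.

```python
# Female counts from Table 3, ordered as:
# (no work / no children, no work / children, work / no children, work / children)
alive = [15931, 87758, 963, 4065]
deaths = {
    "Heart Diseases": [4185, 9083, 181, 114],
    "Infections": [1578, 3646, 57, 37],
    "Cancers": [3209, 9828, 158, 215],
    "Other Diseases": [1535, 3665, 50, 71],
    "Accidents, Hom., Suic.": [375, 709, 11, 16],
    "Errors, Open, Others": [2023, 5388, 86, 125],
}

deaths_by_cause = {cause: sum(counts) for cause, counts in deaths.items()}
total_deaths = sum(deaths_by_cause.values())
total_women = sum(alive) + total_deaths
other_cause_deaths = total_deaths - deaths_by_cause["Infections"]

print(deaths_by_cause)
print(total_women, total_deaths, other_cause_deaths)
```

The totals agree with the appendix tables: 155,062 women in the sample, 5,318 deaths by infection and 41,027 deaths by other causes, with the cause-specific totals matching the numbers of failures listed in Table 5.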
Table 4: Semiparametric and parametric analysis – Cox, Exponential and Weibull regressions – females
Risk = Infections
                               Model (1)                              Model (2)
Variables                 Cox  Exponential      Weibull          Cox  Exponential      Weibull
Young                   0.971     0.690***        0.973     0.789***     0.499***     0.787***
                      (0.036)      (0.025)      (0.036)      (0.028)      (0.017)      (0.028)
WYoung                0.778**        0.854      0.787**      0.768**        0.889      0.767**
                      (0.083)      (0.091)      (0.084)      (0.082)      (0.094)      (0.082)
Married                 0.992     0.790***        0.961     0.729***     0.693***     0.676***
                      (0.038)      (0.029)      (0.037)      (0.028)      (0.025)      (0.025)
Occupation           0.773***     0.406***     0.777***     0.620***     0.273***     0.613***
                      (0.024)      (0.013)      (0.024)      (0.020)      (0.009)      (0.019)
House                0.611***     0.501***     0.614***     0.508***     0.346***     0.508***
                      (0.018)      (0.015)      (0.018)      (0.015)      (0.010)      (0.014)
Log likelihood         -52816       -20734        -8575       -47427       -17187        -3237
LR test                481***     2,558***       475***     1,316***     5,533***     1,434***
PH test             161.04***                              349.94***
ln(p)                                          2.471***                               2.549***
Observations                       155,062                                114,035
Failures                             5,318                                  5,318
Risk = Other diseases
                               Model (3)                              Model (4)
Variables                 Cox  Exponential      Weibull          Cox  Exponential      Weibull
Young                 0.974**     0.715***      0.973**     0.962***     0.698***     0.959***
                      (0.013)      (0.009)      (0.013)      (0.013)      (0.009)      (0.013)
WYoung                 0.940*        1.033       0.943*      0.924**        1.031      0.929**
                      (0.031)      (0.033)      (0.031)      (0.030)      (0.033)      (0.030)
Married                 1.006     0.854***       0.974*        0.982     0.839***     0.948***
                      (0.014)      (0.012)      (0.014)      (0.014)      (0.012)      (0.013)
Occupation           0.855***     0.476***     0.857***     0.831***     0.457***     0.833***
                      (0.009)      (0.005)      (0.009)      (0.009)      (0.005)      (0.009)
House                0.731***     0.604***     0.736***     0.703***     0.579***     0.707***
                      (0.008)      (0.007)      (0.008)      (0.007)      (0.006)      (0.007)
Log likelihood        -422420       -82346        -7881      -417980       -79690        -3586
LR test              1,407***    12,079***     1,398***     1,863***    13,709***     1,879***
PH test             606.63***                              642.86***
ln(p)                                          2.154***                               2.170***
Observations                       155,062                                149,744
Failures                            41,027                                 41,027
Significance levels: *: 10%, **: 5%, ***: 1%. The dependent variable is age. Standard errors in parentheses. Under each model we report the hazard ratio.
The estimation procedure is defined in each column. Risk = Infections – the failure is defined as death by an infection; Risk = Other diseases – the failure is
defined as death by other causes besides infection. Model (1) – death by other causes is defined as censored; Model (2): deaths by other causes are dropped
from the sample; Model (3) – death by infection is defined as censored; Model (4) deaths by infection are dropped from the sample. Source: ONS-LS.
Table 5: Competing risks analysis – females
                       Definition A                           Definition B
Variables                 (1)          (2)       (3)(a)       (3)(b)       (4)(a)       (4)(b)
Young                 0.923**     0.966***     0.876***     0.867***        0.995        0.985
                      (0.033)      (0.013)      (0.020)      (0.019)      (0.024)      (0.024)
WYoung                0.792**       0.953*        0.962        0.947        1.000        0.986
                      (0.083)      (0.028)      (0.055)      (0.054)      (0.053)      (0.052)
Married              0.862***       0.976*     0.799***     0.785***     1.085***      1.064**
                      (0.032)      (0.013)      (0.019)      (0.018)      (0.029)      (0.028)
Occupation           0.749***     0.858***     0.683***     0.663***     0.904***     0.878***
                      (0.024)      (0.009)      (0.013)      (0.013)      (0.017)      (0.017)
House                0.699***     0.772***     0.774***     0.739***     0.776***     0.742***
                      (0.020)      (0.008)      (0.014)      (0.013)      (0.015)      (0.014)
Wald χ2 (5)         404.92***  1,154.38***  1,218.05***  1,511.00***    267.22***    387.92***
Observations          155,062      155,062      155,062      149,744      155,062      149,744
No. Failures            5,318       41,027       13,563       13,563       13,410       13,410
No. Competing          41,027        5,318       32,782       27,464       32,935       27,617
                                      Definition B (continued)
Variables              (5)(a)       (5)(b)       (6)(a)       (6)(b)       (7)(a)       (7)(b)
Young                 0.911**     0.903***     0.770***     0.760***        1.015        1.006
                      (0.033)      (0.033)      (0.059)      (0.058)      (0.030)      (0.030)
WYoung                  0.868       0.855*        0.797        0.790        0.945        0.930
                      (0.081)      (0.080)      (0.156)      (0.157)      (0.067)      (0.066)
Married               0.907**     0.893***     0.610***     0.598***       0.942*      0.929**
                      (0.035)      (0.034)      (0.048)      (0.047)      (0.030)      (0.030)
Occupation           0.852***     0.832***     0.757***     0.733***        1.040        1.018
                      (0.026)      (0.025)      (0.051)      (0.050)      (0.026)      (0.025)
House                0.837***     0.805***      0.847**     0.815***        0.982      0.944**
                      (0.025)      (0.024)      (0.057)      (0.055)      (0.025)      (0.024)
Wald χ2 (5)         122.52***    168.90***    153.52***    175.55***         7.46      14.84**
Observations          155,062      149,744      155,062      149,744      155,062      149,744
No. Failures            5,321        5,321        1,111        1,111        7,622        7,622
No. Competing          41,024       35,706       45,234       39,916       38,723       33,405
Significance levels: *: 10%, **: 5%, ***: 1%. The dependent variable is age. The models are estimated by competing risks procedures. Standard errors
in parentheses. Definition A: the cause of death is defined in two categories - (1) infection as the main risk, and other causes as the competing risk,
and (2) the reverse, where the main risk is other causes of death. Definition B: other causes of death are split in (3) Heart Diseases, (4) Cancers, (5)
Other Diseases, (6) Accidents, Hom., Suic., and (7) Errors, Open, Others. In columns (a) we keep the full set of observations, implying that for each
alternative cause of death infection becomes a competing risk. In columns (b) we drop the observations corresponding to infections when analyzing a
particular risk of death. Source: ONS-LS.
(3)
(4)
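The difference between columns (a) and (b) in Table 5 can be checked with simple bookkeeping. The sketch below is an illustration rather than part of the original analysis: the function name `competing_counts` and the dictionary labels are assumptions, while the failure counts are the female figures from Table 5. In columns (a) every death from another cause, infections included, counts as a competing event; in columns (b) the infection deaths are removed from the sample first.

```python
# Female failure counts by cause, as reported in Table 5.
TOTAL_OBS = 155_062
INFECTION_DEATHS = 5_318
OTHER_CAUSES = {
    "(3) Heart Diseases": 13_563,
    "(4) Cancers": 13_410,
    "(5) Other Diseases": 5_321,
    "(6) Accidents, Hom., Suic.": 1_111,
    "(7) Errors, Open, Others": 7_622,
}


def competing_counts(failures, all_deaths, infection_deaths, total_obs):
    """Return ((obs, competing) for column (a), (obs, competing) for column (b))."""
    # Column (a): full sample; every other death is a competing event.
    col_a = (total_obs, all_deaths - failures)
    # Column (b): infection deaths are dropped from the sample beforehand.
    col_b = (total_obs - infection_deaths, all_deaths - failures - infection_deaths)
    return col_a, col_b


all_deaths = INFECTION_DEATHS + sum(OTHER_CAUSES.values())

for cause, failures in OTHER_CAUSES.items():
    col_a, col_b = competing_counts(failures, all_deaths, INFECTION_DEATHS, TOTAL_OBS)
    print(cause, "(a):", col_a, "(b):", col_b)
```

With these inputs the five non-infection causes sum to 41,027, the failure count of Model (2), and the computed pairs match the Observations and No. Competing rows, for example 32,782 and 27,464 competing events for Heart Diseases.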
Table 6: Competing risks analysis – males
                Definition A              Definition B
Variables       (1)          (2)          (3)                       (4)                       (5)                       (6)                       (7)
                                          (a)          (b)          (a)          (b)          (a)          (b)          (a)          (b)          (a)          (b)
Young           0·775***     1·220***     0·770***     0·742***     0·929***     0·898***     1·034        1·002        2·013***     1·908***     1·483***     1·446***
                (0·017)      (0·010)      (0·009)      (0·009)      (0·014)      (0·013)      (0·027)      (0·026)      (0·090)      (0·085)      (0·028)      (0·027)
WYoung          0·748***     0·923***     0·985        0·962        0·887**      0·866**      0·935        0·915        0·925        0·910        0·932        0·912
                (0·078)      (0·025)      (0·045)      (0·043)      (0·054)      (0·052)      (0·094)      (0·092)      (0·169)      (0·166)      (0·069)      (0·067)
Married         0·801***     0·894***     1·076***     1·036**      1·269***     1·230***     0·671***     0·648***     0·246***     0·242***     0·742***     0·715***
                (0·019)      (0·009)      (0·017)      (0·016)      (0·026)      (0·025)      (0·021)      (0·020)      (0·011)      (0·011)      (0·019)      (0·018)
Occupation      0·725***     0·967***     0·968***     0·923***     0·913***     0·874***     1·028        0·989        0·752***     0·727***     1·072***     1·031
                (0·016)      (0·007)      (0·012)      (0·011)      (0·014)      (0·013)      (0·027)      (0·026)      (0·035)      (0·034)      (0·022)      (0·021)
House           0·750***     0·843***     0·884***     0·838***     0·803***     0·763***     0·941***     0·902***     0·747***     0·710***     1·116***     1·070***
                (0·015)      (0·007)      (0·011)      (0·010)      (0·012)      (0·011)      (0·025)      (0·024)      (0·035)      (0·033)      (0·023)      (0·022)
Wald χ2 (5)     1,103.31***  1,122.84***  731.79***    1,215.56***  497.05***    766.20***    182.91***    247.90***    1166.95***   1245.59***   564.25***    492.96**
Observations    182,895      182,895      182,895      171,689      182,895      171,689      182,895      171,689      182,895      171,689      182,895      171,689
No. Failures    11,206       75,224       32,544       32,544       21,531       21,531       6,984        6,984        2,494        2,494        11,670       11,670
No. Competing   75,224       11,206       53,886       42,680       64,889       53,693       79,446       68,240       83,936       72,730       74,760       63,554
Significance levels: *: 10%, **: 5%, ***: 1%. The dependent variable is age. The models are estimated by competing risks procedures. Standard errors
in parentheses. Definition A: the cause of death is defined in two categories - (1) infection as the main risk, and other causes as the competing risk,
and (2) the reverse, where the main risk is other causes of death. Definition B: other causes of death are split in (3) Heart Diseases, (4) Cancers, (5)
Other Diseases, (6) Accidents, Hom., Suic., and (7) Errors, Open, Others. In columns (a) we keep the full set of observations, implying that for each
alternative cause of death infection becomes a competing risk. In columns (b) we drop the observations corresponding to infections when analyzing a
particular risk of death. Source: ONS-LS.
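The link between a reported ratio, its standard error and the significance stars can be sketched as follows. This is an illustration under an assumption the paper does not state: that the standard errors in parentheses are delta-method standard errors of the ratio itself (se(HR) = HR × se(β)), so that the test of HR = 1 uses z = ln(HR) × HR / se(HR). The function names are hypothetical, and the inputs are rounded values from Table 6, so results close to a threshold may not reproduce exactly.

```python
import math


def z_from_ratio(hr: float, se_hr: float) -> float:
    """z statistic for H0: HR = 1, assuming se_hr is a delta-method SE of HR."""
    se_beta = se_hr / hr          # implied SE of the log ratio (assumption)
    return math.log(hr) / se_beta


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stars(p: float) -> str:
    """Significance stars as defined in the table notes."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Rounded values from Table 6 (males): (label, ratio, standard error).
examples = [
    ("WYoung, column (1)", 0.748, 0.078),
    ("Married, column (1)", 0.801, 0.019),
    ("WYoung, column (3a)", 0.985, 0.045),
]

for label, hr, se in examples:
    z = z_from_ratio(hr, se)
    print(f"{label}: z = {z:.2f}, stars = '{stars(two_sided_p(z))}'")
```

Under this assumption the first two entries come out significant at the 1% level and the third is not significant at 10%, in line with the stars printed in the table.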
Most Recent Working Paper
NIPE WP 18/2013  Portela, Miguel and Paul Schweinzer, “The Parental Co-Immunization Hypothesis”, 2013
NIPE WP 17/2013  Martins, Susana and Francisco José Veiga, “Government size, composition of public expenditure, and economic development”, 2013
NIPE WP 16/2013  Bastos, Paulo and Odd Rune Straume, “Preschool education in Brazil: Does public supply crowd out private enrollment?”, 2013
NIPE WP 15/2013  Martins, Rodrigo and Francisco José Veiga, “Does voter turnout affect the votes for the incumbent government?”, 2013
NIPE WP 14/2013  Aguiar-Conraria, Luís, Pedro C. Magalhães and Christoph A. Vanberg, “Experimental evidence that quorum rules discourage turnout and promote election boycotts”, 2013
NIPE WP 13/2013  Silva, José Ferreira, J. Cadima Ribeiro, “As Assimetrias Regionais em Portugal: análise da convergência versus divergência ao nível dos municípios”, 2013
NIPE WP 12/2013  Faria, Ana Paula, Natália Barbosa and Vasco Eiriz, “Firms’ innovation across regions: an exploratory study”, 2013
NIPE WP 11/2013  Veiga, Francisco José, “Instituições, Estabilidade Política e Desempenho Económico Implicações para Portugal”, 2013
NIPE WP 10/2013  Barbosa, Natália, Ana Paula Faria and Vasco Eiriz, “Industry- and firm-specific factors of innovation novelty”, 2013
NIPE WP 09/2013  Castro, Vítor and Megumi Kubota, “Duration dependence and change-points in the likelihood of credit booms ending”, 2013
NIPE WP 08/2013  Monteiro, Natália Pimenta and Geoff Stewart, “Scale, Scope and Survival: A Comparison of Cooperative and Capitalist Modes of Production”, 2013
NIPE WP 07/2013  Esteves, Rosa-Branca and Joana Resende, “Competitive Targeted Advertising with Price Discrimination”, 2013
NIPE WP 06/2013  Barbosa, Natália, Maria Helena Guimarães and Ana Paula Faria, “Single Market noncompliance: how relevant is the institutional setting?”, 2013
NIPE WP 05/2013  Lommerud, Kjell Erik, Odd Rune Straume and Steinar Vagstad, “Mommy tracks and public policy: On self-fulfilling prophecies and gender gaps in promotion”, 2013
NIPE WP 04/2013  Brekke, Kurt R., Luigi Siciliani and Odd Rune Straume, “Hospital Mergers: A Spatial Competition Approach”, 2013
NIPE WP 03/2013  Faria, Ana Paula and Natália Barbosa, “Does venture capital really foster innovation?”, 2013
NIPE WP 02/2013  Esteves, Rosa Branca, “Customer Poaching with Retention Strategies”, 2013
NIPE WP 01/2013  Aguiar-Conraria, Luís, Teresa Maria Rodrigues and Maria Joana Soares, “Oil Shocks and the Euro as an Optimum Currency Area”, 2013
NIPE WP 27/2012  Ricardo M. Sousa, “The Effects of Monetary Policy in a Small Open Economy: The Case of Portugal”, 2012
NIPE WP 26/2012  Sushanta K. Mallick and Ricardo M. Sousa, “Is Technology Factor-Neutral? Evidence from the US Manufacturing Sector”, 2012
NIPE WP 25/2012  Jawadi, F. and Ricardo M. Sousa, “Structural Breaks and Nonlinearity in US and UK Public Debt”, 2012
NIPE WP 24/2012  Jawadi, F. and Ricardo M. Sousa, “Consumption and Wealth in the US, the UK and the Euro Area: A Nonlinear Investigation”, 2012
NIPE WP 23/2012  Jawadi, F. and Ricardo M. Sousa, “Modelling Money Demand: Further Evidence from an International Comparison”, 2012
NIPE WP 22/2012  Jawadi, F. and Ricardo M. Sousa, “Money Demand in the euro area, the US and the UK: Assessing the Role of Nonlinearity”, 2012
NIPE WP 21/2012  Agnello, L, Sushanta K. Mallick and Ricardo M. Sousa, “Financial Reforms and Income Inequality”, 2012
NIPE WP 20/2012  Agnello, L, Gilles Dufrénot and Ricardo M. Sousa, “Adjusting the U.S. Fiscal Policy for Asset Prices: Evidence from a TVP-MS Framework”, 2012