Population-Wide Analysis of Differences in Disease Progression Patterns in Men and Women
https://doi.org/10.1038/s41467-019-08475-9 OPEN
longitudinal differences in hospital admissions between men and women is needed. Here, we
demonstrate a systematic analysis of all diseases and disease co-occurrences in the complete
Danish population using the ICD-10 and Global Burden of Disease terminologies. Incidence
rates of single diagnoses are different for men and women in most cases. The age at first
diagnosis is typically lower for men, compared to women. Men and women share many
disease co-occurrences. However, many sex-associated incongruities not linked directly to
anatomical or genomic differences are also found. Analysis of multi-step trajectories uncovers
differences in longitudinal patterns, for example concerning injuries and substance abuse,
cancer, and osteoporosis. The results point towards the need for an increased focus on sex-
stratified medicine to elucidate the origins of the socio-economic and ethological differences.
1 Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark.
2 Unit of Clinical Pharmacology, Roskilde University Hospital, 4000 Roskilde, Denmark. 3 Institute for Genomics and Bioinformatics and Department of
Computer Science, University of California, Irvine, CA 92697, USA. Correspondence and requests for materials should be addressed to
S.B. (email: [email protected])
Sex- and gender-stratified medicine is an essential aspect of precision medicine. Sex and gender affect the manifestation and pathophysiology of many diseases [1–3]. Sex is defined as the biological component, while gender is a social construction, as defined for example by the WHO [4]. Sex is a separate risk factor even when all other aspects have been taken into account [5–7]. Although sex is an important aspect of disease, many sex-specific analyses focus on one sex only and less on the comparative aspect [8]. Consequently, sex- and gender-medicine is generally understudied, and an increasing body of literature stresses the need to include both sexes in animal models, clinical trials, and healthcare planning policies [8–12]. Men and women are affected differently by disease, such as cardiovascular diseases, osteoporosis, and autoimmune diseases [2,3,13–16]. Furthermore, many prior studies also indicate a bias in diagnosis and treatment, for example that osteoporosis is underdiagnosed in men, while chronic obstructive lung disease is underdiagnosed in women [16,17]. Although earlier studies point to clear sex-specific differences in a number of disease states, they have not yet been complemented by multimorbidity studies that incorporate co-occurrence of other conditions in a systematic manner. Some co-occurring conditions display a consistent temporal progression trend. However, cross-sectional studies are time-unresolved and most cohort studies define a priori the temporal association between conditions when testing a specific hypothesis, and thus do not take into account the order in which conditions are observed in clinical care. Nonetheless, the etiology and outcome of single conditions will very often be related to their temporal context in terms of other conditions [18–20]. A temporal trend is a prerequisite for causality and should systematically be taken into consideration when studying patient-specific co-occurrences of conditions [21,22].

Incidence and temporality in diagnosis co-occurrence have been studied previously, but the focus has not been centered on sex-stratified differences [20,23]. We now present a retrospective cohort study based on the population-wide Danish National Patient Registry (NPR), where we examine sex-specific incidence, risk, and temporal aspects of diagnoses and co-occurrence of diagnosis related to disease and symptoms. Our findings indicate large discrepancies across all areas of disease.

Results
Diagnosis incidence and relation to age. We analyzed hospital admissions from 6,909,676 patients (the whole Danish population during a 21-year period), of which 48.2% were women. We analyzed the incidence rate of 1369 ICD-10 level 3 diagnoses for men and women. A complementary analysis using the Global Burden of Disease (GBD) categories can be found in Supplementary Note 1. Incidence rates may be biased by age; thus we calculated the age-adjusted incidence rate (AIR) using the Eurostat 2013 standard population [24]. The Methods section contains a detailed account of the statistical model employed. We found that 344 and 473 diagnoses had a higher AIR in women and men, respectively (see Supplementary Data 1 for estimates and 95% Bayesian Credible Intervals (BCI)). Differences in incidence rates were not limited to a few particular disease areas, but distributed across the 18 ICD-10 chapters studied (Fig. 1a). Nonetheless, some ICD-10 chapters such as infectious diseases (ch1), neoplasms (ch2), circulatory system diseases (ch9), respiratory diseases (ch10), perinatal conditions (ch16), and injuries (ch19) had a higher AIR in men, on average. Conversely, endocrine and metabolic disorders (ch4), eye and adnexa diseases (ch7), skin diseases (ch12), musculoskeletal diseases (ch13), and congenital malformations (ch17) had a higher AIR in women, on average. A very similar pattern was observed when using the GBD categories (Supplementary Figure 1A). Considering the age of first hospital diagnosis, we found 986 diagnoses in which the age differed between men and women (Welch’s t test, FDR < 0.05) (see Supplementary Data 2 for mean values and 95% confidence intervals (CI)). We noticed that in the majority of cases, women were, on average, diagnosed at an older age than men (Fig. 1b, c). The only exceptions were neoplasms (ch2), blood and immune system diseases (ch3), and genitourinary system diseases (ch14). From the analysis using the GBD categories we also found that men were, in the majority of the cases, diagnosed at a younger age compared to women (Supplementary Figure 1B, C).

Diagnosis co-occurrence. Following frequency-based filtering, we analyzed 27,185 diagnosis co-occurrences, including both sex-specific and non-sex-specific diagnoses (Fig. 2). In the analysis, we adjusted for a number of common confounding factors, including age, admission type, hospitalization month, and year by selecting a matched comparison group. Modifying disease definitions and diagnostic criteria may affect both incidence and prevalence [25]. Previous studies found that changes in diagnostic criteria increased the hospitalization rate for e.g. acute myocardial infarction (AMI), and increased the prevalence and shifted the age of diagnosis for autism [26,27]. The criteria for hospitalization year and month in our scheme negate this type of effect as well as any seasonal influence, which may change the incidence of, for instance, infectious diseases. The Methods section contains a detailed account of the statistical model (see Supplementary Data 3 for estimates and 95% BCI of relative risks and directionality).

We found 12,122 directional pairs (defined as diagnosis co-occurrences that had an elevated relative risk and preferred statistical direction), when calculating the sex-adjusted RR. Remarkably, 4155 directional pairs (2055 in men and 2100 in women) were not common to both men and women. Hence, 4155 directional pairs are driven purely by one sex. This finding could be a result of a lack of power to detect the direction in either sex, but an analysis of the number of men and women diagnosed with the 12,122 pairs showed a high correlation (ρ = 0.861, 95% CI 0.857–0.863, Pearson correlation) (Supplementary Figure 2). For the 4155 directional pairs only, the correlation coefficient decreased slightly (ρ = 0.799, 95% CI 0.788–0.81, Pearson correlation).

We performed a separate analysis of the excluded dagger–asterisk pairs and found that, overall, a dagger code, the etiology, precedes an asterisk code, the manifestation (Supplementary Note 2).

When taking sex into account, we found 9547 directional pairs in men and 10,380 directional pairs in women, respectively. Of these, 6885 were shared, leaving 2662 and 3495 unique pairs, respectively (reduced to 2514 and 2660 when not including sex-specific diagnoses). We examined the strength in directionality of the 6885 shared pairs (Supplementary Figure 3). We found that the variances of the two distributions were not equal, and that the distribution for women had a larger variance using both the ICD-10 (F = 0.8269, 95% CI 0.79–0.87, F test) and GBD (F = 0.52, 95% CI 0.42–0.66) terminologies. We noted that the distribution for women was skewed towards positive values, indicating a weaker trend in directionality compared to the sex-adjusted directionality overall. We also found that the majority of directional pairs included a nonchronic diagnosis, even when excluding the symptoms and injuries chapter (Supplementary Table 1).

To obtain an overview of the anatomical and functional differences between men and women in terms of the directional pairs identified, we investigated the distribution over the 18 ICD-
[Figure 1, panels a–c. Axes: ICD-10 chapter; difference in incidence rate; density; mean age at first hospital diagnosis. Legend: ICD-10 chapter names.]
Fig. 1 Incidence and age at first hospital diagnosis of 1369 diagnoses. a 344 and 437 diagnoses were found to have a higher age-adjusted incidence rate in men and women, respectively. b Mean age at first diagnosis for each of the 1369 diagnoses studied. c Mean of the difference in age at first diagnosis. We found 963 diagnoses in which the age at first diagnosis differed significantly between men and women (Welch’s t test, FDR < 0.05). Error bars are the standard error of the mean per ICD-10 chapter
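The age adjustment behind the AIR can be illustrated with direct standardization. The following is a minimal sketch only: the age bands, weights, and counts are hypothetical toy values (not the Eurostat 2013 weights or registry data), and the study itself reports model-based estimates with credible intervals rather than this point estimate.

```python
def age_adjusted_rate(cases, person_years, standard_weights):
    """Directly standardized incidence rate per 1,000 person-years.

    Each age band's crude rate is weighted by the share of that band
    in the standard population.
    """
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for age_band, weight in standard_weights.items():
        crude = cases[age_band] / person_years[age_band]
        rate += crude * weight / total_weight
    return 1000 * rate


# Hypothetical standard population weights and counts (assumptions).
weights = {"0-39": 50, "40-69": 35, "70+": 15}
air_men = age_adjusted_rate(
    {"0-39": 10, "40-69": 60, "70+": 90},
    {"0-39": 10000, "40-69": 6000, "70+": 3000},
    weights,
)
air_women = age_adjusted_rate(
    {"0-39": 12, "40-69": 48, "70+": 120},
    {"0-39": 12000, "40-69": 6000, "70+": 4000},
    weights,
)
print(round(air_men, 2), round(air_women, 2))
```

Because both sexes are weighted by the same standard population, a difference between the two rates cannot be explained by a difference in age structure alone.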
10 chapters. We only included directional pairs that were unique to one sex and at the same time did not include a sex-specific diagnosis (Fig. 3). We found that diagnosis pairs from perinatal conditions (ch16) and congenital malformations (ch17) were preferentially diagnosed first in both men and women, with the exception of “neoplasms (ch2) and congenital malformations” in women. Nonetheless, there were also incongruities, such as “genitourinary system diseases (ch14) and infections (ch1)”, and “neoplasms (ch2) and digestive system diseases (ch11)”. Using the GBD terminology we noticed one case in which infectious diseases (ch1) were diagnosed prior to mental disorders (ch5) (Supplementary Figure 4). This was, in fact, opposite to what the analysis using the ICD-10 terminology indicated. In men, the ICD-10 terminology indicated that skin diseases (ch12) were diagnosed prior to infectious diseases (ch1), and the opposite was found using the GBD.

Seven combinations of chapters were found to be unequally represented, of which five were overrepresented in women (FDR ≤ 0.05, Fisher’s exact test) (Supplementary Table 2). Diagnoses related to “neoplasms (ch2) and digestive system diseases (ch11)”, and diagnoses regarding injuries (ch19) were overrepresented in men. Diagnoses related to “infectious diseases (ch1) and musculoskeletal diseases (ch13)”, “neoplasms (ch2) and circulatory system diseases (ch9)”, “respiratory diseases (ch10) and signs and symptoms (ch18)”, “musculoskeletal diseases (ch13) and signs and symptoms (ch18)”, and “musculoskeletal diseases (ch13) and circulatory system diseases (ch9)” were overrepresented in women.
[Figure 2 flow chart. Recoverable labels: 213 dagger–asterisk pairs removed; 236,200 diagnosis pairs.]
Fig. 2 Diagnosis co-occurrences found in population-wide data from 6,909,676 patients. 951,509 ICD-10 level 3 diagnosis pairs were found to occur in the population; of these, a large number were filtered out due to low frequency (N < 100), dagger–asterisk combinations, or not passing the crude estimate of the relative risk. The standard method for calculating a confidence interval was applied in the prescreening section. After filtering, 27,185 diagnosis pairs remained, comprising 1360 unique diagnoses. Of these, 275 pairs involved a male-specific diagnosis and 1402 a female-specific diagnosis
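The prescreening step in the caption above can be sketched as a crude relative risk with a confidence interval. This sketch assumes that the "standard method" is the usual log-scale (Katz) interval and that a pair passes when the lower bound exceeds 1; both are assumptions, as are the function names and counts.

```python
import math


def crude_relative_risk(exposed_events, exposed_total,
                        control_events, control_total, z=1.96):
    """Crude RR with a log-scale (Katz) confidence interval."""
    rr = (exposed_events / exposed_total) / (control_events / control_total)
    se_log = math.sqrt(1 / exposed_events - 1 / exposed_total
                       + 1 / control_events - 1 / control_total)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def passes_prescreen(exposed_events, exposed_total,
                     control_events, control_total, min_count=100):
    """Keep a pair if it is frequent enough and its crude RR is elevated."""
    if exposed_events < min_count:
        return False  # low-frequency filter (N < 100 in the caption)
    _, lower, _ = crude_relative_risk(exposed_events, exposed_total,
                                      control_events, control_total)
    return lower > 1.0


# Hypothetical counts: 150 of 10,000 exposed vs 100 of 10,000 matched controls.
rr, lower, upper = crude_relative_risk(150, 10000, 100, 10000)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

A cheap frequentist screen of this kind reduces the number of pairs that need the full model; it is not a replacement for the estimates reported in the Supplementary Data.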
Risk factors, in this case an earlier diagnosis, may predispose men and women to some diseases unequally. We found 939 pairs where the relative risk of a future diagnosis was higher in one sex, compared to the other. We only examined pairs in which more than five men or women had been diagnosed with the two diagnoses in the preferred statistical direction. In 517 cases, women were at a higher risk, while men were at a higher risk in 422 cases (Supplementary Figure 5A). We identified several inconsistencies, such as “mental disorders (ch5) and neoplasms (ch2)”, in which the overall trend for the chapters was in the opposite order. When we examined the distribution of ICD-10 chapters to which the event, i.e. the diagnosis following the exposure, belonged, we found nine chapters that were unevenly represented: endocrine and metabolic disorders (ch4), mental disorders (ch5), eye and adnexa diseases (ch7), digestive system diseases (ch11), skin diseases (ch12), and musculoskeletal diseases (ch13) were overrepresented in men, while ear and mastoid diseases (ch8), respiratory diseases (ch10), and genitourinary system diseases (ch14) were overrepresented in women (FDR ≤ 0.05, Fisher’s exact test) (Supplementary Table 3, Supplementary Figure 6A). We compared 302 of these findings to earlier reports by searching for mentions of both ICD-10 terms in PubMed and Google Scholar. Full text articles were inspected for evidence or mentions of sex-specific risk. In total, we found solid evidence for 42 co-occurrences in which a difference between men and women had been reported (Supplementary Dataset 9). Of these, 33 articles agreed with our findings, and five provided only weak evidence by mentions of sex as a risk factor and no quantitative estimate or reference. Four articles reported opposite conclusions. These four articles were based on cohort sizes ranging from 83 to 74,020 individuals. We noticed that the directional pairs with the largest difference in relative risk from the GBD analysis were centered on substance abuse and retroviral diseases, and disorders of psychological development. Additionally, we found that men with chronic obstructive pulmonary disease (COPD) were at a higher risk of lower respiratory infections and other respiratory disorders (Supplementary Data 7).

Inspecting the median time difference between the first occurrences, we found that there were 1181 directional pairs in which the timespan was different in men and women (FDR ≤ 0.05, Mann–Whitney U test) (Supplementary Figure 5B). In 851 of these, the timespans between the two diagnoses were longer in women, compared to men. Here three chapters were overrepresented: respiratory diseases (ch10) in women, and circulatory system diseases (ch9) and injuries (ch19) in men (Supplementary Table 4, Supplementary Figure 6B).

At the extreme, the temporal relationship between exposure and event (e.g. diagnosis A and diagnosis B) may be reversed for men and women. This reversal could point to physiological or etiological differences or may also reflect diagnostic biases within the healthcare system. For example, our overall analysis indicated that ischemic heart disease (IHD, I25) precedes paroxysmal tachycardia (PT, I47). While this pattern holds for men, it is reversed in women; IHD precedes PT in men, and PT precedes IHD in women. Thus, men mediate the observed order of occurrence at a population-wide level (Fig. 4a). We identified 15 pairs using the ICD-10 terminology and one pair using the GBD terminology in which this reversal occurs, according to our criteria (Table 1). In ten cases, there was no preferred statistical direction at the population level, while the sex-specific preferred statistical direction was reversed. In the remaining five cases, the overall preferred statistical direction corresponded to the trend in men. In some cases, the pairs involved a chronic disease and a complication of this disease. Men were diagnosed with abscess of anal and rectal regions (K61) followed by Crohn’s disease (K50) in 56 out of 100 cases, whereas women were diagnosed in the same order in 44 out of 100 cases (Fig. 4b). Eight of the reversed pairs describe conditions related to the bladder and kidney. From the GBD analysis we identified one relationship in which the order of diagnosis was reversed, namely “pancreatitis” and “gallbladder
[Figure 3 heat map. Axes: ICD-10 chapter (1–19) on both axes, one panel per sex. Color scale: directionality, 0.00–1.00. Cell values are pair counts.]
Fig. 3 Temporal diagnosis co-occurrence across ICD-10 chapters. The distribution of 3186 and 3721 temporal diagnosis co-occurrences across ICD-10
chapters in men and women, respectively (non-sex-specific diagnoses). The color scale indicates the percentage of the pairs that have the temporal
directionality from the horizontal chapter to the vertical chapter. Numbers in the boxes indicate the breakdown of the overall co-occurrence figures
[Figure 4 panel annotations. a: θ(I25→I47) = 0.45 [0.44–0.46], RR(I25→I47) = 0.96 [0.92–1.01]; θ(I25→I47) = 0.59 [0.58–0.60], RR(I25→I47) = 1.38 [1.33–1.43]. b: θ(K61→K50) = 0.44 [0.39–0.48], RR(K61→K50) = 3.78 [3.20–4.42]; θ(K61→K50) = 0.56 [0.52–0.60], RR(K61→K50) = 3.18 [2.74–3.71]. Further labels: 3926, 8206, 400, 643.]
Fig. 4 Opposite temporal relationships in men and women. a At the population level, paroxysmal tachycardia (I47) is observed to be a complication of ischemic heart disease (I25). The sex-stratified analysis showed that this pattern only existed in men, and that the reversed pattern was significant in women. b At the population level, there was no preferred direction of diagnoses between Crohn’s disease (K50) and abscesses of anal and rectal regions (K61). However, the sex-stratified analysis found that the directionality was reversed between men and women
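The directionality θ in Fig. 4 can be read as the proportion of patients with both diagnoses who received A before B. The sketch below is a simple frequentist stand-in, assuming a Wilson score interval and a decision rule based on whether the interval excludes 0.5; the study itself uses a Bayesian model, and the counts here are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def preferred_direction(a_before_b, n_both):
    """Classify a pair by whether the interval for theta excludes 0.5."""
    lower, upper = wilson_interval(a_before_b, n_both)
    if lower > 0.5:
        return "A before B"
    if upper < 0.5:
        return "B before A"
    return "no preferred direction"


# Hypothetical counts of patients with both diagnoses, by sex.
men = preferred_direction(590, 1000)
women = preferred_direction(450, 1000)
pooled = preferred_direction(500, 1000)
print(men, "|", women, "|", pooled)
```

This mirrors the pattern in panel b: opposite directions within each sex can cancel when the sexes are pooled, leaving no preferred direction at the population level.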
and biliary diseases”. In men, “pancreatitis” was diagnosed prior to “gallbladder and biliary diseases”, whereas the reverse was found in women. The directionality observed at the population level corresponded to that in men.

Diagnosis trajectories. Piecing together individual directional pairs may point towards overlooked patterns and sex-related differences in a more extended temporal context. One framework for investigating this is diagnosis trajectories [19]. We investigated the ten directional pairs (defined as a diagnosis co-occurrence with increased relative risk and preferred statistical direction) with the largest difference in relative risk between men and women. We found 230 linear diagnosis trajectories containing at least four diagnoses (followed by at least 100 patients) (Table 2). Clustering the individual trajectories together into one trajectory network that displays each directional pair only once, we noticed that there were large disparities in diagnoses related to cancers, injuries, and drug and alcohol abuse (Supplementary Fig. 7). Several diagnoses related to fractures and injuries lead to or from alcohol abuse-
A | B | Name (A) | Name (B) | N, men | N, women | RR, men | RR, women | Direction, men | Direction, women
R33 | N30 | Retention of urine | Cystitis | 24,519 | 8214 | 2.18 [2.12–2.24] | 1.82 [1.74–1.92] | 0.61 [0.60–0.62] | 0.42 [0.41–0.44]
I48 | I47 | Atrial fibrillation and flutter | Paroxysmal tachycardia | 22,648 | 18,865 | 3.29 [3.18–3.40] | 2.53 [2.46–2.61] | 0.56 [0.55–0.57] | 0.48 [0.47–0.49]
R33 | R31 | Retention of urine | Unspecified hematuria | 21,612 | 2302 | 1.66 [1.61–1.71] | 1.17 [1.07–1.26] | 0.52 [0.51–0.53] | 0.45 [0.42–0.48]
K30 | R10 | Dyspepsia | Abdominal and pelvic pain | 15,145 | 30,544 | 1.32 [1.28–1.36] | 1.41 [1.38–1.45] | 0.53 [0.52–0.54] | 0.47 [0.47–0.48]
I25 | I47 | Chronic ischemic heart disease | Paroxysmal tachycardia | 13,882 | 8748 | 1.38 [1.33–1.43] | 1.17 [1.12–1.22] | 0.59 [0.58–0.60] | 0.45 [0.44–0.46]
N31 | N30 | Neuromuscular dysfunction of bladder, not elsewhere classified | Cystitis | 3551 | 3158 | 2.35 [2.18–2.53] | 2.53 [2.32–2.76] | 0.55 [0.53–0.57] | 0.46 [0.44–0.49]
H26 | H27 | Other cataract | Other disorders of lens | 1000 | 1376 | 3.74 [3.23–4.30] | 2.03 [1.82–2.25] | 0.60 [0.55–0.64] | 0.40 [0.37–0.44]
R33 | N39 | Retention of urine | Other disorders of urinary system | 16,860 | 7012 | 2.24 [2.16–2.32] | 1.70 [1.61–1.79] | 0.66 [0.65–0.67] | 0.44 [0.42–0.45]
J34 | J32 | Other disorders of nose and nasal sinuses | Chronic sinusitis | 2330 | 1538 | 3.36 [3.00–3.77] | 6.47 [5.65–7.45] | 0.54 [0.51–0.57] | 0.41 [0.38–0.45]
R31 | N10 | Unspecified hematuria | Acute tubulo-interstitial nephritis | 2486 | 2345 | 1.61 [1.48–1.76] | 1.33 [1.21–1.45] | 0.57 [0.54–0.60] | 0.44 [0.41–0.47]
K61 | K50 | Abscess of anal and rectal regions | Crohn’s disease [regional enteritis] | 1136 | 916 | 3.18 [2.74–3.71] | 5.17 [4.35–6.16] | 0.56 [0.52–0.60] | 0.44 [0.39–0.48]
N32 | N39 | Other disorders of bladder | Other disorders of urinary system | 1500 | 3212 | 1.80 [1.61–2.00] | 3.72 [3.41–4.07] | 0.57 [0.54–0.61] | 0.41 [0.39–0.44]
R39 | N39 | Other symptoms and signs involving the urinary system | Other disorders of urinary system | 12,044 | 6109 | 1.58 [1.51–1.65] | 2.22 [2.10–2.36] | 0.54 [0.53–0.55] | 0.39 [0.37–0.40]
R33 | R32 | Retention of urine | Unspecified urinary incontinence | 3189 | 3199 | 1.72 [1.58–1.86] | 2.31 [2.13–2.50] | 0.55 [0.53–0.58] | 0.40 [0.37–0.42]
S00 | S02 | Superficial injury of head | Fracture of skull and facial bones | 33,893 | 15,328 | 1.37 [1.34–1.40] | 1.53 [1.48–1.58] | 0.53 [0.52–0.53] | 0.47 [0.46–0.49]
B5.9 | B5.8 | Pancreatitis | Gallbladder and biliary diseases | 2441 | 832 | 2.46 [2.31–2.61] | 3.59 [3.39–3.8] | 0.55 [0.53–0.56] | 0.63 [0.61–0.64]
Values in brackets are 95% BCI. BCI Bayesian Credible Interval, RR relative risk
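One way to read the direction columns of the table is that a pair is reversed when the interval for one sex lies entirely above 0.5 and the interval for the other lies entirely below it. The sketch below applies that reading to three ICD-10 rows from the table plus one hypothetical non-reversed row; the rule itself is an assumption, since the exact criteria are given in the Methods rather than here.

```python
def interval_side(interval, midpoint=0.5):
    """Return +1 if the interval is above the midpoint, -1 if below, else 0."""
    lower, upper = interval
    if lower > midpoint:
        return 1
    if upper < midpoint:
        return -1
    return 0


def is_reversed(men_interval, women_interval):
    """True when the two sexes have opposite preferred directions."""
    return interval_side(men_interval) * interval_side(women_interval) == -1


# (A, B, direction interval men, direction interval women)
rows = [
    ("R33", "N30", (0.60, 0.62), (0.41, 0.44)),  # from the table
    ("I25", "I47", (0.58, 0.60), (0.44, 0.46)),  # from the table
    ("K61", "K50", (0.52, 0.60), (0.39, 0.48)),  # from the table
    ("X00", "Y00", (0.55, 0.60), (0.52, 0.58)),  # hypothetical, same direction
]
reversed_pairs = [(a, b) for a, b, men, women in rows if is_reversed(men, women)]
print(reversed_pairs)
```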
diagnosis using two complementary terminologies [29]. We found that more than half of the ICD-10 diagnoses examined had a different AIR in men and women, and this percentage was even higher using the GBD categories. The age at first hospital diagnosis was, on average, higher in women, across nearly all areas of disease. We showed that population-level estimates of the relative risk, and even directionality, often were driven by a single sex. Specifically, the jointly observed longitudinal patterns were most strongly driven by men, and the strength of directionality was weaker in women, irrespective of the terminology used. There were many non-sex-specific diagnosis co-occurrences only found in men or women; these discrepancies were tied to differences in the relative risk as well as the timespan between two diagnoses. Using the diagnosis trajectory approach, we illustrate how the sex-specific statistics can be used in the search for differences in longitudinal patterns. In three case stories within respiratory disorders, environmental disorders, and sarcoidosis we highlighted how the methods applied in this article provide insight into gender-specific trends in diseases and disease progression. Taken together this is, to our knowledge, the most comprehensive analysis of sex incongruities in a single population presented so far.

The study used a national patient registry, containing information from all private and public hospital admissions in Denmark, including all age groups. The population of Denmark is reasonably homogeneous (~11.1% immigrants and descendants in 2015, of which 6.2% are from non-European countries) [30]. Thus, we expect that our observations are not confounded by race. Due to the nature of registry data, there are many latent factors for which we could not account. We have attempted to eliminate confounding from age, admission type, changing diagnostic criteria, and seasonal influence. One of the largest limitations of the study is the quality of data recording, and we cannot rule out that some of the incongruities could be explained by systematic errors. Nonetheless, the registry data are used for hospital reimbursement, undergoing yearly compensation adjustments, and thus the accuracy of most diagnoses is high [31]. We chose to only investigate the first occurrence of a diagnosis. It is extremely difficult to determine when a diagnosis is a recurrence, or just repeated due to the patient changing wards (or similar). Often, for nonacute conditions, there are waiting lists at the hospitals. Waiting times fluctuate over the 20-year period, due to political decisions on budgets, prioritization of disease areas like cancer, and new technologies. Hence, we did not include recurrences because doing so could potentially introduce bias and spurious findings. Other limitations regarding true disease state may be due to systematic gaps in medical evaluation, resulting in under- or overdiagnosis. This under- or overdiagnosis may result from a variety of causes, and the interaction between underdiagnosis, overdiagnosis, and sex is of general interest, but not something we explored.

We used two different terminologies to examine sex differences. The ICD-10 terminology reflects the current clinical practice, and how hospital admissions have been coded since 1994 in Denmark. In a tradeoff between power and specificity, we worked with ICD-10 at the third level. The GBD categories represent clinical entities, and sometimes follow different definitions. For instance, in our analysis we would not have identified the relationship between two of the underlying components of COPD, emphysema and bronchitis, and osteoporosis had we only used the GBD terminology. Nonetheless, the GBD categories also pointed to important findings that could not be identified using only the ICD-10 terminology, such as alcoholic cardiomyopathy. Some sex-specific co-occurrences may also be treatment provoked. There is an increasing focus on sex-mediated side effects, which may be due to physical, hormonal, or even genetic differences [32,33]. This was not an aspect we could explore further, owing to the lack of full access to medication data. In the co-occurrence analysis, we did not apply prior knowledge to assign the direction of association, i.e. whether diagnosis A was a risk factor of B or vice versa, but used advanced statistical models to infer the most likely order. Many conditions develop asymptomatically or with diffuse symptoms. Symptoms will often, but not always, be identified prior to the underlying cause. As a consequence, some conditions are not necessarily discovered in the order they arise. However, we do not discern whether this relates to different etiology, differences in presentation of symptoms, genetics (e.g. the well-known fact that the Y-chromosome increases the risk for CVD in men [34–36]), differences in drug usage (e.g. higher rates of cytochrome P450 CYP3A substrate metabolism in women [32]), or biases in the healthcare system (e.g. frequency of contact). The main goal was to present an overall view of sex differences, irrespective of mechanistic molecular causes, links to differences in environmental exposures, or biases in the healthcare system.

Menopause may also confound the results. This is not a condition that is recorded in the registry, but could be explored by selecting a fixed age. An earlier study found the average age to be 49 years, but the standard deviation was approximately 15 [37]. Thus, selecting a fixed age could lead to a large bias, due to the large spread. This is better explored in another resource where it is explicitly recorded, e.g. the UK Biobank [38]. In cases with rare incidence or co-occurrence of diagnoses it can be difficult to
[Figure 5 network. Recoverable node labels: Intracranial injury; Fracture of shoulder and upper arm; Alcoholic liver disease; Other disorders of fluid, electrolyte and acid-base balance; Dental caries; Unspecified jaundice; Open wound of head; Acute pancreatitis; Varicose veins of other sites.]
Fig. 5 Diagnosis trajectories involving injury or drug and alcohol abuse. A trajectory network combining 176 linear diagnosis trajectories related to alcohol
and substance abuse (ten directional pairs with extreme differences in relative risk). Edges represent the connection between the diagnoses with
directional co-occurrence. The orange and green edges between nodes indicate co-occurrences where the RR was elevated in women and men,
respectively. The RR of injuries followed by alcoholic liver disease is increased in women. Furthermore, women have a higher RR of complications following
esophageal varices, such as hepatic failure. RR relative risk
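The assembly of directional pairs into linear trajectories can be sketched as chaining pairs that share a diagnosis and counting the patients who follow each chain. This is an illustrative simplification with hypothetical codes and a lowered patient threshold; treating "following" as an ordered, not necessarily consecutive, subsequence of a patient's first diagnoses is an assumption.

```python
def extend_trajectories(pairs, length):
    """Chain directional pairs (a, b) into trajectories with `length` diagnoses."""
    trajectories = [list(pair) for pair in pairs]
    for _ in range(length - 2):
        trajectories = [t + [b] for t in trajectories for (a, b) in pairs
                        if a == t[-1] and b not in t]
    return [tuple(t) for t in trajectories]


def follows(history, trajectory):
    """True if the trajectory's diagnoses occur in the history in the same order."""
    remaining = iter(history)
    return all(code in remaining for code in trajectory)


def supported_trajectories(pairs, histories, length=4, min_patients=100):
    """Keep trajectories followed by at least `min_patients` patients."""
    result = {}
    for trajectory in extend_trajectories(pairs, length):
        count = sum(follows(history, trajectory) for history in histories)
        if count >= min_patients:
            result[trajectory] = count
    return result


# Hypothetical directional pairs and per-patient ordered first diagnoses.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
histories = ([["A", "B", "C", "D"]] * 3
             + [["A", "X", "B", "C", "D"]]
             + [["B", "A", "C", "D"]])
found = supported_trajectories(pairs, histories, length=4, min_patients=3)
print(found)
```

With the thresholds quoted in the text, `length=4` and `min_patients=100` would be used; the union of the edges in the retained trajectories then gives a network of the kind shown in the figure.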
[Figure 6 network. Recoverable node label: Urethral stricture.]
Fig. 6 Diagnosis trajectories related to cancer. A trajectory network combining 62 linear diagnosis trajectories related to cancer (the ten directional pairs
with extreme differences in relative risk). The trajectories illustrate disease routes that are related to cancers in the thyroid gland and urinary tract. The
progression pattern includes secondary neoplasms, renal complications, and sepsis. Color scale as in Fig. 5
obtain a proper estimate of the standard error (SE), leading to inflated intervals for incidence rates or relative risks. We have attempted to mitigate this by using a Bayesian Hierarchical Model (BHM). The BHM improves the estimate of the SE by pooling information across groups; an approach also used in the GBD and even clinical trials [39,40]. An argument against Bayesian statistics is often that the choice of priors may introduce biases in the estimates. Conversely, here we have chosen informative priors that center the estimates at no effect, and pool the standard deviation. Thus, instead of introducing an unwanted bias, we have actually
[Figure 7 network. Recoverable node labels: Unspecified chronic bronchitis; Osteoporosis without pathological fracture; Poisoning by nonopioid analgesics, antipyretics and antirheumatics; Cardiac arrest; Asthma; Simple and mucopurulent chronic bronchitis; Other ill-defined and unspecified causes of mortality; Acute bronchitis.]
Fig. 7 Diagnosis trajectories related to obstructive lung disease and osteoporosis. A trajectory network combining 112 linear diagnosis trajectories including
osteoporosis (M80, M81) and obstructive lung diseases (J40−J46). The orange edges indicate co-occurrences only present in women, and the green
edges indicate co-occurrences only present in men. The trajectories illustrate how obstructive lung diseases are found as a risk factor for osteoporosis in
men, but not women. Moreover, osteoporosis without fracture followed by osteoporosis with fracture was only found in women
made a more conservative estimate compared to traditional models, which often assume an uninformative prior41. Lastly, we have disclosed all investigations we have performed in the Supplementary Information and provide a rich set of aggregate data that can be used in future studies.
We validated a number of the co-occurrences in the existing literature. The majority of the articles investigating the same conditions agreed with our findings. Nevertheless, this task is challenging as no other study is as broad as the one we present. Many studies do not investigate if there is a difference in sex-specific risks8. This omission included meta-analyses as well as cohort and case-control studies. Sex is an important factor in epidemiological studies. In studies of single or few diagnoses with different cohorts, as well as the GBD, it is well documented that there are sex-mediated differences in the incidence rates3,39. Our results derived directly from hospital admissions for single disease incidence align well with previously reported differences, such as cancer, musculoskeletal disorders, and autoimmune diseases15,42,43. We found that the age of first hospital diagnosis was, on average, nearly always higher in women. To our knowledge, this has not been systematically studied before, and only reported for few specific areas, such as cardiovascular disorders44. A growing body of literature suggests that the reason for the delayed onset of cardiovascular disorders in women is due to the protective effect from estrogen44,45. While the age of first hospital diagnosis should not be confused with the age of onset, there is growing evidence that the protective role of estrogen is more widespread than previously thought. For instance, estrogen has been suggested to be a neuroprotective factor, which is in agreement with our findings concerning a later age of first hospital diagnosis in women for nervous system disorders46. Sex can also be a strong confounding factor when estimating diagnosis co-occurrence. To date no study has performed a systematic investigation of sex-specific diagnosis co-occurrences. We show how population-wide estimates of co-occurrence can be driven by a single sex, even when using a matched comparison group to negate other confounding factors. Furthermore, we demonstrated that the jointly observed longitudinal patterns are most strongly driven by men, and that the strength of directionality is weaker in women. The tendency to report sex-specific estimates is becoming increasingly standard practice, in particular due to the recognized fact that sex and gender considerations are vital in precision medicine47,48.
We found many directional pairs that were unique to one sex. In this regard, our data demonstrate that disease co-occurrences related to cancers, digestive disorders, and injuries were overrepresented in men. This result indicates that men are more burdened by cancers, and complications, in the digestive system. In a temporal context, we also noticed that men are diagnosed with digestive system diseases (ch11) prior to neoplasms (ch2). This points towards a disparity in lifestyle-related diseases. Taken together, these findings suggest a bias in clinical practice, in which men with digestive system disorders are monitored more closely for neoplasms, whereas women are not. Men suffer more co-occurring injuries. In women, both respiratory and musculoskeletal disorders were overrepresented in combination with symptoms and signs (chapter 18). One explanation for this could be that the prevalence of musculoskeletal disorders is higher in women, which leads to more unspecific symptoms, such as pain. A previous study found that women report musculoskeletal-related pain more often, and that this could be caused by a musculoskeletal sex difference49. In contrast to earlier large studies pooling data from multiple cohorts, we have been able to compare the timespan between temporal co-occurrences. We identified cases in which the temporal pairs had a different timespan in men and women. We note that in 72% of cases the diagnosis-free interval is longer for women than men. This finding aligns well with our earlier finding that the age of first hospital diagnoses is nearly always greater in women, and clearly shows how this widespread effect even translates into a temporal context.
The diagnosis trajectory analysis showed an increased risk in women between several injuries, substance abuse, and complications of substance abuse that we speculate could be indicators of a gender bias reflecting domestic violence and consequences from drug abuse, in light of an earlier finding that found
substance abuse to be a risk factor for nonfatal injuries in women50. The trajectory analysis also demonstrated a temporal relationship between nontoxic goiter, thyroid cancer, and secondary cancer in which men were at a higher risk. Women have a higher incidence of thyroid cancer, and male sex is described as a risk factor for malignant thyroid nodules. Earlier studies have found that the aggressive subtypes of thyroid cancer have a similar incidence in men and women, but that men often present at a more advanced stage51,52. This observation is an important finding from both epidemiological studies and this population-wide analysis and demonstrates the necessity of investigating multistep temporal associations. Furthermore, the analysis of obstructive lung disease and osteoporosis trajectories indicated patterns of severe underdiagnosis. First, obstructive lung diseases were observed as risk factors for osteoporosis only in men. These results are in contrast to an earlier cross-sectional study, which found that sex did not modulate the association between airflow obstruction and osteoporosis53. However, the temporal relative risk may be more informative than the odds ratio for the non-temporal co-occurrence. Moreover, obstructive lung diseases are underdiagnosed in women, a factor that can affect the estimates in a cohort study. Secondly, there was no directionality observed between osteoporosis without fracture and osteoporosis with fracture in men, but an elevated relative risk in both directions. In contrast, women were observed to have this pattern. This suggests that osteoporosis in men is not diagnosed prior to fracture, and therefore not managed. This could be part of the reason why mortality is higher in men with osteoporotic fracture, compared to women54. Lowered bone mineral density is a known adverse effect from corticosteroid therapy, a drug often used in the treatment of asthma and COPD. Possibly the lack of a connection for women could be due to the large difference in age of diagnosis for asthma, and therefore treatment is started later. In the case of COPD, corticosteroid therapy is only suggested for shorter symptomatic periods. However, COPD is a substantially underdiagnosed disease and two studies have estimated that 50–80% of COPD patients are undiagnosed55,56. Moreover, the COPD diagnosis is only confirmed by spirometry in 50% of the diagnosed patients57,58. Hence, patients receiving a diagnosis of COPD may be more symptomatically severe and would be expected to receive a more systemic steroid exposure. Recent data also suggest that moderate to severe emphysema itself is a risk factor for osteoporosis59. Another equally valid explanation could also be that the COPD phenotype carries risk of osteoporosis due to COPD-associated frailty, smoking effects on bone metabolism, and limitations in physical activity. In addition, there is an interesting and emerging set of studies showing vitamin D receptor polymorphisms in patients with COPD and osteoporosis60. The relative impact of these factors would be greatest in men, given the baseline higher (>4 times) level of osteoporosis in women compared to men by age 50.
Our case story regarding respiratory disorders also highlighted that some complications, such as bronchiectasis and emphysema, were different in men and women, a finding that may be relevant to the clinical assessment and management. Lastly, we found 16 cases where the directionality between two diagnoses was opposite. Some of these point to conditions in which men are not diagnosed prior to serious complications, such as the case with Crohn’s disease and abscesses of anal and rectal regions. Other examples included IHD and PT, and pancreatitis and gallbladder and biliary diseases. One study found that pancreatitis in men was typically alcohol induced, while in women it was due to biliary problems, which could explain the reversed order of diagnosis61. We speculate that the observed difference between IHD and PT, in which IHD is a recognized risk factor, could possibly be due to an underdiagnosis of IHD in women. This is further complicated by the fact that men and women develop different subtypes of IHD44.
Taken together, our findings strongly suggest many disparities in a population with a uniform, one-payer-based healthcare, again underscoring the need for better sex-stratified medicine. Generally, many of our findings align well with larger meta studies, such as the GBD. Our study adds the dimension of the temporal aspect between disorders. In doing so, we provide guidance in the design of future studies while also pointing to potential gaps in disease surveillance, diagnosis, and management. Nonetheless, a clear extension would be to perform this study in other cohorts, such as the UK Biobank, although it is not comparable in size38. Including resources such as the UK Biobank or the emerging FinnGen and All of Us data sets would potentially make it possible to identify genetic variants that could explain part of the discrepancy in disease progression.

Methods
Study design and participants. This was a population-based registry study based on the Danish National Patient Registry (DNPR). The DNPR covered all public and private hospital admissions in Denmark during 1994–2015, 6,909,676 patients (ICD-10 period only). The healthcare system in Denmark is universal, meaning everyone living in Denmark has free access to care. Patients can be tracked through the healthcare system using the Central Person Registry (CPR) number, which is a unique identifier assigned to every Danish citizen at birth or immigration (initiated in 1968). Visits to the general practitioner (GP) and private specialist clinics were not included in the data set. Admissions included inpatient (patients admitted to the hospital overnight), outpatient (patients not admitted to the hospital overnight), and emergency department contacts. Prior to 2002 there were both full-day inpatients and half-day inpatients. After 2002, the two groups were merged into one. Hence, we merged full-day inpatient and half-day inpatient from before 2002 into one group, inpatient. Inpatient records cover the time from admission of a patient to a hospital ward, until discharge to another ward or from the hospital. If a patient was discharged to another ward, the records were combined into one record. Likewise, if the patient was re-admitted to the hospital the next day the records were combined. The data also included open outpatient contacts. If a patient has regular follow-ups at the hospital the contact may remain open indefinitely as an outpatient. Since 2000, the DNPR has been used for reimbursement and the reimbursement rates are adjusted on an annual basis62. All referral diagnoses were excluded. Referral diagnoses are used when patients are referred to another ward or department for further investigation based on a suspicion of a disorder. The ICD-10 is structured hierarchically with four levels. We studied diagnosis codes at the third ICD-10 level. We excluded ICD-10 diagnoses coming from chapters 20, 21, 22, as well as all codes specific to the Danish version of ICD-10. Codes specific for Denmark mainly describe length and weight at birth. We used the Chronic Condition Indicator to differentiate between acute and chronic ICD-10 codes (https://www.hcup-us.ahrq.gov/toolssoftware/chronic_icd10/chronic_icd10.jsp, last visited 13 July 2018). We performed a complementary analysis using the GBD categories, retrieved from http://ghdx.healthdata.org/record/global-burden-disease-study-2016-gbd-2016-causes-death-and-nonfatal-causes-mapped-icd-codes (last accessed 12 June 2018). The corresponding analysis is described in detail in Supplementary Note 1.

Bayesian inference and model fitting. Posterior distributions are summarized as a BCI. The BCI is the interval that spans the most credible values of the distribution, sometimes also referred to as the Highest Density Interval63. We defined the range of the BCI in this work to be the interval that spans 95% of the posterior distribution. Unless otherwise specified, the reported effect size is the median of the posterior distribution. We also defined a Region Of Practical Equivalence (ROPE) for the quantities of interest. A ROPE is a small region of values considered to be practically equivalent to a null value63. This is to ensure that the effect size of interest has a magnitude of clinical relevance, and is not just marginally different from the null value. All Bayesian models were made using the No-U-Turn sampler, a Hamiltonian Monte Carlo (HMC) variant, implemented in Stan v. 2.17.0, an open-source probabilistic programming language64,65. Unless otherwise specified, we ran four HMC chains, with default settings, for a total of 4000 samples, 2000 of them for warm-up to adapt HMC-specific hyper-parameters. The number of samples is significantly lower than what is usually drawn using e.g. Gibbs sampling. This is due to the nature of the NUTS-HMC algorithm, which converges faster65. We assessed convergence by inspecting the R-hat statistic, tree depth, and number of divergences66,67. The R-hat statistic describes the variation between chains. If all chains have arrived at the exact same posterior distribution for the given parameter, the R-hat will be 1. The tree depth plot is a method for assessing pathology in the HMC algorithm. If the tree depth goes to the maximum at every iteration past warm-up this indicates a random-walk behavior, which can lead to biases in the parameter estimates. A divergence happens when the model has numerical
problems (e.g. division by zero, underflow, or overflow), and may indicate a problematic posterior or model that does not fit the data well. In this work, we conclude that a model has converged if and only if (1) all R-hat values are below 1.1, (2) the tree depth is not at the maximum in any of the chains past warm-up, and (3) there are zero divergences.

Age of first hospital diagnosis. We calculated the average age of diagnosis for a given ICD-10 code by calculating the mean across all cases in the NPR separately for men and women. We identified differences using the Welch t test. P values were adjusted using the stringent Benjamini-Hochberg (BH) procedure. We report the difference in means. We estimated the chapter-wise difference by calculating the weighted mean.
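The two ingredients of this comparison, Welch's t statistic and the BH adjustment, can be sketched in a few lines of standard-library Python. This is only an illustration and not the authors' code: the function names `welch_t` and `bh_adjust` are hypothetical, the ages below are made up, and turning the t statistic into a P value (which needs the t distribution) is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up ages at first diagnosis for one code (men, women).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Made-up raw P values for four codes.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

A diagnosis would then be flagged when its adjusted P value falls below the chosen FDR threshold.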
Diagnosis incidence rates. We examined all diagnoses at the ICD-10 level 3 that occurred in at least 100 patients during the 21-year period. The cutoff was set to avoid diagnoses used only very rarely or never. A number of diagnoses can only occur in one sex. For instance, hyperplasia of prostate can only occur in men. To identify sex-specific diagnoses we manually curated each diagnosis examined in this work, and classified whether the diagnoses were sex specific or not (Supplementary Data 8). A trained clinician oversaw and verified the curation. To estimate the incidence, we fitted a hierarchical Bayesian Poisson model of the form shown in Eq. (1),

$$y_i \sim \mathrm{Poisson}(\exp(\eta_i)) \tag{1}$$

in which $\eta_i$ is a linear combination of the strata for every diagnosis as shown in Eq. (2),

$$\eta_i = \beta_{i,0} + \beta_{i,\mathrm{age}}\, x_{i,\mathrm{age}} + \beta_{i,\mathrm{sex}}\, x_{i,\mathrm{sex}} + \log(\mathrm{offset}_i) \tag{2}$$

in which the age is one of the 21 5-year interval groups defined in the European Standard Population 2013 (Eurostat)24, the sex is a binary indicator, and the offset is the population at risk. To complete the model, we specify a set of priors on the coefficients shown in Eqs. (3–5)

$$\beta_{i,0} \sim N(0, \sigma_0), \tag{3}$$

$$\beta_{i,\mathrm{sex}} \sim N(0, \sigma_{\mathrm{sex}}), \tag{4}$$

$$\beta_{i,\mathrm{age}} \sim N(0, \sigma_{\mathrm{age}}) \tag{5}$$

in which $\beta_{\mathrm{age}}$ represents a coefficient for each of the 21 age groups, with an individual prior, $\sigma_{\mathrm{age}}$, on each coefficient. We defined the prior on the scales of the coefficients as shown in Eqs. (6–8),

$$\sigma_0 \sim N^{+}(0, 3), \tag{6}$$

$$\sigma_{\mathrm{sex}} \sim N^{+}(0, 0.5), \tag{7}$$

[...]

in which the mean, $\hat{\eta}$, was equal to the estimated coefficients, Eq. (10),

$$\hat{\eta}_i = \hat{\beta}_{i,0} + \hat{\beta}_{i,\mathrm{age}}\, x_{i,\mathrm{age}} + \hat{\beta}_{i,\mathrm{sex}}\, x_{i,\mathrm{sex}} + \log(\mathrm{offset}_i). \tag{10}$$

From the fitted coefficients, we calculated the age-adjusted IR (AAIR) using the European Standard Population 201324, as shown in Eq. (11)

$$\mathrm{AAIR} = \frac{\sum_i p_i N_i}{\sum_i N_i} \tag{11}$$

in which $p_i$ is the age-specific rate, and $N_i$ is the population of age group $i$, according to the European Standard Population 2013. Rates were calculated for all, men, and women, using the European Standard Population 2013. Age-adjusted rates are per 100,000. If the relative difference is greater than 0.1, we conclude that there is a difference in incidence rate. The relative difference is defined in Eq. (12),

$$d = \frac{\mathrm{AAIR}_{\mathrm{men}} - \mathrm{AAIR}_{\mathrm{women}}}{(\mathrm{AAIR}_{\mathrm{men}} + \mathrm{AAIR}_{\mathrm{women}})/2}, \tag{12}$$

where a positive number will indicate a higher AAIR in men, and a negative number a higher AAIR in women.

Diagnosis co-occurrence. We examined all pairs of diagnoses that occurred in more than 100 individuals. The cutoff was set to ensure that the combination of two diagnoses is sufficiently prevalent to be of interest. The time resolution of the NPR is one day, and any diagnoses given on the same day were not counted. Only the first occurrence of a diagnosis was considered. The time of diagnosis was taken as the time the patient was discharged. If the patient had not yet been discharged, the date of the last diagnosis was used instead. ICD-10 has a dual coding system, the dagger-asterisk system. The asterisk represents the symptom or manifestation of disease and the dagger indicates the etiology of the disease. We identified these pairs and excluded them from subsequent analysis.
To negate the most common confounding factors, we sampled a matched comparison group. For any given combination of diagnoses, A and B, we fix A as the Exposure (Ex) and B as the Event (Ev) to estimate the time-resolved relative risk, RR(A → B), and directionality, Pr(A → B). For every exposed patient, we sampled five nonexposed cases matched to (i) be in the same age group, (ii) have a hospital discharge from the same type of encounter (inpatient, outpatient, emergency department), (iii) be discharged at the same month of the same year, ±3 months. An earlier study found that the hospital encounter is a confounding factor in as many as 15% of the identified diagnosis co-occurrences from a study in the NPR20. Moreover, modifying disease definitions and diagnostic criteria may affect both incidence and prevalence25. Previous studies using NPR found that changes in diagnostic criteria increased hospitalization rate for AMI, and increased the prevalence and shifted the age of diagnosis for autism26,27. We negate this effect by matching the encounter year. Lastly, by matching the encounter month we diminish seasonal variation that may influence the incidence of, for instance, infectious diseases.
The relative risk is not symmetrical, i.e. RR(A → B) ≠ RR(B → A), and thus we repeat the process of selecting matched controls by fixing B as the exposure, and A as the event. This effectively doubles the number of combinations of diagnoses examined.
HMC models are computationally expensive to fit. Consequently, prior to running the full hierarchical Bayesian model using Stan we applied a prefilter by calculating the 95% CI of the relative risk using the formula provided by Morris and Gardner68. The relative risk is given in Eq. (13),

$$\mathrm{RR} = \frac{N_{A \to B} / (N_{A \to B} + N_A)}{N_B / (N_B + N_0)} \tag{13}$$

[...]

We calculated CI separately for men, women, and the two sexes combined. Only pairs of diagnoses in which the lower bound of the CI of either RR(A → B) or RR(B → A) was above 1.01 were included in the subsequent analysis, that is we only studied diagnosis co-occurrences in which the exposure increased the risk of the subsequent event by more than 1%. We note that we do not perform any correction for multiple testing. Consequently, the number of false positives will be high. Additionally, in cases with a low number of patients, the estimate of the standard error will be inaccurate. However, in the following part we describe a BHM to refine the estimate of the relative risk.
We refine the estimate of the temporal relative risk and directionality between pairs of diagnoses by employing a hierarchical Bayesian model. For each exposure, i, and the event observed together with this exposure, j, we describe the relationship using a Poisson model following Eq. (16),

$$y_{ij} \sim \mathrm{Poisson}(\exp(\eta_{ij})), \tag{16}$$

where $\eta_{ij}$ is a linear combination shown in Eq. (17),

$$\eta_{ij} = \beta_{ij,0} + \beta_{ij,\mathrm{Ex}}\, x_{ij,\mathrm{Ex}} + \beta_{ij,\mathrm{Ev}}\, x_{ij,\mathrm{Ev}} + \beta_{ij,\mathrm{ExEv}}\, x_{ij,\mathrm{ExEv}} + \log(\mathrm{offset}_{ij}) \tag{17}$$

in which $x_{\mathrm{Ex}}$ and $x_{\mathrm{Ev}}$ are indicator variables for the exposure and event, respectively. $x_{\mathrm{ExEv}}$ is the interaction between the exposure and event. The offset is
the number of people within the group. We further estimated sex-specific relative risks by introducing a sex term and interaction terms between Ex, Ev, and Sex shown in Eq. (18).

$$\eta_{ij} = \beta_{ij,0} + \beta_{ij,\mathrm{Sex}}\, x_{ij,\mathrm{Sex}} + \beta_{ij,\mathrm{Ex}}\, x_{ij,\mathrm{Ex}} + \beta_{ij,\mathrm{Ev}}\, x_{ij,\mathrm{Ev}} + \beta_{ij,\mathrm{ExSex}}\, x_{ij,\mathrm{ExSex}} + \beta_{ij,\mathrm{EvSex}}\, x_{ij,\mathrm{EvSex}} + \beta_{ij,\mathrm{ExEv}}\, x_{ij,\mathrm{ExEv}} + \beta_{ij,\mathrm{ExEvSex}}\, x_{ij,\mathrm{ExEvSex}} + \log(\mathrm{offset}_{ij}) \tag{18}$$

To complete the model, we specify a set of priors for the regression coefficients, Eqs. (19)–(26)

[...]

$$\beta_{ij,\mathrm{EvSex}} \sim N(0, \sigma_{\mathrm{EvSex}}), \tag{25}$$

$$\beta_{ij,\mathrm{ExEvSex}} \sim N(0, \sigma_{\mathrm{ExEvSex}}) \tag{26}$$

and weakly informative priors on the scale of each coefficient, Eqs. (27)–(32)

$$\sigma_0 \sim N^{+}(0, 2), \tag{27}$$

$$\sigma_B \sim N^{+}(0, 2), \tag{28}$$

$$\sigma_{\mathrm{ExEv}} \sim N^{+}(0, 2), \tag{29}$$

$$\sigma_{\mathrm{EvSex}} \sim N^{+}(0, 2), \tag{30}$$

$$\sigma_{\mathrm{ExSex}} \sim N^{+}(0, 2), \tag{31}$$

$$\sigma_{\mathrm{ExEvSex}} \sim N^{+}(0, 2) \tag{32}$$

in which $N^{+}$ is the truncated normal distribution. We have removed confounding from age, admission type, admission year, and admission month by selecting five matched patients. Thus, we have not included these terms in the model, as the goal is to study the effects from sex. The prior values chosen for the interaction terms favor effects close to zero. Hence, by prior design we expect that only few of the pairs investigated will occur together more than expected by chance. In addition, the hierarchical structure imposes shrinkage on the coefficients and helps inform coefficient estimates across pairs where counts may be low63.
Using the posterior distribution, we estimate the directionality and relative risks. We simulated the number of patients who had been diagnosed in the order A → B and B → A, and calculated the probability of observing two diagnoses in a specific direction using the formula specified in Eq. (33),

$$\Pr(A \to B) = \frac{N_{A \to B}}{N_{A \to B} + N_{B \to A}}. \tag{33}$$

The probability Pr(B → A) is thus as specified in Eq. (34),

$$\Pr(B \to A) = 1 - \Pr(A \to B). \tag{34}$$

We define a ROPE in the interval (0.49, 0.51). If the BCI excludes these values, we conclude that the pair of diagnoses has a preferred statistical direction. This probability can also be interpreted quantitatively. For instance, Pr(A → B) = 0.8 would correspond to A being diagnosed before B in four out of five cases.
Likewise, from the posterior distribution, we calculate an adjusted relative risk using the Cochran-Mantel-Haenszel method shown in Eq. (35),

$$\mathrm{RR}(A \to B) = \frac{\dfrac{N_{m,A \to B}\,(N_{m,B} + N_{m,0})}{N_m} + \dfrac{N_{f,A \to B}\,(N_{f,B} + N_{f,0})}{N_f}}{\dfrac{N_{m,B}\,(N_{m,A \to B} + N_{m,A})}{N_m} + \dfrac{N_{f,B}\,(N_{f,A \to B} + N_{f,A})}{N_f}} \tag{35}$$

and the male sex-specific relative risk is as specified in Eq. (36),

$$\mathrm{RR}(A \to B)_m = \frac{N_{m,A \to B} / (N_{m,A \to B} + N_{m,A})}{N_{m,B} / (N_{m,B} + N_{m,0})} \tag{36}$$

[...]

$$\omega_{\mathrm{Women}} = \Pr(A \to B)_{\mathrm{joint}} - \Pr(A \to B)_{\mathrm{Women}}. \tag{38}$$

We tested if there was a difference in the variance of the distribution using the F-test. This requires that the distributions are normally distributed. We confirmed this by visual inspection of the density plots (Supplementary Fig. 1). We report the ratio between variances (men compared to women) as the effect size, and the 95% CI.
Literature validation and comparison of co-occurrences with a higher risk in men or women was performed by searching PubMed for articles mentioning either disease, or a more specific relevant term. Matching articles were inspected for cohort sizes and to identify any sex-specific estimate of risk or mentions of sex as a risk factor.

Difference in time between diagnosis. The time between two diagnoses is computed across all patients that have been diagnosed with both diagnoses. We only look into the directional pairs, defined by an elevated relative risk and preferred direction. We notice that, due to the long follow-up, the distributions have a heavy tail, and are thus not normally distributed. Therefore, we used the two-sided Mann-Whitney U test. Only directional pairs found in both men and women are investigated. Effect sizes are reported as the median difference in time, and the p value is corrected for multiple testing using the BH method. A median difference in time less than zero indicates that the disease transition progresses faster in women, and likewise a median difference in time greater than zero indicates that the progression is faster in men.

Diagnosis trajectories. We pieced together directional pairs of diagnoses to form multistep trajectories18,19. For every pairwise co-occurrence, we iteratively added a diagnosis and counted the number of people following the trajectory in the population. In this particular study, we only investigated trajectories followed by more than 100 people, with a minimum of four diagnoses. Using the disease trajectory framework, we studied two categories of trajectories. First, we investigated the directional pairs that had the biggest difference in relative risk between men and women. Second, we selected two diseases, obstructive lung diseases and osteoporosis, which prior studies had found to be underdiagnosed in women and men, respectively. The trajectories were visualized as networks, in which each node represents a diagnosis and the connection between two nodes, the edge, represents a directional link between two diagnoses.

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The study was approved by the Danish Data Protection Agency (ref: 2015-54-0939 and SUND-2017-57) and Danish Health Authority (ref: FSEID-00001627 and FSEID-00003092). Permission to access and analyze data can be obtained following approval from Danish Data Protection Agency and the Danish Health Authority. A reporting summary for this article is available as a Supplementary Information file. Stan (v 2.17)64, Python (v2.7), and R (v.3.1.3) were used for statistical analysis. Due
to privacy concerns, the provided Supplementary Data only contain estimates for 26. Parner, E. T., Schendel, D. E. & Thorsen, P. Autism prevalence trends over
diagnosis and co-occurrences when it has been assigned to at least five men and time in Denmark: changes in prevalence and age at diagnosis. Arch. Pediatr.
women. Adolesc. Med. 162, 1150–1156 (2008).
27. Abildstrom, S. Z., Rasmussen, S. & Madsen, M. Changes in hospitalization
rate and mortality after acute myocardial infarction in Denmark after
Received: 12 March 2018 Accepted: 8 January 2019 diagnostic criteria and methods changed. Eur. Heart J. 26, 990–995
(2005).
28. Jørgensen, N. R. et al. The prevalence of osteoporosis in patients with chronic
obstructive pulmonary disease: a cross sectional study. Respir. Med. 101,
177–185 (2007).
29. Barber, R. M. et al. Healthcare Access and Quality Index based on mortality
References from causes amenable to personal health care in 195 countries and territories,
1. Baggio, G., Corsini, A., Floreani, A., Giannini, S. & Zagonel, V. Gender 1990–2015: a novel analysis from the Global Burden of Disease Study 2015.
medicine: a task for the third millennium. Clin. Chem. Lab. Med. 51, 713–727 Lancet 390, 231–266 (2017).
(2013). 30. Denmark in Figures. Denmark in Figures. http://www.dst.dk/en/Statistik/
2. Regitz-Zagrosek, V. Sex and gender differences in health. EMBO Rep. 13, Publikationer/VisPub?cid=19006 (accessed 21 July 2017) (2015).
596–603 (2012). 31. Thygesen, S. K., Christiansen, C. F., Christensen, S., Lash, T. L. & Sørensen, H.
3. Franconi, F., Sanna, M., Straface, E., Chessa, R. & Rosano, G. Sex and Gender T. The predictive value of ICD-10 diagnostic coding used to assess Charlson
Aspects in Clinical Medicine. Pathophysiology (Springer, New York, 2012). comorbidity index conditions in the population-based Danish National
4. World Health Organization. WHO gender policy: integrating gender Registry of Patients. Bmc Med. Res. Methodol. 11, 83 (2011).
perspectives in the work of WHO. http://origin.who.int/gender-equity-rights/ 32. Nicolson, T. J., Mellor, H. R. & Roberts, R. R. A. Gender differences in drug
knowledge/a78322/en/ (Accessed 22 February 2018). (2002). toxicity. Trends Pharmacol. Sci. 31, 108–114 (2010).
5. Siddiqui, R. A. et al. X chromosomal variation is associated with slow 33. Spoletini, I., Vitale, C., Malorni, W. & Rosano, G. M. C. in Sex and Gender
progression to AIDS in HIV-1-infected women. Am. J. Hum. Genet. 85, Differences in Pharmacology (ed. Regitz-Zagrosek, V.) 91–105 (Springer,
228–239 (2009). Berlin, Heidelberg, 2013). https://doi.org/10.1007/978-3-642-30726-3_5
6. Liu, L. Y., Schaub, M. A., Sirota, M. & Butte, A. J. Sex differences in disease 34. Charchar, F. J. et al. Association of the human Y chromosome with cholesterol
risk from reported genome-wide association study findings. Hum. Genet. 131, levels in the general population. Arterioscler. Thromb. Vasc. Biol. 24, 308–312
353–364 (2012). (2004).
7. Cereda, E. et al. Dementia in Parkinson’s disease: is male gender a risk factor? 35. Charchar, F. J. et al. Inheritance of coronary artery disease in men: an analysis
Park. Relat. Disord. 26, 67–72 (2016). of the role of the y chromosome. Lancet 379, 915–922 (2012).
8. Ortona, E., Delunardo, F., Baggio, G. & Malorni, W. A sex and gender 36. Charchar, F. J., Tomaszewski, M., Strahorn, P., Champagne, B. & Dominiczak,
perspective in medicine: a new mandatory challenge for human health. Ann. A. F. Y is there a risk to being male? Trends Endocrinol. Metab. 14, 163–168
Ist. Super. Sanita 52, 146–148 (2016). (2003).
9. Caenazzo, L., Tozzo, P. & Baggio, G. Ethics in women’s health: a pathway to gender equity. Adv. Med. Ethics 2, 5 (2015).
10. Zakiniaeiz, Y., Cosgrove, K. P., Potenza, M. N. & Mazure, C. M. Balance of the sexes: addressing sex differences in preclinical research. Yale J. Biol. Med. 89, 255–259 (2016).
11. Shader, R. I. More on women’s health, gender medicine, and the complexities of personalized medicine. Clin. Ther. 38, 233–234 (2016).
12. Mcgregor, A. J. The impact sex-differences research can have on women’s health. Clin. Ther. 38, 1–2 (2015).
13. Mehta, L. S. et al. Acute myocardial infarction in women: a scientific statement from the American Heart Association. Circulation 133, 916–947 (2016).
14. Regitz-Zagrosek, V. Therapeutic implications of the gender-specific aspects of cardiovascular disease. Nat. Rev. Drug. Discov. 5, 425–438 (2006).
15. Eaton, W. W., Rose, N. R., Kalaydjian, A., Pedersen, M. G. & Mortensen, P. B. Epidemiology of autoimmune diseases in Denmark. J. Autoimmun. 29, 1–9 (2007).
16. Willson, T., Nelson, S. D., Newbold, J., Nelson, R. E. & LaFleur, J. The clinical epidemiology of male osteoporosis: a review of the recent literature. Clin. Epidemiol. 7, 65–76 (2015).
17. Ancochea, J. et al. Infradiagnóstico de la enfermedad pulmonar obstructiva crónica en mujeres: cuantificación del problema, determinantes y propuestas de acción. Arch. Bronconeumol. 49, 223–229 (2013).
18. Beck, M. K., Westergaard, D., Jensen, A. B., Groop, L. & Brunak, S. Temporal order of disease pairs affects subsequent disease trajectories: the case of diabetes and sleep apnea. Biocomput 2017 22, 380–389 (2017).
19. Beck, M. K. et al. Diagnosis trajectories of prior multi-morbidity predict sepsis mortality. Sci. Rep. 6, 36624 (2016).
20. Jensen, A. B. et al. Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nat. Commun. 5, 4022 (2014).
21. Bagley, S. C. & Altman, R. B. Computing disease incidence, prevalence and comorbidity from electronic medical records. J. Biomed. Inform. 63, 108–111 (2016).
22. Grimes, D. A. & Schulz, K. F. Bias and causal associations in observational research. Lancet 359, 248–252 (2002).
23. Hidalgo, C. A., Blumm, N., Barabási, A. L. & Christakis, N. A. A Dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, e1000353 (2009).
24. Eurostat Task force. Revision of the European Standard Population. http://ec.europa.eu/eurostat/documents/3859598/5926869/KS-RA-13-028-EN.PDF/e713fa79-1add-44e8-b23d-5e8fa09b3f8f (accessed 29 November 2017) (2013).
25. Doust, J. et al. Guidance for modifying the definition of diseases. JAMA Intern. Med. 177, 1020 (2017).
37. Boldsen, J. L. & Jeune, B. Distribution of age at menopause in two Danish samples. Hum. Biol. 62, 291–300 (1990).
38. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
39. Vos, T. et al. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388, 1545–1602 (2016).
40. Quintana, M., Viele, K. & Lewis, R. J. Bayesian analysis: using prior information to interpret the results of clinical trials. JAMA 318, 1605–1606 (2017).
41. Greenland, S. Bayesian perspectives for epidemiological research: I. Foundations and basic methods. Int. J. Epidemiol. 35, 765–775 (2006).
42. Fitzmaurice, C. et al. Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 32 cancer groups, 1990 to 2015. JAMA Oncol. 3, 524 (2017).
43. Smith, E. et al. The global burden of other musculoskeletal disorders: estimates from the Global Burden of Disease 2010 study. Ann. Rheum. Dis. 73, 1462–1469 (2014).
44. Regitz-Zagrosek, V. & Kararigas, G. Mechanistic pathways of sex differences in cardiovascular disease. Physiol. Rev. 97, 1–37 (2016).
45. Regitz-Zagrosek, V. in Sex and Gender Aspects in Clinical Medicine (eds Oertelt-Prigione, S. & Regitz-Zagrosek, V.) 17–44 (Springer-Verlag London, 2012). https://doi.org/10.1007/978-0-85729-832-4
46. Arevalo, M.-A., Azcoitia, I. & Garcia-Segura, L. M. The neuroprotective actions of oestradiol and oestrogen receptors. Nat. Rev. Neurosci. 16, 17–29 (2014).
47. Legato, M. J., Johnson, P. A. & Manson, J. E. Consideration of sex differences in medicine to improve health care and patient outcomes. JAMA 316, 1865 (2016).
48. Schiebinger, L., Leopold, S. S. & Miller, V. M. Editorial policies for sex and gender analysis. Lancet 388, 2841–2842 (2016).
49. Rollman, G. B. & Lautenbacher, S. Sex differences in musculoskeletal pain. Clin. J. Pain 17, 20–24 (2001).
50. Kyriacou, D. N. et al. Risk factors for injury to women from domestic violence. N. Engl. J. Med. 341, 1892–1898 (1999).
51. Lawrence, W. & Kaplan, B. J. Diagnosis and management of patients with thyroid nodules. J. Surg. Oncol. 80, 157–170 (2002).
52. Rahbari, R., Zhang, L. & Kebebew, E. Thyroid cancer gender disparity. Future Oncol. 6, 1771–1779 (2010).
53. Sin, D. D., Man, J. P. & Man, S. F. P. The risk of osteoporosis in Caucasian men and women with obstructive airways disease. Am. J. Med. 114, 10–14 (2003).
54. Center, J. R., Nguyen, T. V., Schneider, D., Sambrook, P. N. & Eisman, J. A. Mortality after all major types of osteoporotic fracture in men and women: an observational study. Lancet 353, 878–882 (1999).
55. Çolak, Y., Afzal, S., Nordestgaard, B. G., Vestbo, J. & Lange, P. Prognosis of asymptomatic and symptomatic, undiagnosed COPD in the general population in Denmark: a prospective cohort study. Lancet Respir. Med. 5, 426–434 (2017).
56. Martinez, C. H. et al. Undiagnosed obstructive lung disease in the United States. Associated factors and long-term mortality. Ann. Am. Thorac. Soc. 12, 1788–1795 (2015).
57. Arne, M. et al. How often is diagnosis of COPD confirmed with spirometry? Respir. Med. 104, 550–556 (2010).
58. Koefoed, M. M., Christensen, R. dePont, Søndergaard, J. & Jarbøl, D. E. Lack of spirometry use in Danish patients initiating medication targeting obstructive lung disease. Respir. Med. 106, 1743–1748 (2012).
59. Bon, J. et al. Radiographic emphysema, circulating bone biomarkers, and progressive bone mineral density loss in smokers. Ann. Am. Thorac. Soc. 15, 615–621 (2018).
60. Kim, S. W. et al. Association between vitamin D receptor polymorphisms and osteoporosis in patients with COPD. Int. J. Chron. Obstruct. Pulmon. Dis. 10, 1809 (2015).
61. Lankisch, P. G., Assmus, C., Lehnick, D., Maisonneuve, P. & Lowenfels, A. B. Acute pancreatitis: does gender matter? Dig. Dis. Sci. 46, 2470–2474 (2001).
62. Ankjær-Jensen, A., Rosling, P. & Bilde, L. Variable prospective financing in the Danish hospital sector and the development of a Danish case-mix system. Health Care Manag. Sci. 9, 259–268 (2006).
63. Kruschke, J. K. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition, https://doi.org/10.1016/C2012-0-00477-2 (2014).
64. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
65. Hoffman, M. D. & Gelman, A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 30 (2014).
66. Betancourt, M. Diagnosing biased inference with divergences. http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html (accessed 17 April 2017).
67. Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992).
68. Morris, J. A. & Gardner, M. J. Calculating confidence intervals for relative risks (odds ratios) and standardised ratios and rates. Br. Med. J. (Clin. Res. Ed.). 296, 1313–1316 (1988).

Acknowledgements
We would like to acknowledge funding from the Novo Nordisk Foundation (grant agreements NNF14CC0001 and NNF17OC0027594).

Author contributions
D.W. and S.B. conceived the study. S.B. obtained the funding. D.W. and S.B. performed the literature search, figures, study design, and data analysis. D.W., F.K.H.S., P.M., P.B., and S.B. contributed to data interpretation. D.W. and S.B. wrote the initial draft, and D.W., F.K.H.S., P.M., P.B. and S.B. contributed to the final article.

Additional information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-019-08475-9.

Competing interests: The authors declare no competing interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Journal peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2019