2018 Article 3585
Abstract
Background: Transmission patterns in high tuberculosis incidence areas in England are poorly understood but
need elucidating to focus contact tracing. We study transmission within and between age, ethnic and immigrant groups
using molecular data from the high incidence West Midlands region.
Methods: Isolates from culture-confirmed tuberculosis cases during 2007–2011 were typed using 24-locus Mycobacterial
Interspersed Repetitive Unit-Variable Number Tandem Repeats (MIRU-VNTR). We estimated the proportion of
disease attributable to recent transmission, calculated the proportion of isolates matching those from the two
preceding years (“retrospectively clustered”), and identified risk factors for retrospective clustering using multivariate
analyses. We calculated the ratio (RCR) between the observed and expected proportion clustered retrospectively within
or between age, ethnic and immigrant groups.
Results: Of the 2159 available genotypes (79% of culture-confirmed cases), 34% were attributed to recent transmission.
The percentage retrospectively clustered decreased from 50 to 24% for 0–14 and ≥ 65 year olds respectively (p = 0.01)
and was significantly lower for immigrants than the UK-born. Higher than expected clustering occurred within 15–24
year olds (RCR: 1.4 (95% CI: 1.1–1.8)), several ethnic groups, and between UK-born or long-term immigrants with the
UK-born (RCR: 1.8 (95% CI: 1.1–2.4) and 1.6 (95% CI: 1.2–1.9) respectively).
Conclusions: This study is the first to consider “who clusters with whom” in a high incidence area in England, laying
the foundation for future whole-genome sequencing work. The higher than expected clustering seen here suggests
that preferential mixing between some age, ethnic and immigrant groups occurs; prioritising contact tracing to groups
with which cases are most likely to cluster retrospectively could improve TB control.
Keywords: Tuberculosis, West Midlands, England, MIRU-VNTR, Clustering, Transmission, Contact patterns
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Vynnycky et al. BMC Infectious Diseases (2019) 19:26 Page 2 of 14
but could also result from a common strain-type circulating in England or elsewhere. Molecular studies in England to date, based on MIRU-VNTR, have typically considered short time periods (2010–2012) [2, 3], the amount of household transmission and risk factors for clustering, but have not yet studied the characteristics of cases with whom different cases cluster. The latter depends largely on transmission patterns, and is affected by other factors, including disease susceptibility. A study in Oxfordshire [4] considered a 5 year period (2007–2012) using whole-genome sequencing (WGS), which has a higher resolution than MIRU-VNTR [5]. However, this study considered a low tuberculosis incidence area (notification rate of 8/100,000/year in 2016) and WGS has not yet been used in England to study M. tuberculosis transmission in high incidence areas.
Using 24-locus MIRU-VNTR strain-typing data for 2007–2011 from the West Midlands, a high incidence ethnically diverse area in England (notification rates of 18 and 12/100,000/year in 2011 and 2015 respectively), we combine risk factor analyses with analyses of “who is clustered with whom” to get insight into transmission patterns by age, ethnic and immigrant group and discuss some implications for contact tracing.

Methods
Study population
The study population included all culture-confirmed tuberculosis cases from the West Midlands region, notified during 1st January 2007-31st December 2011, with an eligible 15 or 24-locus MIRU-VNTR strain-type (see below). The region numbers 5.6 million residents [6], including several cities with > 500,000 residents (e.g. Birmingham, Coventry, and Wolverhampton).

Molecular data and clustering definitions
During 2007–2009, culture-positive isolates were routinely typed with a set of 15 MIRU-VNTR loci [7]. From 2010, nine additional loci were typed [8] using the internationally-recommended set of 24 MIRU-VNTR loci. To extend the dataset of 24-locus profiles, whilst conserving laboratory resources, strains isolated during 2007–2009 which clustered in a preliminary analysis (isolates matching identically on at least 14 out of 15 loci) using all isolates from 2007–11 were typed with the additional nine loci. Isolates were then included in the present study if: 1) they had a unique 14 or 15 locus MIRU-VNTR profile (unclustered on the preliminary analysis) or 2) their 24-locus MIRU-VNTR profile had at least 23 loci typed. Cases notified during 1st January 2009-31st December 2011 whose isolate matched identically on 24-locus typing with that from a case notified up to 2 years previously were defined to be “clustered retrospectively”.

Data collection
Data on notified cases are held in the national Enhanced TB Surveillance (ETS) database, which contains patient-level demographic data (age, sex, world region of birth, ethnic group, and time from entry to the UK to tuberculosis diagnosis for foreign-born individuals), clinical details (including disease site and notification year), behavioural risk factors (history of/current problem drug or alcohol use and history of/current homelessness or time spent in prison) and laboratory data (culture-positivity and drug sensitivity). Clinical specimens and referred cultures from suspected tuberculosis cases in the West Midlands were routinely sent to the Regional Centres for Mycobacteriology, Birmingham, for culturing, identification, strain-typing, and drug susceptibility testing using standard methods [9]. Strain types and other laboratory data were matched to patient-level ETS data [10]. Duplicate notifications and specimens from the same patient occurring within 12 months of initial notification or specimen-collection were collated. TB episodes more than 12 months apart were considered separate notifications.

Data and risk factor analysis
We estimated the proportion of cases during 2007–2011 attributable to recent transmission using the “n-1” method [11], implicitly assuming that one source case initiates each cluster, and compared the estimate against the proportion of cases notified during 2009–2011 that were clustered retrospectively. In sensitivity analyses we compared estimates of the proportion attributed to recent transmission for the “n-1” method using different time windows (2007–9, 2007–2010 and 2007–2011) and compared that against the proportion retrospectively clustered with other cases during the preceding 2 years for the same time window.
The proportion retrospectively clustered was also calculated for the demographic characteristics, clinical details and behavioural risk factors described above, and for drug sensitivity. We conducted a univariate analysis of factors associated with retrospective clustering and report maximum likelihood estimates of odds ratios (OR) with Wald 95% confidence limits. Significance was evaluated using p-values from the likelihood ratio chi-square test (LRT), with p < 0.05 considered significant.
Multivariate logistic regression models were also constructed, including the age group, sex and other variables significantly associated with clustering in the univariate analysis. Either the region of birth or ethnicity was included, with region of birth preferred if both were significant. To avoid reducing models to just those foreign-born, time since entry in the UK was excluded in multivariate models, as were behavioural risk factors, which were collected for only some cases. For factors included in
multivariate models, adjusted ORs and their 95% confidence limits were reported, with significance evaluated using p-values from the LRT. For consistency with other risk factor studies of clustering [2, 3] cases clustered retrospectively just with extrapulmonary cases were included. However, they were excluded in subsequent analyses.

Analyses of who’s clustered with whom
To get insight into possible age-specific sources of infection, we calculated the proportion of cases notified during 2009–2011 in each age group (0–4, 5–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74 and ≥ 75 years), that were clustered retrospectively with pulmonary cases in given age groups. For cases aged 15–24 years, for example, the proportion retrospectively clustered with pulmonary cases aged j was given by:

$$\frac{C_{15-24,j}}{\sum_{i=1}^{9} C_{15-24,i}} \times \frac{R_{15-24}}{N_{15-24}}$$

where $C_{15-24,j}$ is the number of pulmonary cases aged j with whom cases aged 15–24 years notified during 2009–11 were clustered retrospectively, $R_{15-24}$ is the number of cases aged 15–24 years during 2009–11 who were clustered retrospectively with pulmonary cases of any group, and $N_{15-24}$ is the total number of cases aged 15–24 years notified during 2009–11.
Adapting published methods [12], we calculated the retrospective clustering ratio (RCR), defined as the ratio between the proportion of retrospectively clustered cases in each age group that were clustered with pulmonary cases aged j, and that expected, according to proportionate mixing. For this assumption, the probability of retrospective clustering with given age groups depends only on how many pulmonary cases in those age groups were notified up to 2 years before the given case. Considering cases aged 15–24 years, for example, the ratio is given by:

$$\frac{C_{15-24,j}}{\sum_{i=1}^{9} C_{15-24,i}} \Bigg/ \frac{T_{15-24,j}}{\sum_{i=1}^{9} T_{15-24,i}}$$

where $T_{15-24,j}$ is the total number of pulmonary cases aged j notified during the 2 years before the $N_{15-24}$ cases aged 15–24 years who were notified during 2009–11. Values for the ratio exceeding and below 1 suggest that there is more and less clustering respectively than expected between cases in given age groups. Confidence intervals were constructed through bootstrapping, using 10,000 bootstrap-derived datasets, generated by sampling clusters with replacement based on Borgdorff et al. [13]. Clusters appearing multiple times in a bootstrap dataset were treated as independent.
The proportion retrospectively clustered and the RCR were analysed similarly considering different ethnic groups, the UK-born and immigrants by time since arrival in the UK. In sensitivity analyses, the proportion retrospectively clustered and the retrospective clustering ratio were calculated using time windows of 3 and 4 years to assess retrospective clustering. In these calculations, cases who were notified during the periods 2007–9 and 2007–10 respectively were not eligible to be retrospectively clustered.

Software
Risk factor analyses were conducted using Stata/SE 13.1 (StataCorp LP); other analyses were conducted using a specially-written C program with published routines [14].

Results
Study population and descriptive analysis
During 1st January 2007-31st December 2011, 4845 clinical tuberculosis cases were notified in the West Midlands region. 2749/4845 (56.7%) were culture-positive, and 2543/2749 (92.5%) isolates were typed with at least 15 loci (Fig. 1). The cases with and without isolates typed had similar demographic characteristics (Table 1). Of those typed, 2423/2543 (95.3%) were eligible for preliminary cluster analyses (Fig. 1). These identified 691 cases who were not clustered and 1732 clustered cases, of which 1468 had at least 23 loci typed, resulting in 2159 (=691 + 1468) cases eligible for risk factor analysis for clustering using 24-locus profiles.

Risk factors for retrospective clustering
Of the 2159 cases analysed, 959 isolates (44%, 95% CI: 42–46%) shared identical genotypes during 2007–11, comprising 225 clusters, with 119 including two cases and 77, 16 and 9 clusters with 3–5, 6–9 and 11–49 cases respectively. Only one cluster, with 102 cases, had > 50 cases.
Of cases notified during 2009–2011, 452/1329 (34%, 95% CI: 31–37%) were clustered retrospectively, which was similar to the percentage of cases during 2007–2011 attributed to recent transmission using the “n-1” method ((959–225)/2159 = 734/2159 or 34% (95% CI: 32–36%)). Most of the retrospective clustering occurred with pulmonary cases (Fig. 2). The percentage retrospectively clustered was relatively insensitive to the study period, whilst the percentage of cases attributed to recent transmission decreased as the duration of the study period decreased, to 32% (95% CI: 30–34) and 30% (95% CI: 28–33) considering the period 2007–10 and 2007–9 respectively (Table 3).
The percentage clustered retrospectively was similar for males and females (Table 2), decreasing with increasing age from 50% for 0–14 year olds to 24% for those aged at least 65 years (OR: 1.00 vs 0.3 (95% CI: 0.1–0.7), p < 0.001). A high percentage (53%) of UK-born cases were clustered retrospectively, compared to those born abroad (e.g. 26% of those born in South Asia, OR:
Table 1 Characteristics of all 4845 cases notified in the West Midlands (2007–2011) and the study population
All cases Cases with genotype data Cases without genotype data
Number % Number % Number %
Year notified
2007 938 19.4 524 20.6 414 18
2008 1015 21 502 19.7 513 22.3
2009 1009 20.8 536 21.1 473 20.5
2010 872 18 494 19.4 378 16.4
2011 1011 20.9 487 19.2 524 22.8
Sex
Male 2638 54.5 1414 56 1224 53
Female 2205 45.5 1127 44 1078 47
Age group (years)
0–14 286 5.9 62 2.4 224 9.7
15–44 2769 57.2 1617 63.6 1152 50
45–64 966 19.9 470 18.5 496 21.5
65 and over 824 17 394 15.5 430 18.7
Region of birth
UK 1555 34.8 754 31.7 801 38.3
Europe 101 2.3 57 2.4 44 2.1
East Mediterranean 45 1 27 1.1 18 0.9
Africa 700 15.7 417 17.5 283 13.5
Americas 57 1.3 30 1.3 27 1.3
South Asia 1912 42.8 1033 43.4 879 42
East/Southeast Asia 103 2.3 64 2.7 39 1.9
Ethnicity
White 880 18.8 412 16.8 468 21.1
Black-Caribbean 173 3.7 96 3.9 77 3.5
Black-African 711 15.2 406 16.5 305 13.7
Black-Other 18 0.4 12 0.5 6 0.3
South Asian 2602 55.6 1364 55.5 1238 55.7
Chinese 47 1 21 0.9 26 1.2
Mixed/Other 248 5.3 145 5.9 103 4.6
Years since entry to tuberculosis diagnosis
0–1 435 16.4 262 17.6 173 14.8
2–4 573 21.5 357 24 216 18.4
5–9 606 22.8 349 23.4 257 21.9
10 and over 1046 39.3 521 35 525 44.8
Disease site
Pulmonary, with or without extra-pulmonary 2632 55.1 1635 64.6 997 44.4
Extra-pulmonary only 2141 44.9 895 35.4 1246 55.6
History of or current problem drug use
No 2181 96.8 1126 96.1 1055 97.6
Yes 72 3.2 46 3.9 26 2.4
History of or current problem alcohol use
No 2138 97.3 1100 96.6 1038 98.1
Yes 59 2.7 39 3.4 20 1.9
History of or current homelessness
No 2201 98.1 1144 97.0 1057 99.2
Yes 43 1.9 35 3.0 8 0.8
History of or currently in prison
No 2084 97.1 1079 96 1005 98.3
Yes 62 2.9 45 4.0 17 1.7
[Fig. 2, panels a–c, plot not reproduced. Y-axis: percentage retrospectively clustered, broken down by the type of cases with whom retrospective clustering occurs. Panel a: by age group (years). Panel b: by ethnic group (White, Black-Caribbean, Black-African, Pakistani, Indian, Black-other, Mixed-other, Bangladeshi, Chinese). Panel c: UK-born, unknown birthplace, and 0-1, 2-4 and 5-9 yrs categories.]
Table 2 Demographic features and risk factors for clustering using 24-locus typing for cases notified in the West Midlands, by the
retrospective method of clustering
All cases, 09–11 Clustered cases
N Col % N % OR (95% CI) P aOR (95% CI) p
Sex
Male 744 56 267 35.9 1 1
Female 585 44 185 31.6 0.8 (0.7,1) 0.1 0.8 (0.6,1) 0.03
Total 1329 100 452 34
Age group (years)
0–14 30 2.3 15 50 1 1
15–44 830 62.5 313 37.7 0.6 (0.3,1.3) 0.18 0.8 (0.4,1.7) 0.51
45–64 263 19.8 75 28.5 0.4 (0.2,0.9) 0.02 0.4 (0.2,0.9) 0.04
65 and over 206 15.5 49 23.8 0.3 (0.1,0.7) < 0.01 0.3 (0.1,0.7) 0.01
Total 1329 100 452 34
Birthplace
UK 401 31.5 214 53.4 1 1
Europe 37 2.9 10 27 0.3 (0.2,0.7) < 0.01 0.3 (0.1,0.7) < 0.01
East Mediterranean 16 1.3 4 25 0.3 (0.1,0.9) 0.04 0.2 (0.1,0.8) 0.02
Africa 221 17.3 54 24.4 0.3 (0.2,0.4) < 0.01 0.3 (0.2,0.4) < 0.01
Americas 14 1.1 5 35.7 0.5 (0.2,1.5) 0.2 0.7 (0.2,2.3) 0.59
South Asia 554 43.5 145 26.2 0.3 (0.2,0.4) < 0.01 0.4 (0.3,0.5) < 0.01
East/Southeast Asia 32 2.5 5 15.6 0.2 (0.1,0.4) < 0.01 0.2 (0.1,0.4) < 0.01
Total 1275 100 437 34.3
Ethnicity
White 220 17.2 88 40 1
Black-Caribbean 37 2.9 20 54.1 1.8 (0.9,3.6) 0.11 – –
Black-African 213 16.7 58 27.2 0.6 (0.4,0.8) < 0.01 – –
Black-Other 8 0.6 5 62.5 2.5 (0.6,10.7) 0.22 – –
South Asian 703 55.1 241 34.3 0.8 (0.6,1.1) 0.12 – –
Chinese 10 0.8 1 10 0.2 (0,1.3) 0.09 – –
Mixed/Other 86 6.7 28 32.6 0.7 (0.4,1.2) 0.23 – –
Total 1277 100 441 34.5
Time since entry to UK to tuberculosis diagnosis (years)*
0–1 147 18 32 21.8 1
2–4 176 21.5 39 22.2 1 (0.6,1.7) 0.93 – –
5–9 206 25.2 53 25.7 1.2 (0.8,2.1) 0.39 – –
10 and over 288 35.3 83 28.8 1.5 (0.9,2.3) 0.12 – –
Total 817 100 207 25.3
Disease site
Pulmonary 889 67 337 37.9 1 1
Extra-pulmonary 438 33 115 26.3 0.6 (0.5,0.8) < 0.01 0.6 (0.5,0.8) < 0.01
Total 1327 100 452 34.1
Drug sensitivity
Resistant to at least one drug 74 5.6 11 14.9 1 1
Sensitive 1244 94.4 440 35.4 3.1 (1.6,6) < 0.01 2.5 (1.3,5) < 0.01
Total 1318 100 451 34.2
Previous diagnosis
No 1008 85.2 342 33.9 1
Yes 175 14.8 70 40 1.3 (0.9,1.8) 0.12 – –
Total 1183 100 412 34.8
History of or current problem drug use**
No 969 95.8 321 33.1 1
Yes 42 4.2 33 78.6 7.4 (3.5,15.7) < 0.01 – –
Total 1011 100 354 35
History of or current problem alcohol use**
No 948 96.5 322 34 1
Yes 34 3.5 23 67.7 4.1 (2,8.4) < 0.01 – –
Total 982 100 345 35.1
History of or current homelessness**
No 985 96.8 342 34.7 1
Yes 33 3.2 16 48.5 1.8 (0.9,3.5) 0.11 – –
Total 1018 100 358 35.2
History of or currently in prison**
No 932 95.9 316 33.9 1
Yes 40 4.1 28 70 4.5 (2.3,9.1) < 0.01 – –
Total 972 100 344 35.4
*Foreign-born only
**Missing for the cases notified in 2007 and 2008, and for half of those notified in 2009
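As a worked illustration of the univariate odds ratios in Table 2, the sketch below computes an OR with a Wald 95% confidence interval from a 2×2 table, using the counts in the problem drug use rows (33 of 42 clustered versus 321 of 969). It is a minimal sketch rather than the Stata routine used for the analyses, and the function name `odds_ratio_wald` is an assumption introduced for this example.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: clustered / not clustered among the exposed group
    c, d: clustered / not clustered among the reference group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the problem drug use rows of Table 2
or_drug, lo_drug, hi_drug = odds_ratio_wald(33, 42 - 33, 321, 969 - 321)
print(f"OR {or_drug:.1f} (95% CI {lo_drug:.1f}, {hi_drug:.1f})")
```

With these counts the result should agree, to one decimal place, with the tabulated OR of 7.4 (3.5,15.7).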
Sensitivity analyses
For most age, ethnic groups and time since arrival in the UK, the percentage retrospectively clustered increased slightly as the time window used to identify a matching isolate lengthened, although the confidence intervals widened (Additional file 1: Figure S1). For 35–44 year olds for example, it increased from 31% (95% CI: 25–38) using a 2 year time window, to 37% (95% CI: 29–45) and 44% (95% CI: 33–55) using a 3 and a 4 year time window respectively. However, the age, ethnic group and time since arrival of the cases with whom cases were retrospectively clustered were similar when the time window used to calculate retrospective clustering increased (Additional file 1: Figure S2).
Increasing the time window to 3 years led to more retrospective clustering than expected occurring only between 5–14 year olds and 15–24 year old pulmonary cases (RCR: 2.4 (95% CI: 1.0–3.5)) and between 55–64 year olds and pulmonary cases in the same age group (RCR: 2.3 (95% CI: 1.3–4.4)), with more retrospective clustering than expected occurring only for the latter using a 4 year time window (Additional file 1: Figure S3). The age groups for which less clustering than expected occurred were similar for all time windows (Additional file 1: Figure S3).
Findings by ethnic group were similar using a 2 and 3 year window to define retrospective clustering; using a 4 year time window, more retrospective clustering than expected was seen only with pulmonary cases in the same ethnic group for the white and Pakistani groups, and with those in the Mixed/other group with cases in the Black-other group (Additional file 1: Figure S4).
Considering the cases by time since arrival in the UK, the groups for which the RCR was higher or lower than expected were similar for all time windows used for defining retrospective clustering (Additional file 1: Figure S5). For a 4 year time window, more retrospective clustering than expected also occurred among the UK-born or unknown birthplace and pulmonary cases with an unknown time since arrival (RCR: 1.82 (95% CI: 1.53–2.19) and 5.44 (95% CI: 4.40–7.22)).

Fig. 3 Analysis of the age-groups of the pulmonary cases with whom cases notified during 2009–11 were clustered retrospectively. a Proportions of cases in each age group who were retrospectively clustered with pulmonary cases in other age groups. b Retrospective clustering ratio for cases in each age group. Yellow and red cells show less and more retrospective clustering respectively with pulmonary cases in a given age group than might be expected, with 95% confidence intervals in parentheses. Dashes indicate ratios for which the ratio was zero and confidence intervals could not be calculated using the bootstrapping approach. Unshaded cells show ratios for which there is neither more nor less retrospective clustering than might be expected

Discussion
Our analyses appear to be the first to quantify the amount of clustering between different population groups in a high TB incidence area in England using molecular data. We found that retrospective clustering with pulmonary cases between some ethnic groups was over two-fold greater than expected, and more
clustering than expected occurred between 15–24 year olds and between UK-born or long-term immigrants with the UK-born. The findings provide insight into transmission patterns between different groups and possible ways of prioritising contact tracing in high incidence areas.
The definition of clustering used here, i.e. the proportion of cases who were clustered with pulmonary cases up to 2 years previously, differs from that used in other molecular epidemiological studies in the UK and has two advantages. First, by definition, cases cannot be retrospectively clustered with cases notified after them, who could have been their secondary cases. Consequently, the proportion retrospectively clustered is more closely related to the proportion of disease that is attributable to recent transmission than is the overall proportion clustered. Second, it eliminates some bias that occurs for other clustering definitions, such as the “n-1” method, for which cases notified at different times have different follow-up periods for assessing clustering. Using the retrospective method, the same time period for each isolate is used to identify its match, and, as suggested by our analyses (Table 3), the proportion retrospectively clustered within a given period will probably be relatively insensitive to the time period spanned by the dataset, if there are no changes in the amount of ongoing transmission.

Fig. 4 Analysis of the ethnic groups of the pulmonary cases with whom cases notified during 2009–11 were clustered retrospectively. a Proportions of cases in each ethnic group who were retrospectively clustered with pulmonary cases in other ethnic groups. b Retrospective clustering ratio for cases in each ethnic group, with 95% confidence intervals in parentheses. Dashes indicate ratios for which the ratio was zero and confidence intervals could not be calculated using the bootstrapping approach. See the caption to Fig. 3 for the interpretation of the colour coding

The size of the bias resulting from differing follow-up periods for the “n-1” method, and differences between the method’s estimates and the proportion retrospectively clustered, depend on the study period duration (Table 3). This results from the fact that the denominator used in
the percentage clustered for the “n-1” method includes all cases notified in the study period, and cases notified early in the period but infected 2 years previously would be mistakenly attributed to reactivation. The proportion of cases affected by the misclassification decreases as the study period lengthens, as the proportion of cases for whom it becomes possible to identify a case with a matching genotype increases.
Our finding that the proportion retrospectively clustered increases with the time window used to assess clustering is consistent with that from other studies [15], resulting from the increased probability of both the source and secondary case being notified during the study period. However, the confidence intervals on both the proportion retrospectively clustered and the retrospective clustering ratio widened with the increased time window, reducing the ability to detect retrospective clustering that is higher than expected. These widening confidence intervals follow from the data-loss that occurs with the retrospective clustering approach, which increases with longer retrospective time periods considered. For example, as defined here, when calculating the proportion clustered retrospectively, the first 2 years of notified cases were excluded from the denominator, increasing to exclusion of 4 years of notified cases when using a 4 year window to define retrospective clustering.

Fig. 5 Analysis of the time since arrival of the pulmonary cases with whom cases notified during 2009–11 were clustered retrospectively. a Proportions of cases with different time since arrival (TSA) who were retrospectively clustered with pulmonary cases with other times since arrival. b Retrospective clustering ratio for cases with different time since arrivals, with 95% confidence intervals in parentheses. Dashes indicate ratios for which the ratio was zero and confidence intervals could not be calculated using the bootstrapping approach. See the caption to Fig. 3 for the interpretation of the colour coding

We used the retrospective clustering ratio to estimate whether the clustering seen between two population groups was more or less than that expected, based on the group’s size among notified cases. Analogous
statistics have been used in social contact surveys to compare the amount of contact between different populations. Such statistics may be biased and overestimate clustering between population groups if strain-typing of isolates was done preferentially for certain cases, such as those involved in contact investigations. The time required to obtain results from strain-typing data means that strain-typing is unlikely to have been carried out preferentially for cases involved in contact investigations [16].
However, we probably underestimated the proportion of cases in some population groups that were clustered, as genotyping was only conducted for culture-positive cases, who comprised 57% of cases during the study period, and sampling a proportion of the data leads to underestimates in the amount of clustering [17, 18]. One study from The Netherlands [19] found that a significantly larger proportion of cases without a typed isolate had a confirmed recent epidemiological link and could be presumed to have been recently infected than cases whose isolates had been typed (25% vs 18%, P < 0.01).
Undersampling of some population groups, such as the UK-born (Table 1) for whom genotyping data were available for fewer than half of the cases, could have also affected the retrospective clustering ratios, which consider the pulmonary cases with whom cases are clustered retrospectively and the population groups that they come from. If those undersampled cases had pulmonary tuberculosis and they transmitted to other population groups, the retrospective clustering ratio for the latter groups with the undersampled groups could be underestimated. The size of the underestimate may be relatively small, since over half of those without genotype data had extrapulmonary tuberculosis.
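The retrospective clustering ratio discussed above compares the observed share of retrospective clustering with a given group against the share expected under proportionate mixing. A minimal sketch of that calculation follows. The counts are hypothetical and chosen only for illustration, and the names `retrospective_clustering_ratio`, `c_row` and `t_row` are assumptions of this example, not part of the authors' C program.

```python
def retrospective_clustering_ratio(clustered_with, notified_before):
    """RCR for one case group against each pulmonary source group.

    clustered_with: {group j: C_j}, pulmonary cases in group j with whom
        the cases of interest were retrospectively clustered
    notified_before: {group j: T_j}, pulmonary cases in group j notified
        in the window before the cases of interest
    Returns {group j: (C_j / sum C) / (T_j / sum T)}; None if T_j is 0.
    """
    total_c = sum(clustered_with.values())
    total_t = sum(notified_before.values())
    ratios = {}
    for group, t_count in notified_before.items():
        if t_count == 0 or total_c == 0:
            ratios[group] = None  # ratio undefined without any cases
            continue
        observed = clustered_with.get(group, 0) / total_c
        expected = t_count / total_t  # proportionate mixing
        ratios[group] = observed / expected
    return ratios


# Hypothetical counts for cases aged 15-24 years (illustration only)
c_row = {"0-14": 2, "15-24": 12, "25-44": 6}
t_row = {"0-14": 10, "15-24": 30, "25-44": 60}
rcr = retrospective_clustering_ratio(c_row, t_row)
print(rcr)
```

In this toy example the ratio for the 15-24 group exceeds 1 (more clustering than expected) and that for the 25-44 group is below 1. Confidence intervals would additionally require resampling clusters, which is not shown here.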
Contact tracing seeks to seeks to identify and diagnose Another limitation is that if a case’s infectious source
contacts of infectious cases and is highlighted as a key lived outside the study region or had been notified
component for tuberculosis control by the national tuber- over 2 years before the case of interest, they would not
culosis strategy. The largest impact on case finding will be contribute to calculations of the retrospective cluster-
obtained by focusing on the groups that are likely to have ing proportion.
the highest yield from case-finding. Estimates of the retro- A smaller proportion of extrapulmonary than pulmonary
spective clustering ratio can contribute to this by indicat- cases were retrospectively clustered with pulmonary cases,
ing which population groups may give the highest yield even after adjusting for the birthplace and other factors.
for cases in a given population group. For some ethnic Other studies, which considered the overall proportion of
groups, there was more retrospective clustering than cases that were clustered and, unlike our estimates, had the
expected with pulmonary cases in their own ethnic group, biases described above, had similar findings. Our finding
suggesting that the source of infection, and potentially, the greatest case-finding yield, may be
obtained from contacts in the same ethnic group. Analogous conclusions apply to our finding of more
retrospective clustering than expected between 15–24 year olds and pulmonary cases in the same age
group, and between UK-born cases and immigrants who had arrived at least 10 years previously and
pulmonary UK-born cases. Prioritising contact tracing for cases in particular groups on those most
likely to cluster retrospectively with them could speed up case-finding and, by shortening the time
during which cases are infectious, improve TB control.
More retrospective clustering than expected occurred between 55–64 year old cases and 5–14 year old
pulmonary cases. Since many 55–64 year old cases were probably infected many years previously, this
finding may follow from several study limitations. For example, cases may be retrospectively
clustered with cases who are not their source of infection, since clustering may occur if a common
genotype has been circulating in the population. Also, since retrospective clustering was defined
using the notification date as a proxy for the onset date, the outcome could have occurred if, as is
plausible, the time from onset to diagnosis was shorter for 0–4 than 55 year old cases.
may be attributable to several factors, including undersampling of extrapulmonary cases, due to the
fact that the genotype of culture-negative cases was not determined and most culture-negative cases
are extrapulmonary. Also, due to the non-specificity of symptoms, extrapulmonary cases are more
difficult to diagnose than are pulmonary cases. This may lead to increased diagnostic delays among
extrapulmonary cases and reduce the chance of finding their source of infection or cases who shared
the same genotype within the 2 year period for retrospective clustering.
It is reassuring that our estimates of the proportion of disease attributable to recent transmission
are comparable to those found elsewhere in Western Europe. Also, our finding on the amount of
clustering among immigrants is consistent with those elsewhere in England. The finding that there
was neither more nor less retrospective clustering than expected between recent immigrants and other
immigrant groups is consistent with hypotheses that disease among recent immigrants is attributable
to infection acquired abroad. The finding of more retrospective clustering than expected for those
who had arrived at least 10 years previously and pulmonary UK-born cases suggests that with
increasing time spent in the UK, acquiring infection from UK-born cases becomes increasingly likely.
Table 3 Estimates of the amount of disease attributable to recent transmission calculated using the “n-1” method and retrospective clustering with cases up to two years beforehand, using all cases notified within different time periods during 2007–11

Time     Number of cases     Number of cases    Number retrospectively    Number of cases with     % due to recent transmission based on:
period   clustered,          notified during    clustered with cases      onset more than two      “n-1” formula    Retrospective
         excluding the       the study period   up to 2 years             years after the start                     clustering
         first case                             previously                of the study period
2007–11  734                 2159               452                       1329                     34% (32,36)      34% (31,37)
2007–10  554                 1721               302                       891                      32% (30,34)      34% (31,37)
2007–9   393                 1291               156                       461                      30% (28,33)      34% (30,38)

Numbers in parentheses denote (exact binomial) 95% confidence intervals
Only the cases who had onset two or more years after the start of the study period were used in the denominator for the retrospective clustering percentage
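The percentages in Table 3 follow directly from the counts: the “n-1” estimate is the number of clustered cases excluding the first case in each cluster divided by the number of cases notified, and the retrospective estimate is the number retrospectively clustered divided by the number of cases with a full two-year look-back. The Python sketch below recomputes these proportions from the table. It assumes that the “exact binomial” interval is the Clopper-Pearson interval, which the text does not state, and it solves for the interval bounds by bisection using only the standard library; all function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_decreasing(f, target, iters=60):
    """Find p in (0, 1) with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n (assumed 'exact binomial')."""
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


def recent_transmission(numerator, denominator):
    """Return (proportion, lower, upper) for one cell of Table 3."""
    lower, upper = exact_ci(numerator, denominator)
    return numerator / denominator, lower, upper


# Counts copied from Table 3: (period, n-1 numerator, notified, retro numerator, retro denominator)
TABLE3 = [
    ("2007-11", 734, 2159, 452, 1329),
    ("2007-10", 554, 1721, 302, 891),
    ("2007-9", 393, 1291, 156, 461),
]

if __name__ == "__main__":
    for period, n1_num, n1_den, retro_num, retro_den in TABLE3:
        n1 = recent_transmission(n1_num, n1_den)
        retro = recent_transmission(retro_num, retro_den)
        print(period,
              "n-1: %.1f%% (%.1f, %.1f)" % tuple(100 * x for x in n1),
              "retrospective: %.1f%% (%.1f, %.1f)" % tuple(100 * x for x in retro))
```

The point estimates computed this way round to the percentages shown in Table 3. The “n-1” numerator is taken as given here; deriving it from raw genotype data would require the cluster membership of each case, which is not part of this sketch.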
Vynnycky et al. BMC Infectious Diseases (2019) 19:26 Page 13 of 14
Additional file 1: This contains the results of the sensitivity of the retrospective clustering analyses to the size of the time window used. (PDF 291 kb)

Abbreviations
ETS: Enhanced tuberculosis surveillance; LRT: Likelihood ratio test; MIRU-VNTR: Mycobacterial Interspersed Repetitive Unit-Variable Number Tandem Repeats; OR: Odds ratio; RCR: Retrospective clustering ratio; TB: Tuberculosis

Acknowledgements
We thank Grace Smith for her input into the molecular typing.

Funding
ARK was funded by a PHE-LSHTM PhD studentship when doing this work. RGW is funded by the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement that is also part of the EDCTP2 programme supported by the European Union (MR/P002404/1), the Bill and Melinda Gates Foundation (TB Modelling and Analysis Consortium: OPP1084276/OPP1135288, CORTIS: OPP1137034, Vaccines: OPP1160830) and UNITAID (4214-LSHTM-Sept15; PO 8477-0-600). None of the funding bodies played any role in the design of the study and collection, analysis, interpretation of data or in writing the manuscript.

Availability of data and materials
Aggregate data that support the findings of this study are available on reasonable request from the corresponding author (EV). The individual level data from the study are not publicly available as the data were collected in adherence with the legal framework governing use of confidential personally identifiable information.

Authors’ contributions
EV, AK, RW, JE and IA conceived and designed the study. EV and AK analysed the data. JE, SK conducted the retrospective typing with input from PH. AK currently works for the US government. The views expressed in this paper are those of the authors, and do not necessarily reflect the views of the U.S. Government. All authors read and approved the final manuscript.

Received: 7 November 2017 Accepted: 3 December 2018