Mycobacterium Tuberculosis
Abstract
Multiple-locus variable-number tandem repeat analysis (MLVA) is useful to establish transmission routes and sources of
infections for various microorganisms including Mycobacterium tuberculosis complex (MTC). The recently released SITVITWEB
database contains 12-loci Mycobacterial Interspersed Repetitive Units – Variable Number of Tandem DNA Repeats (MIRU-
VNTR) profiles and spoligotype patterns for thousands of MTC strains; it uses MIRU International Types (MIT) and
Spoligotype International Types (SIT) to designate clustered patterns worldwide. Considering existing doubts on the ability
of spoligotyping alone to reveal exact phylogenetic relationships between MTC strains, we developed an MLVA-based
classification for MTC genotypic lineages. We studied 6 different subsets of MTC isolates encompassing 7793 strains
worldwide. Minimum spanning trees (MST) were constructed to identify major lineages, and the most common
representative located as a central node was taken as the prototype defining different phylogenetic groups. A total of 7
major lineages with their respective prototypes were identified: Indo-Oceanic/MIT57, East Asian and African Indian/MIT17,
Euro American/MIT116, West African-I/MIT934, West African-II/MIT664, M. bovis/MIT49, M.canettii/MIT60. Further MST
subdivision identified an additional 34 sublineage MIT prototypes. The phylogenetic relationships among the 37 newly
defined MIRU-VNTR lineages were inferred using a classification algorithm based on a Bayesian approach. This information
was used to construct an updated phylogenetic and phylogeographic snapshot of worldwide MTC diversity studied at
the regional, sub-regional, and country levels according to the United Nations specifications. We also looked for IS6110
insertional events that are known to modify the results of the spoligotyping in specific circumstances, and showed that a
fair portion of convergence leading to the currently observed bias in phylogenetic classification of strains may be traced
back to the presence of IS6110. These results shed new light on the evolutionary history of the pathogen in relation to the
history of peopling and human migration.
Citation: Hill V, Zozio T, Sadikalay S, Viegas S, Streit E, et al. (2012) MLVA Based Classification of Mycobacterium tuberculosis Complex Lineages for a Robust
Phylogeographic Snapshot of Its Worldwide Molecular Diversity. PLoS ONE 7(9): e41991. doi:10.1371/journal.pone.0041991
Editor: Riccardo Manganelli, University of Padova, Italy
Received April 19, 2012; Accepted June 28, 2012; Published September 11, 2012
Copyright: © 2012 Hill et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Véronique Hill was awarded a Ph.D. fellowship by the European Social Funds through the Regional Council of Guadeloupe. The project was partially
financed by the International Network of the Pasteur Institutes. However, the funders had no role in study design, data collection and analysis, decision to publish,
or preparation of the manuscript. No additional external funding was received for this study.
Competing Interests: The authors have declared that no competing interests exist.
task due to a lack of understanding of the mechanisms behind the mutations leading to the polymorphism of these genomic targets. Recent studies have shown that phylogenetically unrelated MTC strains could be found with the same spoligotype pattern as a result of independent mutational events [10], an observation that corroborates the fact that spoligotyping is prone to homoplasy to a higher extent than the MIRU-VNTRs [11]. Furthermore, spoligotyping has little discriminative power for families associated with the absence of large blocks of spacers, e.g., the Beijing lineage defined by its prototype – spoligotyping international type 1 (SIT1) in the SpolDB4 database.
The usefulness of minisatellite-based lineage classification of MTC isolates was explored by Allix-Béguec et al. [12], who described a web-based server with detailed information on a well-characterized set of 186 reference isolates; each strain being described for its geographical origin, corresponding genetic lineage, IS6110-RFLP, 24-locus MIRU-VNTR, spoligotyping, Single Nucleotide Polymorphism (SNPs), and Large Sequence Polymorphism (LSP) profiles (http://www.MIRU-VNTRplus.org). The authors described and tested an algorithm based on best-match analysis followed by tree-based analysis on MIRU-VNTR data (combined or not with spoligotyping data) to describe the distribution of isolates with minisatellite data among the various spoligotype families. However, the authors did not interpret their data to describe minisatellite-based lineages, since conclusions were essentially drawn based on spoligotype-based classification.
Considering existing doubts on the ability of spoligotyping alone to reveal exact phylogenetic relationships between MTC strains [11,13], particularly the classification of the evolutionarily recent TbD1–/PGG2/3 group [14], we decided to study 6 different subsets of MTC isolates encompassing 7793 strains (see subsection 2 of “Materials and Methods” for information on the origin of the strains used). The purpose of this paper is to: (i) classify these strains based on 12-locus MIRU-VNTR typing data; (ii) draw the evolutionary history of various MTC members (species, subspecies, groups) leading to the diversity of newly described phylogenetic lineages/groups; (iii) see how the geographical distribution of these lineages reinforces the history of human settlement in the world; and finally, (iv) evaluate the MLVA-based classification of MTC genotypic lineages as a means to provide an accurate and robust phylogeographic interpretation of its worldwide diversity.
Materials and Methods
1. Molecular methods
This investigation made use of available genotyping data of Mycobacterium tuberculosis complex (MTC) clinical isolates using standard spoligotyping and MIRU-VNTR typing techniques [1,3,4], and the reader is referred to subsection 2 below for information on the origin of the strains and published data used. In selected cases, we further checked for blocks of deleted spacers in the standard 43-spacer spoligotyping format by extended spoligotyping using methodology described earlier [15,16]. For this purpose, 2 additional membranes were used to reveal the presence or absence of spacers 1 to 86 in the genomic order established on the M. tuberculosis H37Rv reference strain [17].
We also looked for IS6110 insertional events that are known to modify the results of the spoligotyping in specific circumstances; briefly, we used pairs of primers (biotinylated)DRa-IS3, (biot)DRb-IS6, (biot)DRa-IS6 and (biot)DRb-IS3, to highlight the presence of IS6110 [18,19]. Additionally, an IS6110 Adjacent Deletion Typing (IS6110AD-typing) was developed in-house to investigate the role of IS6110 insertional event(s) causing deletions in the MTC genome elsewhere than the DR locus. The phenomenon of “adjacent deletion”, in which a contiguous chromosomal segment adjacent to the transposon is deleted while the element responsible remains intact, was initially described by Roberts et al. [20]. For this purpose, we identified 16 copies of IS6110 in the M. tuberculosis H37Rv genome (reference sequence NC_000962, NCBI genome database). Note that the 11th copy located in the DR locus was not retained due to the well-known variability of this locus in relation to insertional events (see above), and the fact that it constitutes a hotspot for IS6110 insertional preferential locus (ipl; [21]). Consequently, IS6110AD-typing targeted regions adjacent to 15 IS6110 copies leading to the final amplification of 28 genomic sequences (2 copies of IS6110 were contiguous, with no amplification between them). Please refer to Tables S1 and S2 for the description of the primers used for IS6110AD-typing and the experimental conditions, respectively.
2. Genotyping information
This investigation made use of available genotyping data or in-house typing of six different subsets of Mycobacterium tuberculosis complex (MTC) clinical isolates encompassing 7793 strains of diverse geographical origin as follows:
(i) Spoligotyping and 12-loci MIRU-VNTR data on 7009 strains from the SITVIT2 proprietary database of Institut Pasteur de la Guadeloupe (n = 5990 strains genotyped by various investigators, list available through http://www.pasteur-guadeloupe.fr:8081/SITVIT_ONLINE; n = 1019 strains genotyped at Institut Pasteur de la Guadeloupe as follows: Guadeloupe n = 203; Martinique n = 88; French Guiana n = 364; Dominican Republic n = 88; Colombia n = 134; and Turkey n = 142). This dataset was used to establish the 12-locus MIRU-VNTR rules, followed by their validation in other datasets described below.
(ii) Genotypic data on 176 MTC isolates from the MIRU-VNTRplus database (http://www.miru-vntrplus.org/MIRU/index.faces). The aim of this selection (Table S3) was to compare the MLVA-based classification of MTC strains developed during this study versus previous labeling using SpolDB4 [6] and LSP-based classification [22,23]. Note that data on M. microti and M. pinnipedii isolates were set aside since they were almost nonexistent in subset 1 (no M. microti strains, and only 1 M. pinnipedii among the 7009 strains initially used to establish the 12-locus MIRU-VNTR rules).
(iii) The MIRU-VNTR rules were further evaluated on a subset of LAM strains to describe the novel RDrio lineage [24] (Table S4; n = 190). This group was subdivided into 2 subgroups: 100 strains with RDrio deletion and 90 wild-type strains.
(iv) To test a hypothesis about an Asia-to-Africa back migration theory based on the study of Y-chromosome haplogroups at Neolithic times [25], we also used published data on 154 MTC strains from the north west of Iran [26].
(v) To compensate for the lack of MIRU-VNTR data on MTC isolates from East-Africa in all published genotyping databases, we decided to type strains from Mozambique. For this purpose, 100 MTC clinical isolates were blindly sampled starting from an initial set of 445 clinical isolates studied recently using spoligotyping in Mozambique [27]. These isolates were typed using 24-loci MIRU-VNTRs, extended spoligotyping, the detection of IS6110 insertions
Table 1. Description of the 7 major lineages and 41 sublineages based on 12-loci MIRU-VNTRs.
Columns: MIRU-VNTR lineages/Central node MIT | Sublineages/Central node MIT | 12-loci MIRU-VNTR patterns of central node MITs | Corresponding LSP-based lineages | Corresponding spoligotype lineages | Spoligotype rule (absence of spacers) | SIT number
These lineages/sublineages were identified from a MST tree constructed with 7009 strains taken from the SITVIT2 proprietary database of Institut Pasteur de la
Guadeloupe. The corresponding LSP-based lineages [23] and Spoligotype-based lineages [6] are shown for comparison.
doi:10.1371/journal.pone.0041991.t001
Figure 1. Phylogenetic tree constructed with MrBayes3 software (http://mrbayes.csit.fsu.edu/). The tree is done with the 37 MIRU-VNTR
prototypes of M. tuberculosis sensu stricto.
doi:10.1371/journal.pone.0041991.g001
Table 2. Comparison of the new MIRU-VNTR based lineages with Brudey’s classification.
This table concerns only two MIRU12-based lineages: Indo-Oceanic and East Asian and African Indian.
doi:10.1371/journal.pone.0041991.t002
Figure 2. MST tree done with 12 MIRU-VNTR loci of 176 strains from MIRU-VNTRplus database (http://www.miru-vntrplus.org/MIRU/
index.faces).
doi:10.1371/journal.pone.0041991.g002
in the DR locus, and IS6110AD-typing as described above under the subsection 1.
(vi) Lastly, the principle of lineage identification developed was initially validated on a set of 164 strains typed by spoligotyping and 12-loci MIRU-VNTRs from Kerala (unpublished data).
3. Phylogenetic inferences
Phylogenetic inferences were drawn using two applications: BioNumerics (version 3.5, Applied Maths, Sint-Martens-Latem, Belgium), and MrBayes3 (available through http://mrbayes.csit.fsu.edu/) [28]. BioNumerics was used for phylogenetic reconstruction based on a “Minimum Spanning Tree” (MST) algorithm to draw MSTs on 7009 MTC patterns of the SITVIT2 database. For this purpose, allele strings were imported into a BioNumerics software package and an MST was created based on the categorical coefficient and the priority rules (http://www.applied-maths.com/bionumerics/plugins/mlva.htm) with the highest number of single locus variants (SLVs). Following the assumption that evolution required a minimum of evolutionary events and that all evolutionary states were present within the dataset studied, one could observe different taxonomic units that were clustered in the tree generated. In an MST, one considers that the internal nodes within a tree are part of the sample, and the branches illustrate agglomerations of variants around their common ancestor. MrBayes3 was used to infer phylogenetic relationships among the 37 newly defined MIRU-VNTR lineages of M. tuberculosis sensu stricto using a Bayesian approach that is particularly useful to reconfirm MST results [28].
4. Classification algorithm
To describe the classification algorithm, we must first explain the principle on which it is based. Take for example an MST done using 12-locus MIRUs on a set of 164 strains from Kerala, India (Figure S1). In this figure, the bigger circles surrounding the profile clusters are drawn according to the spoligotype-based lineage classification [5]. This tree also shows the 3 large phylogenetically relevant subdivisions based on katG and gyrA SNP polymorphism [8], which subdivides M. tuberculosis complex strains into three PGG groupings; PGG1 is considered to be evolutionarily older while PGG3 is the youngest which evolved from PGG2. Furthermore, ancestral strains are characterized by the presence of a specific deletion region (TbD1) as opposed to modern strains that are TbD1-deleted [9]. Superposition of these groupings suggests that PGG1 includes both ancestral (EAI) and modern (CAS, Beijing) lineages, while PGG2/3 include exclusively modern (Haarlem, LAM, T, and X) lineages [22,23]. The MST shows the central nodes of EAI1-SOM, EAI3-IND, Beijing and CAS (CAS1-Delhi) lineages corresponding respectively to MIT64, MIT69, MIT17 and MIT318 (Figure S1). This tree illustrates the fact that all lineage members congregate around a central node. Agglomeration includes all variants of a lineage while the central node represents the most common representative; hence it is the central node that generates the most variants within a given lineage. Thus classifying strains with MIRU-VNTRs in the present study amounted to identifying (and defining) all the central nodes as prototypes that in turn designated different phylogenetic groups;
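The central-node principle can be illustrated with a minimal sketch. It is not the BioNumerics implementation: the priority rules are not reproduced, ties are simply broken by input order, and the five 12-locus profiles are hypothetical toy data rather than strains from this study. The function names (`categorical_distance`, `minimum_spanning_tree`, `central_node`) are illustrative assumptions. Profiles are written as digit strings, which assumes every locus has a single-digit allele.

```python
from collections import Counter


def categorical_distance(a, b):
    """Number of loci at which two equal-length MIRU-VNTR profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of distinct profiles.

    Returns a list of (i, j, distance) edges, where i and j index `profiles`.
    Ties are broken by index; this is an assumption, not the priority rules.
    """
    if not profiles:
        return []
    # best[j] = (distance to the tree, index of the closest tree node)
    best = {j: (categorical_distance(profiles[0], profiles[j]), 0)
            for j in range(1, len(profiles))}
    edges = []
    while best:
        j = min(best, key=lambda k: (best[k][0], k))
        dist, i = best.pop(j)
        edges.append((i, j, dist))
        for k in best:  # relax remaining nodes against the newly added one
            d = categorical_distance(profiles[j], profiles[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def central_node(profiles, edges):
    """Profile with the most MST neighbours (first one wins on ties)."""
    degree = Counter()
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    idx = max(range(len(profiles)), key=lambda i: (degree[i], -i))
    return profiles[idx]


# Hypothetical toy profiles: one prototype plus single- and double-locus variants.
profiles = [
    "223325153324",  # differs from the prototype at locus 12
    "223325153323",  # prototype (expected central node)
    "223325153223",  # differs at locus 10
    "123325153323",  # differs at locus 1
    "123325153322",  # differs at loci 1 and 12
]
edges = minimum_spanning_tree(profiles)
print(edges)                           # [(0, 1, 1), (1, 2, 1), (1, 3, 1), (3, 4, 1)]
print(central_node(profiles, edges))   # 223325153323
```

In this toy tree the double-locus variant attaches to a single-locus variant rather than directly to the prototype, which mirrors the idea that variants agglomerate around the node that generates the most single-locus variants.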
Case | 12-loci MIRU-VNTR pattern | MIT(b) | Profile | Spoligotype pattern (spacers 1 to 43) | Spoligotype-based lineage | MIRU-VNTR based sublineage
E | 124325153225 | 1 | 1 | ■■■■■■■■■■■■■■□□□□□□□□□□■■■■■■■■□□□□■■■■■■■ | T5-RUS1 | Euro American-190
  | 124326153224 | 140 | 2 | ■■■■■■■■■■■■■■□□□□□□□□□■■■■■■■■■□□□□■■■■■■■ | T-Tuscany | Euro American-190
F(a) | 215125113322 | 310 | 1 | ■■■■■■■■■■■■■■■■■■■□□□□□■□□■■■■■□□□□■■■■■■■ | LAM7-TUR | Euro American-40
  | 226125113322 | 430 | 2 | ■■■■■■■■■□□□□□□□□□□■■■■■■■■■■■■■□□□□■■■■■■■ | T3-ETH | Euro American-40
G | 227425113434 | 261 | 1 | ■■■□□□□■■□■■■■■■■■■□□□□□□□□□□□□□□□□■■■■■■■■ | CAS1-Kili | East-African Indian-261
  | 227225113224 | 200 | 2 | ■■■■■■■□□■■■■■■■■■■□□□□□□□□□□□□■□□□□■■■■■■■ | H3 | East-African Indian-261
H | 254326223334 | 577 | 1 | □■■□□■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■■■■□■■■ | EAI1-SOM | Indo-Oceanic-69
  | 254326223424 | 543 | 2 | ■□□■■■■■■■■■■■■■■■■■■■■■■■■■□□□□■□■■□□□■■■■ | EAI3-IND | Indo-Oceanic-69
(a) As an extended explanation, one may refer to the example of case F, where profile 1 corresponding to MIT310 (LAM7-TUR lineage on the basis of spoligotyping), and profile 2 corresponding to MIT430 (T3-ETH lineage on the basis of spoligotyping), both correspond to the Euro American-40 sublineage. Indeed, profile 1 with deletion of the block 20–24 and profile 2 with deletion of the block 10–19 could indicate a possible common ancestor with all spacers in positions 10 to 24 being present. As indicated by our IS6110AD-typing data (see text), this hypothetical ancestor would be harboring a copy of IS6110 between the spacers 19 and 20. An adjacent deletion located on the left or the right side of this IS6110 would result in the 2 different spoligotype patterns observed here, i.e., profile 1 or 2. Hence, albeit phylogenetically very close, these 2 isolates would be classified as LAM7-TUR and T3-ETH in Brudey’s classification scheme in SpolDB4.
(b) MIT: MIRU International Type according to the SITVITWEB database [5].
doi:10.1371/journal.pone.0041991.t003
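The reasoning of footnote (a) for case F can be checked with a short sketch. The two 43-spacer patterns are transcribed from Table 3 as binary strings (1 = spacer present, 0 = spacer absent) using run lengths. The helper names `absent_blocks` and `union_of_present` are illustrative assumptions, and taking the union of present spacers is only a simple parsimony-style illustration that assumes spacers are lost and never regained; it is not the authors' procedure.

```python
def absent_blocks(pattern):
    """Return 1-based inclusive (start, end) runs of absent spacers ('0')."""
    blocks = []
    start = None
    for pos, state in enumerate(pattern, start=1):
        if state == "0":
            if start is None:
                start = pos          # a new run of absent spacers begins
        elif start is not None:
            blocks.append((start, pos - 1))
            start = None
    if start is not None:            # run extends to the last spacer
        blocks.append((start, len(pattern)))
    return blocks


def union_of_present(p1, p2):
    """Hypothetical ancestor: a spacer is present if either pattern has it."""
    return "".join("1" if "1" in (a, b) else "0" for a, b in zip(p1, p2))


# Case F of Table 3, written as run lengths over spacers 1 to 43.
LAM7_TUR = "1" * 19 + "0" * 5 + "1" + "0" * 2 + "1" * 5 + "0" * 4 + "1" * 7  # profile 1
T3_ETH = "1" * 9 + "0" * 10 + "1" * 13 + "0" * 4 + "1" * 7                   # profile 2

ancestor = union_of_present(LAM7_TUR, T3_ETH)

print(absent_blocks(LAM7_TUR))  # [(20, 24), (26, 27), (33, 36)]
print(absent_blocks(T3_ETH))    # [(10, 19), (33, 36)]
print(absent_blocks(ancestor))  # [(33, 36)]
```

Under these assumptions the hypothetical ancestor retains every spacer from 10 to 24, so the two observed patterns differ only by which side of the junction between spacers 19 and 20 was deleted.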
Figure 3. Some explanations on the technique of genotyping for the detection of IS6110. (A) An illustration for understanding the
technique for detection of insertions of IS6110 in the DR locus. (B) Result of genotyping of a strain (ID 1172) taken from a sample of 100 Mozambican
strains. There are 5 distinct genotyping results with each of the primer sets shown; the 1st line shows the classical spoligotyping while the remaining 4
lines show the detection of IS6110 insertional events as detailed in the text. (C) Schematic representation of interpretation of the experiments shown
in Figure 3B. Numbers underlined correspond to the numbering of the spacers in the 43-spacer spoligotyping format, while those not underlined
correspond to the numbering of spacers according to their genomic position in the DR locus. The braces mark the points of deletion of spacers.
doi:10.1371/journal.pone.0041991.g003
Figure 4. Global geographical distribution of the newly defined MIRU-VNTR lineages. In each subregion the distribution of the
sublineages of the majority lineage is represented.
doi:10.1371/journal.pone.0041991.g004
Figure 5. Two MST phylogenetic trees done with 95 Mozambican strains based on 12-loci MIRU-VNTRs (A), and 24-loci MIRU-VNTRs
(B).
doi:10.1371/journal.pone.0041991.g005
tion of 176 MTC isolates from the MIRU-VNTRplus database (http://www.miru-vntrplus.org/MIRU/index.faces) by the MIRU based lineages versus spoligotyping- and LSP-based classification schemes is illustrated in Table S3. This kind of approach was particularly useful to label the 7009 reclassified strains originating from the SITVIT2 database while plotting the worldwide distribution of newly-described phylogenetical lineages.
5. Geographical distribution of newly-described phylogenetical lineages
The worldwide distribution of newly-described phylogenetical lineages was studied both at the country level (3 letter country codes according to http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3), as well as regional and sub-regional level according to the United Nations (http://unstats.un.org/unsd/methods/m49/m49regin.htm); Regions: AFRI (Africa), AMER (Americas), ASIA (Asia), EURO (Europe), and OCE (Oceania), subdivided in: E (Eastern), M (Middle), C (Central), N (Northern), S (Southern), SE (South-Eastern), and W (Western). In this classification scheme, CARIB (Caribbean) belongs to Americas, while Oceania is subdivided in 4 sub-regions, AUST (Australasia), MEL (Melanesia), MIC (Micronesia), and POLY (Polynesia). Note that Russia was attributed a new sub-region by itself (Northern Asia) instead of including it among the rest of Eastern Europe. It reflects its geographical localization as well as the similarity of specific TB
genotypes circulating in Russia (a majority of Beijing genotypes) with those prevalent in Central, Eastern and South-Eastern Asia.
Results and Discussion
1. Description of the lineages and sublineages identified
Phylogenetic inferences were drawn from the 12-loci MIRU based MST constructed on all the 7009 MTC patterns taken from the SITVIT2 database (for which both spoligotyping and 12-loci MIRU-VNTR data were available; figure not shown since the resulting tree was over-crowded). From this tree, we came out with 7 major central nodes (or lineages) represented by the following MITs: 57, 17, 116, 934, 664, 49, 60 (Table 1). As summarized, lineages with node 57, 17, and 116 were subdivided into 37 sub-nodes as follows. Lineage 57 contained sub-nodes 57, 56, 59, 64, 69; lineage 17 included sub-nodes 17, 16, 83, 86, 93, 99, 101, 68, 261; the largest lineage 116 contained 23 sub-nodes: 116, 7, 8, 12, 15, 25, 33, 34, 40, 42, 43, 45, 46, 112, 121, 125, 128, 163, 190, 212, 213, 224, 246. A simplified MST showing the 7 major lineages and 41 sublineages is shown in Figure S2. The appropriate nomenclature for these lineages was proposed by comparing with previous classification schemes proposed in SpolDB4 [6] and by LSP-based classification [22,23] in Table 1; see Table S3 for a re-classification of a set of well-characterized MTC isolates (n = 176 profiles) taken from the MIRU-VNTRplus online database (http://www.miru-vntrplus.org/MIRU/index.faces).
Interestingly, the MIRU-VNTR classification script run on the 7009 SITVIT2 dataset strains underlined a good overlap with Gagneux’s nomenclature (results not shown), which validates the names retained in Table 1. Thus, central-node 57 was named Indo-Oceanic, 17 as East Asian and African Indian, 116 as Euro-American, 934 as West African I, 664 as West African II, 49 as M. bovis, and 60 as M. canettii (Table 1 and Table S3). The sublineages were named by adding to the name the value of the central sub-node, e.g. Indo-Oceanic 57, Indo-Oceanic 56 etc. Note that we have combined 2 families described by Gagneux (East-Asian and East-African Indian), in a single large phylogenetic group called “East Asian-African Indian (EAAI)”. Indeed, during reclassification of the MIRU-VNTRplus profiles, the patterns of these two lineages were classified in the major node 17 (Table S3).
The above observation was corroborated by the reclassification of SITVIT strains where both CAS (East-African Indian) and Beijing (East-Asian) strains were reclassified in the node 17; nonetheless, both sublineages occupied distinct MIRU-based sub-nodes as summarized in Table 2. Thus starting from central node 17, Beijing consisted almost exclusively of sub-nodes East Asian-17, East Asian-16, East Asian-83, East Asian-86, East Asian-93, East Asian-99, East Asian-101, while CAS consisted mainly of East-African Indian-68 and East-African Indian-261. Further, CAS1-Delhi and CAS1-Kili profiles according to Brudey’s classification are concentrated in East-African Indian-68 and in East-African Indian-261 sub-nodes, respectively. To validate the fact that the two phylogenetic groups of East Asian and East-African Indian in Gagneux’s classification scheme form a single big group, we used a Bayesian tree with the 37 core MIRU-VNTR profiles retained within M. tuberculosis sensu stricto (Figure 1). In this figure, one can observe the three major phylogenetic groups (i) PGG1/TbD1+ (ii) PGG1/TbD1- and (iii) PGG2–3/TbD1- (shown in red, yellow, and blue colors) which clearly regroup Indo-Oceanic, East-Asian/East-African Indian, and Euro-American lineages, respectively. In the middle of this tree, East-African Indian-68 and East-African Indian-261 (the two CAS sublineages) share a central node with East-Asian sub-nodes 17, 16, 83, 86, 93, 99, 101 (i.e., the Beijing sublineages), which corroborates the name “East Asian and African Indian (EAAI)” for this newly-defined large lineage represented by central-node 17.
In this MIRU classification, we note that the two lineages “Bov_4-caprae” and “AFRI1” as assigned by Brudey are compiled in a single phylogenetic group – “West African lineage II” (see Table 1, Table S3); this is an interesting observation knowing that AFRI1 shares with all of animal MTC pathogens (including BOV_4-caprae), a number of deletions (RD9, RD7, RD8, RD10), as well as a specific variation of 6 bp of the gene pks [9]. A unique strain (numbered 9550/00) from the MIRU-VNTRplus database and classified as West African II according to Gagneux’s criteria, was reclassified in two distinct lineages: West African II and M. bovis (see Table S3). A MIRU-based MST drawn on 176 strains of the MIRU-VNTRplus database (Figure 2) showed that the three phylogenetic groups – West African I, West African II and M. bovis – are phylogenetically close. Considering the fact that the oldest lineages are most distant from the Euro American lineage, the tree suggests that West African I and West African II lineage strains appeared before M. bovis. One may therefore speculate that the strain 9550/00 is a phylogenetic intermediate between these two lineages (West African II and M. bovis). It is further possible to make other analogies with Brudey’s classification, especially for M. tuberculosis sensu stricto belonging to PGG1 group (Table 2), e.g., 77.14% of Indo-Oceanic-56 corresponds to EAI2-Manilla, 72.97% of Indo-Oceanic-69 corresponds to EAI3-IND, and 68.60% of Indo-Oceanic-64 corresponds to EAI1-SOM. However, it is more difficult to make similar correspondences among modern PGG2/3 lineages. These discrepancies between spoligotype based classification as described previously and the present insight using MIRU-VNTR based classification would need concerted efforts of wider research groups in coming years.
2. Differences observed between spoligotype and MIRU based classification schemes
As summarized briefly earlier and in Table 3, the MIRU-based classification superimposes quite well with that of Brudey for sublineages belonging to PGG1; nonetheless, discrepancies do exist for PGG2/3 lineages. Two broad categories can be cited regarding these discrepancies: (i) cases A, B, C, D and E, where 2 patterns with a single spacer difference are classified in 2 separate lineages; (ii) cases F, G and H, which have blocks of missing spacers that are complementary among the 2 patterns. For the 1st category, one may consider case C – pattern 1 (classified as CAS1-Delhi) has 3 blocks of spacers deleted (4 to 7, 23 to 34, and 37 to 38), while pattern 2 (classified as EAI5) differs from the first only by the presence of spacer 33. Both these patterns were classified as East African-Indian-68 according to the MIRU-based classification scheme described in this paper. For the 2nd category (cases F, G and H), one may notice that blocks of spacers deleted in a given profile are contiguous to those verified in the other profile, e.g., pattern 1 in case F is characterized by a loss of spacers 20 to 24, 26 to 27 and 33 to 36 (classified as LAM7-TUR), while pattern 2 has 2 blocks of missing spacers: 10 to 19 and 33 to 36 (classified as T3-ETH).
It is important to recall that the classical spoligotyping method, which uses 43 spacers out of 104 reported spacers in tubercle bacilli [17], may not systematically reflect the succession and exact order of spacers on the genome, e.g., if the spacer block “20 to 24” of pattern 1 is indeed adjacent to the block “10 to 19” in pattern 2 (Table 3). We therefore thought it desirable to: (i) have a finer view of the DR locus using extended spoligotyping [15,16], (ii) detect IS6110 insertions in the DR locus using methodology described
earlier [18,19], (iii) use IS6110AD-typing to investigate the role of IS6110 insertional event(s) causing deletions in the MTC genome elsewhere than the DR locus. All these three techniques were used on the same set of 100 MTC isolates blindly sampled from an initial set of 445 clinical isolates studied in Mozambique [27]. The results obtained for selected isolates are summarized in Figure 3 and Figure S3 for spacers 1 to 86 shown in sequential order, for the localization of IS6110 insertions in the DR locus; and in Table S5 for IS6110AD-typing.
Regarding the demonstration of the IS6110 in the DR locus (Figure 3A), hybridization of a spacer by the primer sets (biot)DRa-IS3 or (biot)DRb-IS6 is positive evidence for IS6110 insertion in the DR preceding the spacer in question in the 5′→3′ direction, while with primer sets (biot)DRa-IS6 or (biot)DRb-IS3, it is evidence for insertion in the 3′→5′ direction. Nonetheless, asymmetrical insertion of IS6110 in the DR can prevent the binding of one of the two primers and affect the amplification of the upstream or downstream spacer. Hence, we amplified the spacers both on the right and left of the DR repeats to evidence IS6110 insertions; indeed these four pairs of primers are expected to produce an amplicon containing only a single spacer as shown in Figure 3A. The results obtained for the 86 extended spacers are summarized for a strain in Figure 3B (detailed results on 10 selected isolates from Mozambique are shown in Figure S3): the 1st line corresponds to use of classical spoligotyping primers DRa-DRb, while the 4 other lines correspond respectively to primer sets (biot)DRa-IS3, (biot)DRb-IS6, (biot)DRa-IS6 and (biot)DRb-IS3, and are helpful to highlight the presence of IS6110 element(s) in the DR locus. As shown in Figure 3B and Figure S3, the presence of IS6110 often results in revelation of 1 or 2 adjacent spacers leading to 2 possible assumptions: (i) either there are several IS6110 inserted into contiguous DR, or (ii) part of the amplicon carried by the IS6110 had length variations (since transposable elements are known sometimes to carry pieces of genomic sequences; [29]). The results for strain 1172 reveal several IS6110 in its DR locus, and they often occupy a position adjacent to the spacer blocks (Figure 3B). Indeed, this strain presents several losses of spacer blocks: 4 to 11, 16, 32 to 33, 43 to 50, 54 to 61, 67 to 78, 80 to 84, and 86. This interpretation is schematized in Figure 3C, and underlines duplication of spacers in the 3′→5′ direction, e.g., for genomic positions 4 and 34, and corroborates previous reports [15,30]. Thus, this DR locus would present 10 insertions of IS6110 in the following locations: DR2 (located upstream of the spacer 2), DR4, DR12, DR17, DR23, DR27, DR29, DR31, DR34 and DR35. The presence of IS6110 in the DR35 was already reported [31]. It is interesting to note that the insertions in DR4, DR12, DR17, DR31 and DR34 are adjacent to the absence of spacers 4 to 11, 16, and 32 to 33.
In the context of adjacent deletions, the potential role of homologous recombination between two IS6110 insertions was underlined for the RvD2 deletion and disruption of the plcD gene in M. tuberculosis [32]. Indeed, the IS6110-associated deletion hypervariability is today considered an important driving force in M. tuberculosis genome evolution [33]. As illustrated in Table S6,
in our laboratory for almost 18 years. This observation was indirectly corroborated by the fact that we also observed an additional deletion in the spoligotype pattern of this H37Rv strain (Figure S3); indeed the strain in our case lost spacer 15 (in addition to the characteristic H37Rv pattern defined only by the absence of spacers 20 to 21 and 33 to 36), although its MIRU-VNTR pattern remained unchanged. Considering that the test isolates were not repeatedly subcultured, we presume that similar deletions did not occur during the time of the study.
One may postulate that the high IS6110 copy number in the H37Rv genome (16 copies) conferred a high mutation rate to the DR locus, since the latter is known to be an IS6110 preferential locus (ipl; [21]). However, mechanisms other than IS6110 insertion have been suggested to cause the loss of spacers in the DR locus – which is a member of the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) – such as homologous recombination between DR [37] or IS6110 [33], and slippage during DNA replication [38]. In a recent study, different spoligotypes observed among epidemiologically related strains were attributed to the loss of spacer blocks due to recombination between DRs, an event favored by the formation of a secondary structure involving two IS6110 in opposite orientation [31], an explanation that argues in favor of a more complex and interlinked way of MTC evolution involving 2 or more mechanisms simultaneously. In conclusion, insertion sequences undoubtedly induce adjacent deletions [20], and no matter the mechanism, the fact that IS6110 are observed next to deleted spacers on the DR locus underlines their active involvement in DR evolution by loss of spacers.
In conclusion, the discrepancies observed between spoligotype and MIRU based classification schemes in the cases cited above underline that MIRU-based classification tends to group MTC isolates that are phylogenetically close or almost similar albeit they might appear distant if only judged based on their spoligotyping patterns. For example, going back to Table 3 (case F), profile 1 presents the deletion of the block 20–24 (in classical 43-spacer numbering), and profile 2 a deletion of the block 10–19. If these 2 profiles shared a common ancestor, it would have all the spacers in positions 10 to 24 present, and in addition would harbor a copy of IS6110 in the DR located between the spacers 19 and 20. Thus, an adjacent deletion located on the left or the right side of this IS6110 would result in totally different spoligotype patterns that would be classified in 2 distinct sublineages according to SpolDB4 classification (classified as LAM7-TUR and T3-ETH, respectively), albeit phylogenetically very close. Hence, the MIRU-based classification scheme that groups these 2 spoligotypes together is appropriate.
The Euro American phylogenetic group of Gagneux that groups TbD1-/PGG2/3 spoligotype-defined lineages (Haarlem, LAM, X, S, and T), as well as a wide range of unclassified spoligotype profiles in the recent SITVITWEB version of the international database [5], is characterized by the presence of a high number of IS6110 copies. The large copy number of IS6110
several Regions of Difference (RD) are reportedly located next to in these modern strains produces many variations in the DR locus,
IS6110, e.g., RD152, RD207, RD5, RD11, RD14, MiD2 making it difficult to study their evolution uniquely on the basis of
[34,35,36]. To determine whether the IS6110 was involved in their spoligotype profile. Further, asymmetrical IS6110 insertional
genetic recombination that may cause adjacent deletions [20], we events could also lead to 2 patterns differing by a single spacer
applied IS6110 AD-typing to selected strains from Mozambique change [18,19], and falsely lead to their inclusion in two different
and M. tuberculosis H37Rv. The results obtained underlined lineages based on certain SpolDB4 lineages. On the contrary,
deletions adjacent to IS6110 insertions (Table S5). Unexpectedly, TbD1+/PGG1 ancestral EAI lineage harbors little or no IS6110,
we also observed deletions in M. tuberculosis H37Rv not reported in which explains a good concordance between spoligotype and
the original H37Rv sequence on the NCBI server; these deletions MIRU-based classification schemes.
probably occurred during successive subcultures of the type strain
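The Table 3 (case F) scenario discussed above can be made concrete with a minimal sketch. It assumes a hypothetical ancestor in which all 43 classical spacers are present (a simplification for illustration; real LAM and T profiles lack additional spacers), and it models each IS6110-adjacent deletion as the removal of one contiguous spacer block. The helper names `delete_block` and `hamming` are illustrative and do not come from the study.

```python
N_SPACERS = 43  # classical 43-spacer spoligotyping format


def delete_block(pattern, first, last):
    """Return a copy of `pattern` with spacers first..last (1-based, inclusive) absent."""
    return tuple(
        0 if first <= pos <= last else present
        for pos, present in enumerate(pattern, start=1)
    )


def hamming(a, b):
    """Number of spacer positions at which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def to_string(pattern):
    """Render a pattern as a 43-character string of 1 (present) and 0 (absent)."""
    return "".join(str(p) for p in pattern)


# Hypothetical ancestor (assumption): every spacer present, with an IS6110
# copy sitting in the DR between spacers 19 and 20.
ancestor = (1,) * N_SPACERS

# One adjacent deletion on either side of that IS6110 copy.
profile_1 = delete_block(ancestor, 20, 24)  # deletion to the right of the IS6110
profile_2 = delete_block(ancestor, 10, 19)  # deletion to the left of the IS6110

print(to_string(profile_1))
print(to_string(profile_2))
print(hamming(ancestor, profile_1))   # 5 spacers lost in a single event
print(hamming(ancestor, profile_2))   # 10 spacers lost in a single event
print(hamming(profile_1, profile_2))  # 15 positions differ between the two profiles
```

Under these assumptions, each profile is a single deletion event away from the ancestor, yet the two profiles differ at 15 spacer positions, which is why a classification based on spacer-by-spacer comparison can place them in distinct sublineages.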
3. Global geographical distribution of new MIRU-VNTR lineages
3.1. The global distribution map of the MIRU-VNTR lineages. The global geographical distribution of the newly defined MIRU-VNTR lineages is summarized in Figure 4. The map illustrates the information available in the SITVIT2 database for the 6800 MTC isolates recognized as M. tuberculosis sensu stricto. The figure shows pie charts with two circles: the inner circle shows the three most predominant newly described lineages, i.e., Indo Oceanic, East Asian and African Indian (EAAI), and Euro American, whereas the outer circle shows the sublineages belonging only to the most predominant of the three lineages (please refer to the color scheme shown in the legend to Figure 4). The exception is the region corresponding to East Africa, for which the Euro American and Indo Oceanic lineages were almost equally represented (almost 50% of strains). Note that we chose to illustrate the distribution of Indo Oceanic sublineages since this lineage followed a distribution gradient from South-East Asia to East Africa for regions bordering the Indian Ocean (see below). Thus the outer circles show the distribution of the following sublineages: Indo Oceanic in AFRI-E, ASIA-SE and ASIA-S; East Asian and African Indian (EAAI) in AFRI-E/ASIA-E, ASIA-C and ASIA-N; Euro American in all other subregions, essentially in Europe and the Americas. Briefly, one may conclude that the Indo Oceanic lineage is widely represented in AFRI-E (42.11%), ASIA-S (68.31%), and ASIA-SE (100%); East Asian and African Indian (EAAI) in ASIA-C (84.44%), ASIA-E (80%), and ASIA-N (80.59%); and the Euro American lineage in all other sub-regions, e.g., it represents 68.31% of TB cases listed in AMER-N, 96.15% in AMER-S, 89% in CARI, 64.25% in EURO-N, 85.71% in EURO-W, 94.53% in EURO-S, 99.26% in EURO-E, 98.43% in ASIA-W, 95.45% in AFRI-W, and 100% in AFRI-M.
3.2. Out of Africa scenario: Indo Oceanic lineage. Given the phylogeographical specificity of the Indo Oceanic lineage (which according to the Bayesian tree is the most ancient among the three major phylogenetic groups) for regions bordering the Indian Ocean, it seems to have originated on the east coast of Africa. Indeed, the Indo Oceanic-57 sublineage, found in high proportion in East Africa followed by India and South-East Asia, could be considered as the central node of the Indo Oceanic lineage (Figure 1). According to this tree, the Indo Oceanic-69 (prevalent in India) and Indo Oceanic-56 (predominant in South-East Asia) sublineages share a close common ancestor. According to the length of the tree branches, the Indo Oceanic-69 sublineage apparently diverged from this common ancestor before the Indo Oceanic-56 sublineage (Figure 1). These observations suggest a human migration from the East of Africa to South-East Asia (Pacific Islands) via India. Further, the global geographical distribution of M. tuberculosis sensu stricto lineages underlines that this migration would not have affected the North of the Middle East. In this context, it might be worthwhile to mention that almost all the strains belonging to the Indo Oceanic lineage in the Middle East are concentrated in its south, specifically Saudi Arabia. This pattern of M. tuberculosis evolution and migration through its human host is corroborated by studies based on human mitochondrial DNA (mtDNA) showing a first migration route out of the Horn of Africa [39]; the migrants successively reached the Arabian coast and Persia [39,40], followed by India and Thailand, Indonesia and Australia [39,41,42], a migration dated to between 80,000 and 60,000 years ago.
3.3. The Asian continent, place for the East Asian and African Indian (EAAI) lineage expansion. In the Bayesian tree (Figure 1), the central place is occupied by the East Asian and African Indian (EAAI) lineage, characterized by TbD1-/PGG1 strains that predominate in Asia (Figure 4). The distribution of sublineages showed a high proportion of the East Asian-17 sublineage in the ASIA-E region (43.70%), followed by East Asian-16 in ASIA-C (74.68%) and ASIA-N (68.4%). Considering the two types of Beijing lineages in Asia, a 1st type characterized by the presence of an NTF region without IS6110 insertion and a 2nd type presenting an IS6110 in the NTF locus [43], the former could correspond to the East Asian-16 sublineage while the latter would correspond to East Asian-17. Indeed, a study based on human phylogeography hypothesized that the 1st type emerged in the upper Paleolithic period in Central Asia among the NRY K-M9 haplogroup coming from the Middle East [43]. The geographical location of different descendant haplogroups suggests that the migration route would then concern the North East (to Siberia) and the South East (northern China). The distribution of this Paleolithic Beijing type, which prevails in central Asia and north Asia, superimposes with the geographical distribution of the East Asian-16 sublineage. Similarly, the 2nd Beijing type would have emerged in the Neolithic period among Proto-Sino-Tibetan farmers in East Asia (Haplogroup O-M214/M122), followed by its spread to the rest of East Asia [43], which coincides well with the predominance of the East Asian-17 sublineage over the same geographical area. Further studies will be needed to investigate whether both Beijing types differentiated by Mokrousov et al. [43] are blended with the East Asian-16 and East Asian-17 sublineages, as suggested by our distribution map (Figure 4).
3.4. The Euro American lineage was probably first spread to Europe through several human migrations from Middle East: the Asia-to-Africa back migration theory. Considering that the Indo-Oceanic (EAI in SpolDB4) lineage is the most ancestral [9], the Euro American lineage is the latest to emerge according to the Bayesian tree (Figure 1). On the map shown in Figure 4, this lineage is predominant in Europe and America, which largely justifies its name. According to the Bayesian tree, Euro American-40 was the first to emerge among subfamilies belonging to the PGG2/3 group (Figure 1); considering that it is also highly predominant in western Asia (with 37.6% of the PGG2/3 strains, Figure 4), we suggest that the ancestor of all modern MTC strains probably originated in this sub-region. Furthermore, although the distribution of Euro American sublineages in various regions is quite heterogeneous, this is not the case for middle and western Africa, where the Euro American-12 sublineage predominates (100% and 90.48%, respectively, of all modern strains). To better understand how modern strains are found in Africa in such proportions, one may refer to the trajectory of the R1b haplogroup (Y-chromosome). R1b is most frequently found in western Europe, parts of central Eurasia and in parts of sub-Saharan and central Africa, e.g., around Chad and Cameroon (http://en.wikipedia.org/wiki/Haplogroup_R1b_(Y-DNA)#Origin_and_dispersal). The point of origin of this haplogroup is thought to lie in Eurasia, most likely in western Asia [44].
We also attempted to explain the present distribution gradient of the Euro American lineage on the basis of an Asia-to-Africa back migration theory; indeed, Cruciani et al. [25] underlined an unusual Asia-to-Africa back migration in Neolithic times through the study of Y-chromosome haplogroups. In an attempt to test this hypothesis (Asia-to-Africa back migration) with the information contained in the 12-loci MIRU-VNTR profiles of M. tuberculosis strains, we classified 154 published strains from the north west of Iran [26] with our new classification algorithm. The three lineages that predominate in this region are the Euro American-212 sublineage with 22.8%, the M. bovis lineage with 21.43% and the Euro American-121 sublineage with 11.69% (data not shown). Considering that the Euro American-121 sublineage contains African strains (like the Euro
American-12 sublineage, Table S3), the reclassification of strains taken from the MIRU-VNTRplus database further underlined the fact that the Euro American-121 sublineage included strains belonging to the Uganda II and Ghana spoligotype families, while the Euro American-12 sublineage included mainly strains of the Cameroon family (Table S3). The phylogenetic tree in Figure 2 shows that the Euro American-121 and Euro American-12 sublineages are close and that the former would be older than the latter. The high prevalence of Euro American-121 sublineage strains in Iran, and that of Euro American-12 sublineage strains in central and western Africa, also supports the assumption regarding Asia-to-Africa back migration of the Euro American lineage.
In the sample of MTC isolates from the north west of Iran, we observed a high proportion of Euro American-212; reclassification of MIRU-VNTRplus strains (Table S3) showed that this sublineage exclusively corresponds to the S sublineage in SpolDB4 [6] (an observation also confirmed by classification of SITVIT strains), with reported phylogeographical specificity to Sicily and Sardinia [45]. The high prevalence of this lineage in the north west of Iran allows us to speculate that it may have originated in the Middle East and reached the Mediterranean coast with migrants harboring the R1b haplogroup in the Neolithic period [44]. It is therefore clear that the prevalence of the Euro American lineage in Europe and America cannot be explained solely on the basis of recent European colonization, but is also due to the first human migrations into America through the Bering Strait from Asia about 20,000 years ago (atlas of human journey: https://genographic.nationalgeographic.com/genographic/lan/en/atlas.html). We note that outside Asia, the East African Indian-68 sublineage is predominant among modern TbD1-/PGG1 strains in some subregions, e.g., Northern Europe (37.71%), Western Europe (31.51%) and Northern Africa, while East Asian-17 is predominant in North America (43.13% among EAAI strains) and further concentrates most of the EAAI strains in central America.
3.5. Identification of two major phylogenetic groups among Euro American lineage. The Bayesian tree in Figure 1 shows many sub-nodes each with a distinct sublineage; nonetheless 2 sub-nodes are slightly more distal and lead to secondary branching, giving two additional phylogenetic sub-branches within the Euro American lineage: (i) a 1st group with sublineages 45, 43, 42; (ii) a 2nd group with sublineages 213, 190, 246, 25, 163, 224, 128. To name these two sub-groups, we drew an analogy with the SpolDB4 lineages as updated recently in SITVITWEB [5]. We observed that 92% of Haarlem lineage strains correspond to the 1st group, hence it was renamed Haarlem-42/43/45. Further, 74.33% of LAM and 100% of T5-RUS1 strains were found in the 2nd group; considering that T5-RUS1 was recently reclassified as LAM on the basis of specific SNPs [14,46,47], the 2nd group was renamed LAM-25/128/163/190/213/224/246. The worldwide distribution of these 2 sub-groups is summarized in Table S7; the Haarlem-42/43/45 phylogenetic group is well represented in Europe, mainly in the south of Europe, and in South America, as well as in North Africa; whereas the LAM-25/128/163/190/213/224/246 phylogenetic group, subdivided into subgroups A (LAM-190/213/246) and B (LAM-25/128/163/224), is well distributed everywhere in Europe, Africa and America, except in EURO-S, ASIA-W, AFRI-M, and AFRI-W. The fact that Euro American-213 (at the base of the sub-node and supposedly more ancient than other terminal sublineages in the 2nd group) is prevalent in the North of Africa may suggest its emergence in this subregion.
3.6. Tentative identification of LAM strains harboring the RDrio deletion among the LAM-25/128/163/190/213/224/246 phylogenetic group. To answer this question, we reclassified 190 published strains from Rio de Janeiro [24]. As summarized in Table S4, the study sample contained wild-type strains (n = 90), strains with RDrio deletion (n = 93), and international reference strains harboring the RDrio deletion (n = 7). The results obtained showed that of the 100 strains with RDrio deletion: (i) the majority (95%) belonged to the 2nd phylogenetic group shown above (sub-group B, LAM-25/128/163/224); (ii) a minority (3%) belonged to sub-group A (LAM-190/213/246); and (iii) 2 were not part of either of the LAM phylogenetic groups. On the contrary, the distribution of the wild-type strains (n = 90) was different, with a majority of sub-group A strains, as follows: (i) the majority (70/90 or 77.8%) belonged to sub-group A (LAM-190/213/246); (ii) a minority (11/90 or 12.2%) belonged to sub-group B (LAM-25/128/163/224); and (iii) 9 were not part of either of the LAM phylogenetic groups. These results may be further interpreted based on Figure 1, where LAM sub-groups A (Euro American sublineages 190/213/246) and B (Euro American sublineages 25/128/163/224) appear to have diverged from a common LAM ancestor that they share with the sublineage 213. One may therefore conclude that the LAM ancestor initially had an intact RDrio region, from which diverged Euro American-213 and the predecessor of the Euro American-190 and -246 sublineages (various group A strains that are characterized by an intact RDrio region). Later, the loss of the RDrio region constituted the phylogenetic sub-group B (LAM-25/128/163/224). As summarized in Table S7, subgroup A, which is found in Southern Africa (78.27%) and the North of Africa (68%), is more ancestral than subgroup B, which is well represented in the Caribbean (89.5%) and in south and west Europe (58.8% and 60.3% respectively).
3.7. The ability of 12-loci versus 24-loci MIRU-VNTRs to discriminate MTC sublineages. To answer this question, we constructed 2 MST phylogenetic trees with 95 strains from Mozambique (Figure 5), which essentially contained 2 main lineages: Indo-Oceanic (42.1%) and Euro American (54.7%). Irrespective of the typing format used (12-loci, Figure 5A vs. 24-loci, Figure 5B), none of the trees showed a strong link between these two main lineages. Most Euro American strains (84.6%) belonged to the LAM phylogenetic sub-group B (essentially sublineages Euro American-163 and Euro American-128). Regardless of the typing format used, the trees showed the same two big clusters (even though the tree made with 24 loci had many more ramifications). We therefore conclude that the 12-loci format is sufficient to discriminate the present MIRU-VNTR based MTC lineages.
4. Concluding remarks
This paper provides new information on the MTC genotypic polymorphism based on widely used markers, i.e., IS6110, the DR locus, the LSPs and MIRU-VNTR minisatellites. The genotypic classification of MTC was until now based on SNPs [48], LSPs [22,23], and spoligotyping [5,6]. Although spoligotyping-based classification was more discriminative than the LSP-based classification, it was recently singled out as subject to convergent evolution of the DR locus [14]. In this regard, although MIRU-VNTR typing has been massively used for MTC molecular typing in recent years, its use for purely phylogenetic classification of MTC had not been investigated at a large scale.
By using the MST method in conjunction with a Bayesian approach in this investigation, we describe a 12-loci MIRU scheme for MTC classification. This study also showed evidence for the satisfactory ability of 12-loci MIRUs to discriminate MTC sublineages versus the 24-loci format. In light of the information provided herein, the genotypic classification of MTC lineages
based until now on spoligotyping and LSPs is now rendered more accurate thanks to MIRU-VNTR minisatellites. We therefore recommend that future investigations using MIRU-based typing of M. tuberculosis refer to the present classification for lineage attribution in addition to existing spoligotyping and/or MIRU-based systems. Indeed, given the complex sublineage names in the present nomenclature, a period of adaptation might be necessary for many of the users (or databases) already provided with a lineage attribution.
Comparison of this new classification to that of Gagneux demonstrated that (i) the Indo-Oceanic lineage is divided into five phylogenetic subgroups, (ii) the East African Indian and East Asian lineages form one large group which is subdivided into nine phylogenetic subgroups, (iii) the Euro American lineage contains twenty-three subgroups, and (iv) the West African II lineage includes the BOV_4-CAPRAE sublineage [6]. In general, the phylogenetic PGG1 sublineages of Brudey find a match in this new classification, which is not always the case for PGG2/3 spoligotype sublineages [6]. For instance, both the Haarlem and LAM groups and subgroups were correctly identified in our new classification scheme. Furthermore, within the LAM family, RD(Rio) sublineages could be identified. Often, discrepancies observed between the 12-loci MIRU-based classification and the spoligotyping methods were resolved, since MLVA-based classification tends to group MTC isolates that are phylogenetically close or almost similar, although they sometimes appeared distant if judged solely on their spoligotyping patterns. Indeed, these supposedly distant spoligotype patterns arose due to IS6110 insertional events that could be implicated in the loss of DR locus spacers. Thus, our results also underlined the role of transposable elements in chromosomal rearrangements, since there is a direct link between the large number of IS6110 elements found in the DR locus and the deletions of DR spacers causing the bulk of polymorphism occurring in this genomic region. Hence even if many of the IS6110 transpositional events may not be traced as being directly involved in convergent evolution of MTC strains, a fair portion of the convergence leading to the currently observed bias in phylogenetic classification of strains may be traced back to the presence of IS6110. Besides, our results suggest that IS6110 may be implicated in a fraction of the LSP deletions, and may therefore play a role in the high level of MTC genomic plasticity conferring its adaptation to a wide variety of hosts and environments.
In our opinion, MTC strains having a high number of IS6110 elements, such as those belonging to the Euro American lineage, would highly benefit from MIRU-VNTR typing to assign a phylogenetic position translating evolutionary reality. The novel MIRU-VNTR based classification scheme presented in the present investigation seems to be a good alternative to support future phylogenetic and epidemiologic studies. Considering its cost effectiveness and simplicity, MIRU-VNTR typing in conjunction with the present MTC classification scheme is equally appropriate for developed and emerging nations concerned by tuberculosis. Last but not least, the results presented herein on a first worldwide phylogeographic snapshot of MTC diversity and evolution as judged by their MIRU-VNTR profiles shed new light on the evolutionary history of the pathogen in relation to the history of peopling and human migration.
Supporting Information
Figure S1 MIRU-based minimum spanning tree (MST) constructed on 164 M. tuberculosis isolates from Kerala, India (unpublished results, see acknowledgments section for origin of data). This tree was made using the BioNumerics software, and illustrates the fact that all lineage members congregate around a central node. The tree illustrates MIRU-based subdivisions concomitantly with other phylogenetically relevant markers: (i) katG-gyrA polymorphism based three principal genetic groups (PGG); (ii) spoligotype-based lineages; and (iii) presence of a specific deletion region (TbD1).
(PDF)
Figure S2 A Minimum Spanning Tree (MST) constructed on MIRU-VNTR prototype MITs defining the newly described sublineages. Please refer to the text for further details.
(PDF)
Figure S3 Result of genotyping on 10 MTC strains (selected from a sample of 100 Mozambican strains). There are 5 distinct genotyping results with each of the primer sets shown; the 1st line shows the classical spoligotyping while the remaining 4 lines show the detection of IS6110 insertional events as detailed in the text.
(PDF)
Table S1 Description of primers used for amplification of sequences adjacent to IS6110 present in M. tuberculosis H37Rv. IR-r (Inverted Repeat Right) refers to the inverted repeat sequence that frames the IS6110 on the 5' side. IR-l (Inverted Repeat Left) refers to the inverted repeat sequence that frames the IS6110 on the 3' side. The amplicon name comprises the ID of the IS6110 followed by the symbol "–" then the letter "r" (right) for an amplicon located on the 5' side or "l" (left) for the 3' side.
(PDF)
Table S2 (A) Protocol for the MIX preparation of each IS6110AD-typing multiplex. (B) Program cycles used. The process from second to fourth cycle (2*, 3*, 4*) was repeated 35 times.
(PDF)
Table S3 Reclassification of 176 profiles taken from the MIRU-VNTRplus database (http://www.miru-vntrplus.org/MIRU/index.faces).
(PDF)
Table S4 Reclassification of 190 published strains [24]. This collection gathers data from Rio de Janeiro strains with RDrio deletion (n = 93), international strains with RDrio deletion (n = 7), and other Rio de Janeiro strains which contained the RDrio sequence (n = 90).
(PDF)
Table S5 Results of IS6110AD-typing performed on 10 Mozambican strains. Filled squares symbolize intact regions, empty squares symbolize deleted regions. An asterisk (*) is added when the region is about 50 bp shorter than the expected size.
(PDF)
Table S6 Description of selected regions of difference (RD) located in an adjacent position to an IS6110. The first column gives the name of the locus of the two transposases of the concerned IS6110. The second column gives the RD that is adjacent to the IS6110 insertion. The third column lists the position of the gene(s) involved in the deletion.
(PDF)
Table S7 Distribution of phylogenetic groups in the various sub-regions of the world. The percentage of a given group among PGG2/3 isolates is reported in each subregion. (A) Distribution of Haarlem-42/43/45. (B) Distribution of the LAM-25/128/163/190/213/224/246 group and its two subgroups
LAM-190/213/246 (subgroup A) and LAM-25/128/163/224 (subgroup B).
(PDF)
Acknowledgments
The authors are highly grateful to Biljo V Joseph and Sathish Mundayoor (Mycobacteria Research Group, Department of Molecular Microbiology, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India), for their permission to use data to draw the Minimum Spanning Tree shown in Figure S1.
Author Contributions
Conceived and designed the experiments: VH TZ NR. Performed the experiments: VH SS. Analyzed the data: VH TZ NR. Contributed reagents/materials/analysis tools: VH TZ SV ES GK NR. Wrote the paper: VH TZ NR. Obtained permission to reproduce the Minimum Spanning Tree shown in Figure S1: NR.
References
1. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, et al. (1997) Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol 35: 907–914.
2. Mazars E, Lesjean S, Banuls AL, Gilbert M, Vincent V, et al. (2001) High-resolution minisatellite-based typing as a portable approach to global analysis of Mycobacterium tuberculosis molecular epidemiology. Proc Natl Acad Sci U S A 98: 1901–1906.
3. Supply P, Lesjean S, Savine E, Kremer K, van Soolingen D, et al. (2001) Automated high-throughput genotyping for study of global epidemiology of Mycobacterium tuberculosis based on mycobacterial interspersed repetitive units. J Clin Microbiol 39: 3563–3571.
4. Supply P, Allix C, Lesjean S, Cardoso-Oelemann M, Rüsch-Gerdes S, et al. (2006) Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol 44: 4498–4510.
5. Demay C, Liens B, Burguière T, Hill V, Couvin D, et al. (2012) SITVITWEB – a publicly available international multimarker database for studying Mycobacterium tuberculosis genetic diversity and molecular epidemiology. Infect Genet Evol 12: 755–766.
6. Brudey K, Driscoll JR, Rigouts L, Prodinger WM, Gori A, et al. (2006) Mycobacterium tuberculosis complex genetic diversity: mining the fourth international spoligotyping database (SpolDB4) for classification, population genetics and epidemiology. BMC Microbiol 6: 23.
7. Rastogi N, Sola C (2007) Molecular evolution of the Mycobacterium tuberculosis complex. In: Palomino JC, Leao S, Ritacco V, editors. Tuberculosis 2007: from basic science to patient care. 53–91. Amedeo Online Textbooks: http://www.tuberculosistextbook.com/index.htm, Accessed 15 March 2012.
8. Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, et al. (1997) Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94: 9869–9874.
9. Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C, et al. (2002) A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 99: 3684–3689.
10. Fenner L, Malla B, Ninet B, Dubuis O, Stucki D, et al. (2011) "Pseudo-Beijing": evidence for convergent evolution in the direct repeat region of Mycobacterium tuberculosis. PLoS One 6: e24737.
11. Comas I, Homolka S, Niemann S, Gagneux S (2009) Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4: e7815.
12. Allix-Béguec C, Harmsen D, Weniger T, Supply P, Niemann S (2008) Evaluation and strategy for use of MIRU-VNTRplus, a multifunctional database for online analysis of genotyping data and phylogenetic identification of Mycobacterium tuberculosis complex isolates. J Clin Microbiol 46: 2692–2699.
13. Kato-Maeda M, Gagneux S, Flores LL, Kim EY, Small PM, et al. (2011) Strain classification of Mycobacterium tuberculosis: congruence between large sequence polymorphisms and spoligotypes. Int J Tuberc Lung Dis 15: 131–133.
14. Abadia E, Zhang J, dos Vultos T, Ritacco V, Kremer K, et al. (2010) Resolving lineage assignation on Mycobacterium tuberculosis clinical isolates classified by spoligotyping with a new high-throughput 3R SNPs based method. Infect Genet Evol 10: 1066–1074.
15. van der Zanden AG, Kremer K, Schouls LM, Caimi K, Cataldi A, et al. (2002) Improvement of differentiation and interpretability of spoligotyping for Mycobacterium tuberculosis complex isolates by introduction of new spacer oligonucleotides. J Clin Microbiol 40: 4628–4639.
16. Brudey K, Gutierrez MC, Vincent V, Parsons LM, Salfinger M, et al. (2004) Mycobacterium africanum genotyping using novel spacer oligonucleotides in the direct repeat locus. J Clin Microbiol 42: 5053–5057.
21. Fang Z, Forbes KJ (1997) A Mycobacterium tuberculosis IS6110 preferential locus (ipl) for insertion into the genome. J Clin Microbiol 35: 479–481.
22. Gagneux S, DeRiemer K, Van T, Kato-Maeda M, de Jong BC, et al. (2006) Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 103: 2869–2873.
23. Gagneux S, Small PM (2007) Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development. Lancet Infect Dis 7: 328–337.
24. Lazzarini LC, Huard RC, Boechat NL, Gomes HM, Oelemann MC, et al. (2007) Discovery of a novel Mycobacterium tuberculosis lineage that is a major cause of tuberculosis in Rio de Janeiro, Brazil. J Clin Microbiol 45: 3891–3902.
25. Cruciani F, Trombetta B, Sellitto D, Massaia A, Destro-Bisol G, et al. (2010) Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid Holocene trans-Saharan connections and the spread of Chadic languages. Eur J Hum Genet 18: 800–807.
26. Asgharzadeh M, Kafil HS, Roudsary AA, Hanifi GR, et al. (2011) Tuberculosis transmission in Northwest of Iran: using MIRU-VNTR, ETR-VNTR and IS6110-RFLP methods. Infect Genet Evol 11: 124–131.
27. Viegas SO, Machado A, Groenheit R, Ghebremichael S, Pennhag A, et al. (2010) Molecular diversity of Mycobacterium tuberculosis isolates from patients with pulmonary tuberculosis in Mozambique. BMC Microbiol 10: 195.
28. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
29. Alexander DC, Jones JR, Liu J (2003) A rifampin-hypersensitive mutant reveals differences between strains of Mycobacterium smegmatis and presence of a novel transposon, IS1623. Antimicrob Agents Chemother 47: 3208–3213.
30. Caimi K, Romano MI, Alito A, Zumarraga M, Bigi F, et al. (2001) Sequence analysis of the direct repeat region in Mycobacterium bovis. J Clin Microbiol 39: 1067–1072.
31. Schürch AC, Kremer K, Kiers A, Boeree MJ, Siezen RJ, et al. (2011) Preferential deletion events in the direct repeat locus of Mycobacterium tuberculosis. J Clin Microbiol 49: 1318–1322.
32. Lari N, Rindi L, Garzelli C (2001) Identification of one insertion site of IS6110 in Mycobacterium tuberculosis H37Ra and analysis of the RvD2 deletion in M. tuberculosis clinical isolates. J Med Microbiol 50: 805–811.
33. Sampson SL, Warren RM, Richardson M, Victor TC, Jordaan AM, et al. (2003) IS6110-mediated deletion polymorphism in the direct repeat region of clinical isolates of Mycobacterium tuberculosis. J Bacteriol 185: 2856–2866.
34. Brodin P, Eiglmeier K, Marmiesse M, Billault A, Garnier T, et al. (2002) Bacterial artificial chromosome-based comparative genomic analysis identifies Mycobacterium microti as a natural ESAT-6 deletion mutant. Infect Immun 70: 5568–5578.
35. Rao KR, Kauser F, Srinivas S, Zanetti S, Sechi LA, et al. (2005) Analysis of genomic downsizing on the basis of region-of-difference polymorphism profiling of Mycobacterium tuberculosis patient isolates reveals geographic partitioning. J Clin Microbiol 43: 5978–5982.
36. Tsolaki AG, Gagneux S, Pym AS, Goguet de la Salmoniere YO, Kreiswirth BN, et al. (2005) Genomic deletions classify the Beijing/W strains as a distinct genetic lineage of Mycobacterium tuberculosis. J Clin Microbiol 43: 3185–3191.
37. Fang Z, Morrison N, Watt B, Doig C, Forbes KJ (1998) IS6110 transposition and evolutionary scenario of the direct repeat locus in a group of closely related Mycobacterium tuberculosis strains. J Bacteriol 180: 2102–2109.
38. Jansen R, Embden JD, Gaastra W, Schouls LM (2002) Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43: 1565–1575.
39. Renfrew C (2010) Archaeogenetics – towards a 'new synthesis'? Curr Biol 20: R162–165.
17. van Embden JD, van Gorkom T, Kremer K, Jansen R, van Der Zeijst BA, et al. 40. Underhill PA, Kivisild T (2007) Use of y chromosome and mitochondrial DNA
(2000) Genetic variation and evolutionary origin of the direct repeat locus of
population structure in tracing human migrations. Annu Rev Genet 41: 539–
Mycobacterium tuberculosis complex bacteria. J Bacteriol 182: 2393–2401.
564.
18. Filliol I, Sola C, Rastogi N (2000) Detection of a previously unamplified spacer
41. Kayser M (2010) The human genetic history of Oceania: near and remote views
within the DR locus of Mycobacterium tuberculosis: epidemiological implications. J
of dispersal. Curr Biol 20: R194–201.
Clin Microbiol. 38: 1231–1234.
42. Majumder PP (2010) The human genetic history of South Asia. Curr Biol 20:
19. Legrand E, Filliol I, Sola C, Rastogi N (2001) Use of spoligotyping to study the
evolution of the direct repeat locus by IS6110 transposition in Mycobacterium R184–187.
tuberculosis. J Clin Microbiol 39: 1595–1599. 43. Mokrousov I, Ly HM, Otten T, Lan NN, Vyshnevskyi B, et al. (2005) Origin
20. Roberts DE, Ascherman D, Kleckner N (1991) IS10 promotes adjacent deletions and primary dispersal of the Mycobacterium tuberculosis Beijing genotype: clues
at low frequency. Genetics 128: 37–43. from human phylogeography. Genome Res 15: 1357–1364.
44. Myres NM, Rootsi S, Lin AA, Järve M, King RJ, et al. (2011) A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. Eur J Hum Genet 19: 95–101.
45. Sola C, Ferdinand S, Sechi LA, Zanetti S, Martial D, et al. (2005) Mycobacterium tuberculosis molecular evolution in western Mediterranean Islands of Sicily and Sardinia. Infect Genet Evol 5: 145–156.
46. Gibson AL, Huard RC, Gey van Pittius NC, Lazzarini LC, Driscoll J, et al. (2008) Application of sensitive and specific molecular methods to uncover global dissemination of the major RDRio Sublineage of the Latin American-Mediterranean Mycobacterium tuberculosis spoligotype family. J Clin Microbiol 46: 1259–1267.
47. Mokrousov I, Valcheva V, Sovhozova N, Aldashev A, Rastogi N, et al. (2009) Penitentiary population of Mycobacterium tuberculosis in Kyrgyzstan: exceptionally high prevalence of the Beijing genotype and its Russia-specific subtype. Infect Genet Evol 9: 1400–1405.
48. Filliol I, Motiwala AS, Cavatore M, Qi W, Hazbón MH, Bobadilla del Valle M, et al. (2006) Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol 188: 759–772.